T-SQL Anti-pattern of the day: UPPER() and LOWER()

This one is even more common and therefore has a high impact: the usage of the above functions in WHERE clause predicates. I’ve examined each in some detail below.

Here’s a typical example I see, based off the AdventureWorks database.

Scenario

To illustrate the effect of this example, I have added an extra index on the FirstName column in Person.Contact table:

CREATE NONCLUSTERED INDEX NC_Person_Contact_FirstName
ON Person.Contact(FirstName)

Here’s a typical query pattern coming in from the client:

DECLARE @FirstName nvarchar(50)
SELECT @FirstName = ‘pilar’

SELECT *
FROM Person.Contact
WHERE UPPER(FirstName) = UPPER(@FirstName)

Problem

Let’s take a look at the execution plan for this batch:

As expected there is a scan happening, due to the presence of the UPPER() on the FirstName column. Interestingly the optimizer chose to scan the non-clustered index rather than the clustered one. It did that because the cost of scanning the narrower index on FirstName is cheaper than the clustered index. Anyway, it is still far from optimal.

Workarounds

If your server / database / column is set to use a case insensitive collation (if you don’t know what a collation is, I suggest you start with this Books Online entry first) then you really do not need any of the UPPER() or LOWER() function calls, because in that case, the execution engine will perform a case-insensitive comparison. Most customers I work with, use the US English installation and the default collation of SQL_Latin1_General_CP1_CI_AS. For those customers, this workaround would be safe.

Is it a good one by design? The answer as with many things in SQL Server, is it depends. If your installation procedures are standardized and repeatable, and if you are using the default collation (or any other case-insensitive one) then this workaround may very well turn into a by-design solution. In other cases, this workaround may not solve the problem.

A more robust solution would be to transform (in this case, to uppercase) the values being stored in these tables (in our case, FirstName) at INSERT time. That way, the predicate in the SELECT query can change to the following:

FirstName = UPPER(@FirstName)

Conclusion

Careful thought needs to go into design and development of databases where strings are being compared. Using a case-insensitive collation is sometimes the best way to avoid some of the problems described above. However if you have a case-sensitive collation, performing these operations up front in client code at INSERT time is perhaps the best way to negate the performance side effects of using the UPPER() / LOWER() predicates in WHERE clauses.

Arvind Shyamsundar's technical blog

Arvind Shyamsundar is a Principal PM @ MSFT Azure Data, working on Azure SQL. Data geek. Apache Accumulo and Fluo PMC. SQL MCM, ex-Principal PFE (MSFT Services). These are my own opinions and not those of Microsoft.

T-SQL Anti-pattern of the day: UPPER() and LOWER()

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply