Common sub-expression elimination in SQL Server: what you need to know

In working with our customers, I come across some very interesting patterns in T-SQL. Not all of these patterns are complex on the face of it, but their impact can sometimes be substantial. Here is a simple example to demonstrate what I am referring to:

SELECT CASE WHEN EXISTS (SELECT *
                         FROM   Person.Person AS P
                         WHERE  P.BusinessEntityID = SOD.ProductID) THEN 1 ELSE 0 END AS Col1,
       CASE WHEN EXISTS (SELECT *
                         FROM   Person.Person AS P
                         WHERE  P.BusinessEntityID = SOD.ProductID) THEN 1 ELSE 0 END AS Col2
FROM   Sales.SalesOrderDetail AS SOD;

This query is clearly fictional, but it will suffice for our walkthrough here 🙂 Now here’s the execution plan for the above statement (you can also test this by using AdventureWorks2012 and above):

httpsmsdnshared.blob.core.windows.netmedia201606image922

Observe that the Person.Person table is accessed twice in the above plan. In the real customer scenario that I was looking at, the table involved was accessed using a nested loop join, for a total of 13 million seeks. And these 13 million were repeated again for the second usage of the subquery.

My attempt to rewrite the query was to use a LEFT OUTER JOIN syntax, as reflected in this simple example:

select CASE WHEN P.BusinessEntityID IS NOT NULL THEN 1 ELSE 0 END as Col1,

CASE WHEN P.BusinessEntityID IS NOT NULL THEN 1 ELSE 0 END as Col2

from Sales.SalesOrderDetail SOD

LEFT OUTER JOIN Person.Person P ON P.BusinessEntityID = SOD.ProductID

Here is the execution plan for the optimized version:

httpsmsdnshared.blob.core.windows.netmedia201606image923These two queries are functionally equivalent (because in this case there is a unique key on the BusinessEntityId column) and return the same set of results. As you can see the Person table is just accessed once in the revised query. The cost of the re-written query is also ~ 6 times lesser than the original query! This kind of optimization is referred to as Constant Subexpression Elimination. There are some strategies used by SQL Server (see this paper and this one as well for some ideas) but not every possible scenario (such as the above one) can be optimized. Also, this paper from Microsoft Research covers future ideas on the topic.

For now, it is best to review your T-SQL code for multiple instances of the same subquery and replace those with an explicit JOIN instead.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.