In working with our customers, I come across some very interesting patterns in T-SQL. Not all of these patterns are complex on the face of it, but their impact can sometimes be substantial. Here is a simple example to demonstrate what I am referring to:
SELECT CASE WHEN EXISTS (SELECT * FROM Person.Person AS P WHERE P.BusinessEntityID = SOD.ProductID) THEN 1 ELSE 0 END AS Col1, CASE WHEN EXISTS (SELECT * FROM Person.Person AS P WHERE P.BusinessEntityID = SOD.ProductID) THEN 1 ELSE 0 END AS Col2 FROM Sales.SalesOrderDetail AS SOD;
This query is clearly fictional, but it will suffice for our walkthrough here 🙂 Now here’s the execution plan for the above statement (you can also test this by using AdventureWorks2012 and above):
Observe that the Person.Person table is accessed twice in the above plan. In the real customer scenario that I was looking at, the table involved was accessed using a nested loop join, for a total of 13 million seeks. And these 13 million were repeated again for the second usage of the subquery.
My attempt to rewrite the query was to use a LEFT OUTER JOIN syntax, as reflected in this simple example:
select CASE WHEN P.BusinessEntityID IS NOT NULL THEN 1 ELSE 0 END as Col1, CASE WHEN P.BusinessEntityID IS NOT NULL THEN 1 ELSE 0 END as Col2 from Sales.SalesOrderDetail SOD LEFT OUTER JOIN Person.Person P ON P.BusinessEntityID = SOD.ProductID
Here is the execution plan for the optimized version:
These two queries are functionally equivalent (because in this case there is a unique key on the BusinessEntityId column) and return the same set of results. As you can see the Person table is just accessed once in the revised query. The cost of the re-written query is also ~ 6 times lesser than the original query! This kind of optimization is referred to as Constant Subexpression Elimination. There are some strategies used by SQL Server (see this paper and this one as well for some ideas) but not every possible scenario (such as the above one) can be optimized. Also, this paper from Microsoft Research covers future ideas on the topic.
For now, it is best to review your T-SQL code for multiple instances of the same subquery and replace those with an explicit JOIN instead.