Just a quick shout out that I have published a blog about the ScriptDom parser on the Azure Dev blog: Programmatically parsing Transact SQL (T-SQL) with the ScriptDom parser | Azure SQL Devs’ Corner (microsoft.com)
[Update 20 March 2019: this functionality is now natively provided in SQL Server 2017+ / Azure SQL DB through the STRING_AGG intrinsic]
Some time back, Dimitri Furman who works as a Senior Consultant in the Microsoft Services team shared a very useful tip. It pertains to a pattern of usage wherein strings from multiple rows are concatenated into a single large (typically comma delimited) string. The usual way that we see this being accomplished is by code like the below:
DECLARE @res NVARCHAR(max) SELECT @res = COALESCE(@res + ',', '') + FirstName + LastName FROM AdventureWorks2016CTP3.Person.Person SELECT @res
While the above is quite simple to read, it is quite slow (as you will see soon) and worse – it does not ‘inline’ meaning you cannot use the above inside a inline table-valued function. This last property was very important for us, because in the case we were looking at, the above COALESCE based concatenation code was in a scalar UDF, which clearly is a big ‘no-no’ from a performance perspective. Also, we did not want to utilize SQLCLR so the usage of the GROUP_CONCAT CLR UDA was ruled out.
Now, here is a much better way to refactor the above concatenation ‘loop’ into the below:
SELECT STUFF(( SELECT ',' + FirstName + LastName FROM AdventureWorks2016CTP3.Person.Person FOR XML PATH('') ,TYPE ).value('.', 'nvarchar(max)'), 1, 1, '') AS RawResult
On first glance, it may seem counter-intuitive that the second version could be faster, but that is indeed the case. For the above queries, here is the comparison:
COALESCE loop XML Path CPU time = 7969 ms, elapsed time = 7984 ms CPU time = 15 ms, elapsed time = 16 ms
The usage of XML PATH in this way is not exactly ‘breaking news’ but the performance difference is dramatic. And more importantly for the case we were discussing at work, the XML PATH based code can be inlined, making it suitable for use in TVFs. And in our real case, that’s exactly what we wanted to do (refactor the UDF into a TVF.)
Have a great week ahead!
This week I happened to assist with an internal application where there was a performance issue with a specific query. One of my observations with that query was that it was doing self-joins in order to determine the latest iteration of a specific record. That’s a common anti-pattern which can be fixed by using T-SQL’s windowing functions, such as ROW_NUMBER() or RANK().
The Inefficient Way
To give more context, here is an example from the Contoso Retail DW database. The requirement (very similar to the actual customer case) here is to obtain a list of all the customers who have placed an order with us, and for each of those customers, obtain the date of the latest order. The implementation that I first noticed was using a self-join as given below; notice the special case handling where the customer has 2 orders on the same day and then the order number (the OnlineSalesKey column here) becomes the tie-breaker:
SELECT l.CustomerKey ,l.DateKey AS LatestOrder FROM dbo.FactOnlineSales l LEFT JOIN dbo.FactOnlineSales r ON l.CustomerKey = r.CustomerKey AND ( l.DateKey < r.DateKey OR ( l.DateKey = r.DateKey AND l.OnlineSalesKey > r.OnlineSalesKey ) ) WHERE r.DateKey IS NULL
This query is *extremely* inefficient, burning 100% CPU on my i7 laptop with 8 logical CPUs! On my laptop it will run for well over 10 minutes before I get impatient and cancel. For reference, here is the estimated execution plan for the above query (scroll to the right and note the query cost of 167692).
Just for kicks, I ran the above query with MAXDOP 144 (NOT a good idea, but this was for fun) on our lab machine which has 144 logical CPUs and here is the picture I see there 🙂 Obviously NOT something you want in production!
Rewriting the query
Now, the efficient way to re-write this is to use Ranking Functions in T-SQL. These have been around a while (SQL 2005 actually!) but I feel they are under-utilized. Here is the query re-written using the ROW_NUMBER() function. This solution also elegantly takes care of the above tie-breaking logic which required the disjunction (OR predicate) in the previous query.
WITH NumberedOrders AS ( SELECT CustomerKey ,Datekey ,ROW_NUMBER() OVER ( PARTITION BY CustomerKey ORDER BY Datekey DESC ,OnlineSalesKey DESC ) AS RowNumber FROM FactOnlineSales ) SELECT CustomerKey ,Datekey FROM NumberedOrders WHERE RowNumber = 1
Here is the new execution plan. Note that the cost is also much lesser: 447.
Here are the execution statistics of this query, it completes in 43 seconds compared the self-join approach taking forever!
- Table ‘FactOnlineSales’. Scan count 9, logical reads 92516, physical reads 0, read-ahead reads 72663, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 31263, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- SQL Server Execution Times:
- CPU time = 19231 ms, elapsed time = 43834 ms.
Moral of the story: next time you see query patterns involving self joins to achieve this kind of ‘latest record of a particular type’ take a pause and see if T-SQL Ranking Functions can help!
We just made SQL Server 2016 CTP 3.3 publicly available a few days ago, and it has some new features which are also reflected in the managed T-SQL parser (Microsoft.SqlServer.TransactSql.ScriptDom). As readers of this blog may know, I am a big fan of this parser and to see it keeping pace with the latest preview release of the SQL Server engine feels great!
Here are the main changes you can leverage if you are using the SQL Server 2016 CTP 3.3 ScriptDom parser:
- The ScriptDom parser is up to date with the new support for delayed compression of delta rowgroups in Columnstore indexes: the new COMPRESSION_DELAY option in ALTER INDEX helps if you are using non-clustered indexes in a real-time analytics scenario. See this article for more details.
- The RSA_3072, RSA_4096 algorithms are now added as algorithms inside the ScriptDom parser. These are typically used in the CREATE ASYMMETRIC KEY syntax.
I also wanted to inform anyone who is planning to upgrade their previous installation of SQL Server 2016 preview (CTP 3.2 typically) to the release notes, which contain a very important step (if you are using FILESTREAM) you must take BEFORE upgrading to CTP 3.3. So please take a minute and check the release notes prior to upgrading!
Hope this helps!