[Update 20 March 2019: this functionality is now natively provided in SQL Server 2017+ / Azure SQL DB through the STRING_AGG intrinsic]
Some time back, Dimitri Furman who works as a Senior Consultant in the Microsoft Services team shared a very useful tip. It pertains to a pattern of usage wherein strings from multiple rows are concatenated into a single large (typically comma delimited) string. The usual way that we see this being accomplished is by code like the below:
DECLARE @res NVARCHAR(max) SELECT @res = COALESCE(@res + ',', '') + FirstName + LastName FROM AdventureWorks2016CTP3.Person.Person SELECT @res
While the above is quite simple to read, it is quite slow (as you will see soon) and worse – it does not ‘inline’ meaning you cannot use the above inside a inline table-valued function. This last property was very important for us, because in the case we were looking at, the above COALESCE based concatenation code was in a scalar UDF, which clearly is a big ‘no-no’ from a performance perspective. Also, we did not want to utilize SQLCLR so the usage of the GROUP_CONCAT CLR UDA was ruled out.
Now, here is a much better way to refactor the above concatenation ‘loop’ into the below:
SELECT STUFF(( SELECT ',' + FirstName + LastName FROM AdventureWorks2016CTP3.Person.Person FOR XML PATH('') ,TYPE ).value('.', 'nvarchar(max)'), 1, 1, '') AS RawResult
On first glance, it may seem counter-intuitive that the second version could be faster, but that is indeed the case. For the above queries, here is the comparison:
COALESCE loop XML Path CPU time = 7969 ms, elapsed time = 7984 ms CPU time = 15 ms, elapsed time = 16 ms
The usage of XML PATH in this way is not exactly ‘breaking news’ but the performance difference is dramatic. And more importantly for the case we were discussing at work, the XML PATH based code can be inlined, making it suitable for use in TVFs. And in our real case, that’s exactly what we wanted to do (refactor the UDF into a TVF.)
Have a great week ahead!