Unofficial SQL ScriptDom change log

Disclaimer: this is an unofficial change log – it is provided AS-IS and should not be in any way considered as an official Microsoft statement.

I have been blogging about the ScriptDom parser for Microsoft (and Azure) SQL family of products and services. While the library is formally documented, some external guidance is needed, so I have added some code samples in the past to help get started. Apart from my posts here, there are some great blog posts like this ongoing series from Mala, and my own article on the Azure SQL dev blog.

ScriptDom evolves along with the product – as we add new T-SQL syntax to Azure SQL DB, Synapse SQL pools, or SQL Server, the ScriptDom library is usually updated. A convenient ship vehicle for these updates is the DacFx NuGet. While some updates to the library are internal fixes, the ones that interest me (and probably most readers) the most are when new classes / properties / enumerations are added to ScriptDom to keep up with changes to the T-SQL grammar and/or new capabilities in the MS SQL family of products.

It is important to know about the DacFx repo on GitHub. While DacFx overall, and ScriptDom specifically, are still closed source, the DacFx repo does open up a direct line of communication to the concerned developers at Microsoft. What would be great is to have an official change log there for the changes to DacFx / ScriptDom. I have logged a request for the DacFx team to consider doing this, but currently there is no such change log that I know of, at least not at the level of detail that I described earlier.

Until we have a detailed, fully updated official change log, this blog post serves as an unofficial change log for ScriptDom at least. Hopefully it will help readers understand when certain T-SQL grammar was added to ScriptDom, etc. I hope to keep it updated as later releases of DacFx (and thereby ScriptDom) appear. If you have questions / feedback for me, please do leave a comment on this blog post and I will try to address it in due course of time.

Note that if there are no functionality changes for a given DacFx release, it will not feature in this table.

And again, this is an unofficial change log, so it is provided as-is and should not be in any way construed as an official Microsoft statement.

Note: the table below has the most recent changes at the end.

DacFx NuGet version Change Log Commentary
150.4897.1 Added:
* ClassifierEndTimeOption with members: Time, OptionType
* ClassifierImportanceOption with members: Importance, OptionType
* ClassifierMemberNameOption with members: MemberName, OptionType
* ClassifierStartTimeOption with members: Time, OptionType
* ClassifierWlmContextOption with members: WlmContext, OptionType
* ClassifierWlmLabelOption with members: WlmLabel, OptionType
* ClassifierWorkloadGroupOption with members: WorkloadGroupName, OptionType
* CreateWorkloadClassifierStatement with members: ClassifierName, Options
* DropWorkloadClassifierStatement with members: Name, IsIfExists
* WlmTimeLiteral with members: TimeString
* WorkloadClassifierOption with members: OptionType
* WorkloadClassifierOptionType with members: value__, WorkloadGroup, MemberName, WlmLabel, WlmContext, StartTime, EndTime, Importance
* WorkloadClassifierStatement with members: ClassifierName, Options
Changes related to Azure Synapse Analytics workload management (CREATE WORKLOAD CLASSIFIER; DROP WORKLOAD CLASSIFIER etc.)
150.4926.2-preview Added:
* CreateExternalStreamingJobStatement with members: Name, Statement
* CreateExternalStreamStatement with members: Name, Location, InputOptions, OutputOptions, ExternalStreamOptions
* DropExternalStreamingJobStatement with members: Name, IsIfExists
* DropExternalStreamStatement with members: Name, IsIfExists
* ExternalStreamingJobStatement with members: Name, Statement
* ExternalStreamLiteralOrIdentifierOption with members: Value, OptionKind
* ExternalStreamOption with members: OptionKind
* ExternalStreamOptionHelper with members:
* ExternalStreamOptionKind with members: value__, DataSource, FileFormat
* ExternalStreamStatement with members: Name, Location, InputOptions, OutputOptions, ExternalStreamOptions

Changed:

* DatabaseOptionKind has new properties: DataRetention
* ExternalFileFormatType has new properties: JSON

SQL Edge streaming: CREATE EXTERNAL STREAM etc.

Also adds JSON as an EXTERNAL FILE FORMAT.

150.4946.1-preview Added:
* DataRetentionTableOption with members: OptionState, FilterColumn, RetentionPeriod, OptionKind

Changed:

* TableOptionKind has new properties: DataRetention

Data retention options for SQL Edge.
150.5078.1-preview Added:
* AlterExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* CreateExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* DropExternalLibraryStatement with members: Name, Owner
* ExternalLibraryFileOption with members: Content, Path, Platform
* ExternalLibraryStatement with members: Name, Language, ExternalLibraryFiles

Changed:

* AdHocTableReference has new properties: ForPath
* BuiltInFunctionTableReference has new properties: ForPath
* BulkOpenRowset has new properties: ForPath
* ChangeTableChangesTableReference has new properties: ForPath
* ChangeTableVersionTableReference has new properties: ForPath
* DataModificationTableReference has new properties: ForPath
* FullTextTableReference has new properties: ForPath
* GlobalFunctionTableReference has new properties: ForPath
* InlineDerivedTable has new properties: ForPath
* InternalOpenRowset has new properties: ForPath
* OpenJsonTableReference has new properties: ForPath
* OpenQueryTableReference has new properties: ForPath
* OpenRowsetTableReference has new properties: ForPath
* OpenXmlTableReference has new properties: ForPath
* PivotedTableReference has new properties: ForPath
* QueryDerivedTable has new properties: ForPath
* SchemaObjectFunctionTableReference has new properties: ForPath
* SemanticTableReference has new properties: ForPath
* SqlScriptGeneratorOptions has new properties: AllowExternalLibraryPaths
* TableReferenceWithAlias has new properties: ForPath
* TableReferenceWithAliasAndColumns has new properties: ForPath
* UnpivotedTableReference has new properties: ForPath
* VariableMethodCallTableReference has new properties: ForPath
* VariableTableReference has new properties: ForPath

CREATE EXTERNAL LIBRARY, DROP EXTERNAL LIBRARY etc. (SQL Server ML Services). The ForPath changes are related to SQL Graph queries: specifically, when SHORTEST_PATH is used, the non-anchor tables need to be marked as FOR PATH.
150.5084.2 Removed:
* AlterExternalLibraryStatement
* CreateExternalLibraryStatement
* DropExternalLibraryStatement
* ExternalLibraryFileOption
* ExternalLibraryStatement

Changed:

* SqlScriptGeneratorOptions has deleted properties: AllowExternalLibraryPaths

It is rare that Microsoft removes functionality from ScriptDom, but that can happen for features which are in preview and therefore changing all the time. It looks like the changes related to external library management from 150.5078.1-preview were removed in this release. They make a re-appearance soon after, though, in 150.5141.1-preview. The older release being a preview one, it is perhaps not surprising that the interim changes did not make it into this stable release.
150.5132.1-preview Added:
* PredictRunTimeOptions with members: ONNX
* PredictTableReference with members: ModelVariable, ModelSubquery, DataSource, RunTime, SchemaDeclarationItems, Alias, ForPath

Changed:

* FromClause has new properties: PredictTableReference

PREDICT support for ONNX runtime as well as being able to recognize the PREDICT TVF in the context of a FROM clause.
150.5141.1-preview Added:
* AlterExternalLanguageStatement with members: Platform, Operation, Owner, Name, ExternalLanguageFiles
* AlterExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* CreateExternalLanguageStatement with members: Owner, Name, ExternalLanguageFiles
* CreateExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* DropExternalLanguageStatement with members: Name, Owner
* DropExternalLibraryStatement with members: Name, Owner
* ExternalLanguageFileOption with members: Content, FileName, Path, Platform, Parameters, EnvironmentVariables
* ExternalLanguageStatement with members: Name, ExternalLanguageFiles
* ExternalLibraryFileOption with members: Content, Path, Platform
* ExternalLibraryStatement with members: Name, Language, ExternalLibraryFiles

Changed:

* SqlScriptGeneratorOptions has new properties: AllowExternalLanguagePaths, AllowExternalLibraryPaths

The ML Services related external library statements make a comeback. In addition, we see CREATE EXTERNAL LANGUAGE and related statements for the first time in ScriptDom. This is related to the SQL Server Language Extensions feature.
150.5164.1 Removed:
* AlterExternalLanguageStatement
* AlterExternalLibraryStatement
* CreateExternalLanguageStatement
* CreateExternalLibraryStatement
* DropExternalLanguageStatement
* DropExternalLibraryStatement
* ExternalLanguageFileOption
* ExternalLanguageStatement
* ExternalLibraryFileOption
* ExternalLibraryStatement

Changed:

* SqlScriptGeneratorOptions has deleted properties: AllowExternalLanguagePaths, AllowExternalLibraryPaths

For the second time, we see a removal of the external library and external language related classes 😊
150.5282.3 Added:
* AlterExternalLanguageStatement with members: Platform, Operation, Owner, Name, ExternalLanguageFiles
* AlterExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* CreateExternalLanguageStatement with members: Owner, Name, ExternalLanguageFiles
* CreateExternalLibraryStatement with members: Owner, Name, Language, ExternalLibraryFiles
* DropExternalLanguageStatement with members: Name, Owner
* DropExternalLibraryStatement with members: Name, Owner
* ExternalLanguageFileOption with members: Content, FileName, Path, Platform, Parameters, EnvironmentVariables
* ExternalLanguageStatement with members: Name, ExternalLanguageFiles
* ExternalLibraryFileOption with members: Content, Path, Platform
* ExternalLibraryStatement with members: Name, Language, ExternalLibraryFiles
* LedgerOption with members: OptionState, OptionKind
* LedgerTableOption with members: OptionState, AppendOnly, LedgerViewOption, OptionKind
* LedgerViewOption with members: ViewName, TransactionIdColumnName, SequenceNumberColumnName, OperationTypeColumnName, OperationTypeDescColumnName, OptionKind

Changed:

* DatabaseOptionKind has new properties: Ledger
* GeneratedAlwaysType has new properties: SequenceNumberEnd, SequenceNumberStart, TransactionIdEnd, TransactionIdStart
* IndexDefinition has new properties: IncludeColumns
* SqlScriptGeneratorOptions has new properties: AllowExternalLanguagePaths, AllowExternalLibraryPaths

ML Services related statements are back (again!) and also some new statements to support early releases of the SQL Ledger feature.
150.5290.2-preview Removed:
* LedgerOption
* LedgerTableOption
* LedgerViewOption

Changed:

* DatabaseOptionKind has deleted properties: Ledger
* GeneratedAlwaysType has deleted properties: SequenceNumberEnd, SequenceNumberStart, TransactionIdEnd, TransactionIdStart
* IndexDefinition has deleted properties: IncludeColumns

Ledger options seem to have been removed for now in this release. Again, this is not uncommon, as both the feature itself (Ledger) and this DacFx version are in preview.
150.5332.2-preview Added:
* LedgerOption with members: OptionState, OptionKind
* LedgerTableOption with members: OptionState, AppendOnly, LedgerViewOption, OptionKind
* LedgerViewOption with members: ViewName, TransactionIdColumnName, SequenceNumberColumnName, OperationTypeColumnName, OperationTypeDescColumnName, OptionKind

Changed:

* DatabaseOptionKind has new properties: Ledger
* GeneratedAlwaysType has new properties: SequenceNumberEnd, SequenceNumberStart, TransactionIdEnd, TransactionIdStart
* IndexDefinition has new properties: IncludeColumns

Ledger related options are back again.
160.5196.4-preview Added:
* Sql160ScriptGenerator with members: Options
* TSql160LexerInternal with members: CurrentOffset
* TSql160Parser with members: QuotedIdentifier
* TSql160ParserBaseInternal with members: FragmentFactory, PhaseOne
* TSql160ParserInternal with members: FragmentFactory, PhaseOne
* WindowClause with members: WindowDefinition
* WindowDefinition with members: WindowName, RefWindowName, Partitions, OrderByClause, WindowFrameClause

Removed:
* LedgerOption
* LedgerTableOption
* LedgerViewOption

Changed:
* DatabaseOptionKind has deleted properties: Ledger
* GeneratedAlwaysType has deleted properties: SequenceNumberEnd, SequenceNumberStart, TransactionIdEnd, TransactionIdStart
* IndexDefinition has deleted properties: IncludeColumns
* OverClause has new properties: WindowName
* QuerySpecification has new properties: WindowClause
* SqlScriptGeneratorOptions has new properties: NewLineBeforeWindowClause
* SqlVersion has new properties: Sql160
* SqlVersionFlags has new properties: TSql160, TSql160AndAbove, TSqlUnder160

The Sql160 family makes its first appearance. This means the TSql160Parser is available and can (and should) be used to parse the latest and greatest syntax. Some new SqlVersion enumeration values were also added corresponding to 160. We also see the first preview appearance of the window function enhancements for SQL Server 2022, such as the WINDOW clause.
160.5323.3-preview Added:
* LedgerOption with members: OptionState, OptionKind
* LedgerTableOption with members: OptionState, AppendOnly, LedgerViewOption, OptionKind
* LedgerViewOption with members: ViewName, TransactionIdColumnName, SequenceNumberColumnName, OperationTypeColumnName, OperationTypeDescColumnName, OptionKind
* OpenRowsetColumnDefinition with members: ColumnOrdinal, JsonPath, ColumnIdentifier, DataType, Collation
* OperatorAuditOption with members: Value, OptionKind

Changed:

* AuditOptionKind has new properties: OperatorAudit
* BulkInsertOptionKind has new properties: DataCompression, HeaderRow, ParserVersion, RowsetOptions
* BulkOpenRowset has new properties: WithColumns
* DatabaseOptionKind has new properties: Ledger
* GeneratedAlwaysType has new properties: SequenceNumberEnd, SequenceNumberStart, TransactionIdEnd, TransactionIdStart
* IndexDefinition has new properties: IncludeColumns

The SQL Ledger related T-SQL syntax, such as append-only ledger tables, is reflected in this release of ScriptDom.

The OperatorAudit is related to auditing for Microsoft-initiated DevOps activities as described in this post.

This release also brings the WITH … (column) enhancements in OPENROWSET that allow specifying column metadata; these are specific to Azure Synapse Serverless SQL Pools. There are other changes in ScriptDom to keep up with the OPENROWSET support in Azure Synapse Serverless SQL Pools; see the docs for more details. In the process, I learnt that parser version 2.0 for Synapse offers a higher degree of control when parsing files, and that there is also a way to specify that you are reading GZIP-compressed files, etc.
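To tie the table above back to code, here is a minimal, hedged sketch (the sample T-SQL and class name are my own, and it assumes a DacFx 160.x NuGet reference) showing that the TSql160Parser accepts the SQL Server 2022 WINDOW clause, which older parser versions would flag as a parse error:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class WindowClauseCheck
{
    public static void Main()
    {
        // Sample query using the SQL Server 2022 WINDOW clause
        const string sql = @"SELECT ROW_NUMBER() OVER win AS rn
FROM Sales.Orders
WINDOW win AS (PARTITION BY CustomerID ORDER BY OrderDate);";

        // TSql160Parser corresponds to SqlVersion.Sql160 (SQL Server 2022);
        // true => initial QUOTED_IDENTIFIER setting is ON
        var parser = new TSql160Parser(true);
        IList<ParseError> errors;
        TSqlFragment fragment = parser.Parse(new StringReader(sql), out errors);

        // A TSql150Parser would report parse errors for the WINDOW clause;
        // TSql160Parser parses it cleanly.
        Console.WriteLine(errors.Count == 0 ? "Parsed OK" : errors.Count + " parse error(s)");
    }
}
```

Swapping in TSql150Parser in the sketch above is an easy way to confirm for yourself which grammar version introduced a given piece of syntax.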

What’s new in the SQL Server 2016 CTP 3.3 ScriptDom Parser

We just made SQL Server 2016 CTP 3.3 publicly available a few days ago, and it has some new features which are also reflected in the managed T-SQL parser (Microsoft.SqlServer.TransactSql.ScriptDom). As readers of this blog may know, I am a big fan of this parser and to see it keeping pace with the latest preview release of the SQL Server engine feels great!

Here are the main changes you can leverage if you are using the SQL Server 2016 CTP 3.3 ScriptDom parser:

  • The ScriptDom parser is up to date with the new support for delayed compression of delta rowgroups in columnstore indexes: the new COMPRESSION_DELAY option in ALTER INDEX helps if you are using non-clustered columnstore indexes in a real-time analytics scenario. See this article for more details.
  • The RSA_3072 and RSA_4096 algorithms are now recognized inside the ScriptDom parser. These are typically used in the CREATE ASYMMETRIC KEY syntax.
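As a quick check of the first bullet, here is a minimal sketch (the table and index names are made up) that feeds an ALTER INDEX … COMPRESSION_DELAY statement to the TSql130Parser (the SQL Server 2016 grammar) and reports any parse errors:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class CompressionDelayCheck
{
    public static void Main()
    {
        // COMPRESSION_DELAY is new to the SQL Server 2016 grammar
        const string sql = @"ALTER INDEX NCCI_Sales ON dbo.Sales
SET (COMPRESSION_DELAY = 10 MINUTES);";

        var parser = new TSql130Parser(true);   // true => QUOTED_IDENTIFIER ON
        IList<ParseError> errors;
        parser.Parse(new StringReader(sql), out errors);

        Console.WriteLine(errors.Count == 0
            ? "COMPRESSION_DELAY parsed OK"
            : "Parse errors: " + errors.Count);
    }
}
```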

I also wanted to inform anyone who is planning to upgrade their previous installation of SQL Server 2016 preview (CTP 3.2 typically) to the release notes, which contain a very important step (if you are using FILESTREAM) you must take BEFORE upgrading to CTP 3.3. So please take a minute and check the release notes prior to upgrading!

Hope this helps!

Changes in SQL 2014 RTM ScriptDom

It has been over 3 months now since SQL Server 2014 became generally available. The well-known and much talked about features such as In-Memory OLTP and clustered columnstore indexes are big motivations to move your workload to the new platform.

With the new features, the T-SQL language surface has also changed of course. Under compatibility level 120, there are new grammar elements to consider. I recently went through one such round of impact analysis, specifically around the ScriptDom parser.

The good news is that the move from SQL 2012 ScriptDom to the latest one is very incremental and almost all of the changes are only due to new syntax elements. I prepared a list of these statements (as depicted in the parser) for your convenience.

In-Memory OLTP support

No coincidence that the changes related to in-Memory OLTP are the biggest! This section also includes related support such as ATOMIC block statements, delayed durability, inline index definitions etc.

Buffer Pool Extension, Resource Governor, HADR and FCI related Server DDL

This set of changes has to do with the support for configuring the BPE, I/O parameters in Resource Governor, and some minor but important DDL to configure HADR availability groups and FCI instances from T-SQL itself.

Clustered ColumnStore related

This group includes the index DDL and Archive compression related changes.

Incremental Statistics related

Managed Lock Priority related

Managed lock priority changes extend to quite a few operations such as index maintenance, partition switch etc.

Backup Encryption + Security DDL

Here you notice the changes to support encryption while backing up, along with the new encryption algorithm enumeration to support the same. We also have several new event notifications available.

Miscellaneous

In conclusion, I believe I have covered all the changes, and I’m glad there seem to be none which are breaking changes as such. If you still find something which has been added or changed and I did not mention it in this post, please feel free to leave a comment below!

Normalizing T-SQL text, part 2: using the TransactSql.ScriptDom classes

Happy New Year 2014 to all of you! With SQL Server 2014 set to release this year, I’m sure you are all excited about the months to come.

In my previous post I had reviewed the classic ways of obtaining ‘normalized’ text for ad-hoc SQL queries. Do take a minute to glance at that post in case you have not already done so. Both the methods described previously are dynamic – they need an active workload to operate upon. So if you have a static set of queries captured somewhere – such as a .SQL file or such, then we need an alternate method.

Algorithm

If you think about it, the core of normalizing these ad-hoc query text patterns is to identify literals and replace them with a generic / common value. Once the specific literal values are replaced with their generic ‘placeholders’, it becomes a relatively easy task to identify commonalities.

To identify commonalities we propose to use a hashing algorithm, conceptually similar to the one used in the previous approaches. However, when computing hashes, there is another problem to consider: minor differences in whitespace or alphabetic case of the query text will cause different hash values to be computed for essentially identical text.

ScriptDom Implementation

The first thing to consider is what kind of literals we would replace. In the ScriptDom class hierarchy, we have the following derived classes for the parent Literal class:

  • IntegerLiteral: whole numbers
  • NumericLiteral: decimal numbers such as 0.03
  • RealLiteral: numbers written with scientific notation such as 1E-02
  • MoneyLiteral: values prefixed with currency symbol such as $12
  • BinaryLiteral: such as 0xABCD1234
  • StringLiteral: such as 'Hello World'
  • NullLiteral: the NULL value
  • DefaultLiteral: the DEFAULT keyword
  • MaxLiteral: the MAX specifier
  • OdbcLiteral: ODBC formatted literal such as { T 'blah' }
  • IdentifierLiteral: a ‘special’ case when an identifier is used as a literal. I’ve never seen a real-world example of this.

We need to keep this in mind when we write the visitor to traverse through the AST.

Visitor definition

Next, we need to setup our visitor. We will use the Visitor pattern to do this, and implement overridden methods to handle the various types of literals described above. And for each type of literal we will replace the value of the literal with a fixed, generic value. Here is an example for the real literal visitor method:

public override void ExplicitVisit(RealLiteral node)
{
    node.Value = "0.5E-2";
    base.ExplicitVisit(node);
}

Visitor invocation

For performance reasons we will call the visitor with the Parallel.ForEach loop which makes efficient use of multi-core CPUs:

Parallel.ForEach(
    (frag as TSqlScript).Batches,
    batch =>
    {
        myvisitor visit = new myvisitor();
        batch.Accept(visit);
    });

This way, each T-SQL batch in the original script is visited on a separate thread.

Regenerate the Script

Once the visitor does its job to ‘normalize’ the various literals encountered, the next step is to generate the script based on the tokens already obtained. That will take care of one of the 2 problems we spoke about – namely whitespace. We can do that using one of the many SqlScriptGenerator classes available (there is one for each compatibility level). In the code snippet below, scrgen is one of the SqlScriptGenerator classes and script holds the output:

scrgen.GenerateScript(batch, out script);
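For completeness, here is a self-contained, hedged sketch of constructing such a generator; the KeywordCasing option shown is illustrative rather than required (it additionally normalizes keyword case, which helps when comparing generated text):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class ScriptGenDemo
{
    public static void Main()
    {
        // Parse a batch first, so we have something to regenerate
        var parser = new TSql110Parser(true);
        IList<ParseError> errors;
        var frag = parser.Parse(new StringReader("select * from ABC where i = 1"), out errors);

        // One concrete SqlScriptGenerator per compatibility level; Sql110 shown here
        var options = new SqlScriptGeneratorOptions
        {
            KeywordCasing = KeywordCasing.Uppercase   // normalize keyword case too
        };
        SqlScriptGenerator scrgen = new Sql110ScriptGenerator(options);

        foreach (var batch in ((TSqlScript)frag).Batches)
        {
            string script;
            scrgen.GenerateScript(batch, out script);
            Console.WriteLine(script);
        }
    }
}
```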

Calculate the hash

Once the normalized script is re-generated from the SqlScriptGenerator class, it can then be run through a proper hash algorithm (in this sample we use SHA1) to calculate the hash value of the given script. Here is where we also handle the case sensitive / insensitive nature of the script:

  • For case insensitive cases, we simply convert the generated script to lower case before we compute the hash.
  • For case sensitive, we calculate the hash as-is on the generated script.

using (var hashProvider = new SHA1CryptoServiceProvider())
{
    if (caseSensitive)
    {
        hashValue = Convert.ToBase64String(hashProvider.ComputeHash(Encoding.Unicode.GetBytes(script)));
    }
    else
    {
        hashValue = Convert.ToBase64String(hashProvider.ComputeHash(Encoding.Unicode.GetBytes(script.ToLowerInvariant())));
    }
}

Track unique hashes

We can use a simple Dictionary class in .NET to track these, along with usage counts for each bucket. Each bucket also tracks an example of the batch (the original text itself.)
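A minimal sketch of such a tracking structure (all names here are my own, not from the sample project):

```csharp
using System;
using System.Collections.Generic;

// Maps each normalized-text hash to a usage count plus one
// representative example of the original batch text.
class HashBuckets
{
    private sealed class Bucket
    {
        public int Count;
        public string ExampleText;
    }

    private readonly Dictionary<string, Bucket> buckets =
        new Dictionary<string, Bucket>();

    public void Add(string hashValue, string originalBatchText)
    {
        Bucket b;
        if (!buckets.TryGetValue(hashValue, out b))
        {
            // First sighting of this hash: keep an example batch
            b = new Bucket { ExampleText = originalBatchText };
            buckets[hashValue] = b;
        }
        b.Count++;
    }

    public void Dump()
    {
        foreach (var kvp in buckets)
        {
            Console.WriteLine("-- {0} times:", kvp.Value.Count);
            Console.WriteLine(kvp.Value.ExampleText);
        }
    }
}
```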

Sample output

The sample project when compiled and executed as below gives us the following output.

Command line

TSQLTextNormalizer.exe c:\temp\input.sql c:\temp\output.sql 110 false

Input

select * from ABC
where
i = 1
GO

select * from abC where i = 3
GO

Output

-- 2 times:
select * from ABC
where
i = 1
GO

That’s it! You can use this in many cases, limited only by your imagination. And more importantly, I hope it showed you the power and flexibility of the ScriptDom classes.

[EDIT 2022-07-27] The sample code for this post is now on GitHub. Please upvote the repo on GitHub if you liked this post and found the sample code useful. The URL is https://github.com/arvindshmicrosoft/SQLScriptDomSamples/tree/main/8_TSQLTextNormalizer

Disclaimer

This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment.  THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.

This posting is provided “AS IS” with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.

Normalizing T-SQL text, part 1: using the RML Utilities and the DMVs

A common problem when dealing with workloads which issue ad-hoc SQL commands (i.e. without parameterization) is to find out the ‘normalized’ version of the pattern. For example, these three statements are essentially the same ‘template’:

  • SELECT BusinessEntityId FROM Person.Person WHERE LastName = 'Smith'
  • SELECT BusinessEntityId FROM Person.Person WHERE LastName = 'Singh'
  • SELECT BusinessEntityId FROM Person.Person WHERE LastName = 'Lee'

And so are these three, except that they are a distinctly new pattern:

  • SELECT * FROM Person.person WHERE LastName IN ('Smith')
  • SELECT * FROM Person.person WHERE LastName IN ('Smith', 'Singh')
  • SELECT * FROM Person.person WHERE LastName IN ('Smith', 'Singh', 'Lee')

The requirement therefore is to detect only the common ‘template’ / ‘normalized’ text – in most cases a representative example of the ad-hoc statement. In this post, I will show you the classic ways to do that: using the RML Utilities (specifically using ReadTrace) or using the query_hash value in the DMVs.

Option 1: Using the RML Utilities / ReadTrace

The classic way to do this with profiler traces, of course, was to capture a trace file (using server-side tracing preferably) and then run that trace through the ReadTrace utility. The neat thing about this utility is that it would ‘aggregate’ similar patterns into hash buckets and then display statistics for each hash ID. To do that successfully, it requires a SQL instance and database.

Here is a sample command line for using this utility:

readtrace -Ic:\temp\test.trc -S. -dTest -f

The above imports the trace (from c:\temp\test.trc) into a database called Test on my local default instance of SQL 2012. Once that is done, you can work your magic on the various tables which are created by this utility and extract the unique batches:

SELECT OrigText, HashID
FROM ReadTrace.tblUniqueBatches

Here is the sample output from my test, where I had issued those 6 statements (each delimited by a GO) which were shown at the beginning of this post:

  • SELECT BusinessEntityId FROM Person.Person WHERE LastName = 'Smith' => hash value of -8759399170292131263
  • select * from Person.person where LastName IN ('SMith') => hash value of 1644287078950160061

So the good news is that this utility correctly found the underlying patterns very accurately, but it also requires extra effort (to provision a SQL instance and gather a trace file). One other advantage of the ReadTrace utility is that it also supports importing Extended Event Log (XEL) files (though that feature is still in beta).

For more details on the normalization capabilities within ReadTrace, Download RML Utilities for SQL Server (x64) and then read the PDF help file. The various patterns are described in the section ‘Normalization Patterns’.

Option 2: Using the query_hash value

Starting with SQL Server 2008, you can use the query_hash functionality to isolate these common patterns. Here is an example of a query which can do that and list one sample query for each ‘pattern’:

select * from
(select ST.text, ROW_NUMBER() OVER (PARTITION BY query_hash ORDER BY query_hash) Ranking from sys.dm_exec_query_stats QS
cross apply sys.dm_exec_sql_text (QS.sql_handle) ST
where objectid IS NULL
and ST.encrypted = 0
) AS Q
where Q.Ranking = 1

While this is easy to use and available out of the box, it does not work for SQL 2005, nor obviously for SQL 2000. (Why do I even bring up SQL 2000, given that it is out of support? Because when upgrading those few remaining workloads from SQL 2000 to higher versions, it is an appropriate time to tune some of this stuff by replacing such queries with parameterized ones.)

Also, there is another problem with IN predicates. When there is a varying set of literals passed to the IN predicate, it turns out that the query_hash is different. Here is an example:

  • select * from Person.person where LastName IN ('SMith', 'Singh', 'Lee') => hash value 0x6FAFADDF37AA7683
  • select * from Person.person where LastName IN ('SMith') => hash value 0x77A1391010F61AE7
  • select * from Person.person where LastName IN ('SMith', 'Singh') => hash value 0x9BEA60D8D6595AA5

Summary

I’d like to quickly compare these two approaches and then setup what’s coming next.

ReadTrace:
  • Offline method
  • Can consume TRC or XEL files
  • Supports SQL Server versions 2000, 2005, 2008, 2008 R2, 2012 and SQL 2014 CTP2 (version 9.04.0004 as of the time of writing this post)
  • Correctly handles many patterns, including IN lists

query_hash:
  • Online method
  • Requires SQL 2008 or above
  • Dependent on the query plan cache
  • Some limitations, such as not aggregating IN predicate lists correctly

In the next part I will show you how you can cook your own ‘aggregator’ / ‘normalizer’ using the Transact-SQL ScriptDom parser and then compare it with these two approaches. Stay tuned!

Using the TransactSql.ScriptDOM parser to get statement counts

Today there was a question on the #sqlhelp Twitter hashtag: “Is it possible to get UPDATE / SELECT / INSERT / DELETE statement counts from SQL Server?”

Implementation

This is a perfect use case for the SQLDOM parser a.k.a. Microsoft.SqlServer.TransactSql.ScriptDom. I put together a sample C# application to demonstrate how easy it is for requirements like the above. The assumption made below (for simplicity) is to parse the input from a .SQL script file. In the real world, we can easily script the object code from a live connection (using SMO for example) and pass it as a MemoryStream to the TextReader below. But that is left, as they say, to the reader as an intellectual exercise.

Test case

Here is the sample code; interestingly, if you run it on a script of AdventureWorks, it tells you that there is not a single DELETE statement embedded in any of the sample database’s stored procedures, views or functions (which is correct, and a minor eye-opener as well).

Found 37 SELECTs, 1743 INSERTs, 15 UPDATEs & 0 DELETEs

Conclusion

ScriptDom excels at simple tasks like this, but do keep in mind that there are some complications which need to be considered for more advanced requirements. I’ll be posting some tips on how to overcome those complications later on.

Sample Code
using System; 
using System.Collections.Generic; 
using System.Text; 
using Microsoft.SqlServer.TransactSql.ScriptDom; 
using System.IO;
namespace SampleSQLVisitor 
{ 
    class Program 
    { 
        static void Main(string[] args)
        {
            using (TextReader txtRdr = new StreamReader("myscriptfile.sql"))
            {
                TSql110Parser parser = new TSql110Parser(true); // true = QUOTED_IDENTIFIER ON
                IList<ParseError> errors;
                TSqlFragment sqlFragment = parser.Parse(txtRdr, out errors);
                // Report any parsing errors before walking the AST
                foreach (ParseError error in errors)
                {
                    Console.WriteLine("Parse error {0} at line {1}, column {2}: {3}",
                        error.Number, error.Line, error.Column, error.Message);
                }
                SQLVisitor myVisitor = new SQLVisitor();
                sqlFragment.Accept(myVisitor);
                myVisitor.DumpStatistics();
            }
        }
    }
    internal class SQLVisitor : TSqlFragmentVisitor 
    { 
        private int SELECTcount = 0; 
        private int INSERTcount = 0; 
        private int UPDATEcount = 0; 
        private int DELETEcount = 0;
        private string GetNodeTokenText(TSqlFragment fragment) 
        { 
            StringBuilder tokenText = new StringBuilder(); 
            for (int counter = fragment.FirstTokenIndex; counter <= fragment.LastTokenIndex; counter++) 
            { 
                tokenText.Append(fragment.ScriptTokenStream[counter].Text); 
            }
            return tokenText.ToString(); 
        }
        // SELECTs 
        public override void ExplicitVisit(SelectStatement node) 
        { 
            //Console.WriteLine("found SELECT statement with text: " + GetNodeTokenText(node)); 
            SELECTcount++; 
        }
        // INSERTs 
        public override void ExplicitVisit(InsertStatement node) 
        { 
            INSERTcount++; 
        }
        // UPDATEs 
        public override void ExplicitVisit(UpdateStatement node) 
        { 
            UPDATEcount++; 
        }
        // DELETEs 
        public override void ExplicitVisit(DeleteStatement node) 
        { 
            DELETEcount++; 
        }
        public void DumpStatistics() 
        { 
            Console.WriteLine(string.Format("Found {0} SELECTs, {1} INSERTs, {2} UPDATEs & {3} DELETEs", 
                this.SELECTcount, 
                this.INSERTcount, 
                this.UPDATEcount, 
                this.DELETEcount)); 
        } 
    } 
} 
Disclaimer

This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment.  THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.

This posting is provided “AS IS” with no warranties, and confers no rights.

Considerations when using the TransactSql.ScriptDOM parsers

Some of you might be aware of the above namespace, which holds the implementation of a first-class T-SQL parser. In this post I would like to explain some of the complexity you will face when working with ScriptDom yourself, typically using the Visitor pattern.

Case Study

Our objective in this case study is to use the parser to implement a rule which finds expressions with a leading wildcard within a WHERE clause. First, consider the T-SQL statement below:

SELECT * FROM Person.Contact
WHERE LastName LIKE '%mith'

If you parse this using ScriptDom, you can visit the LikePredicate element, typecast its SecondExpression to StringLiteral, and then check whether the literal's value starts with %.

public override void ExplicitVisit(LikePredicate node)
{
    StringLiteral literal = node.SecondExpression as StringLiteral;
    if (literal != null && literal.Value.StartsWith("%"))
    {
        Console.WriteLine(literal.Value);
    }
}

Now consider this second case, which has a complex expression (string concatenation):

SELECT * FROM Person.Contact
WHERE LastName LIKE '%' + @blah + '%'

In this case, if you try to reuse the ExplicitVisit code shown above, it will fail to detect the issue, because the SecondExpression member of the LikePredicate is now a BinaryExpression rather than a StringLiteral. This complicates the check:

BinaryExpression binaryExpression = node.SecondExpression as BinaryExpression;
if (binaryExpression != null)
{
    StringLiteral literal = binaryExpression.SecondExpression as StringLiteral;
    if (literal != null && literal.Value.StartsWith("%"))
    {
        Console.WriteLine(literal.Value);
    }
}
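Even this only unwraps one level of nesting, and only inspects one operand of the concatenation. One way to cope with arbitrarily nested concatenations is to walk down the left-hand spine of the expression tree to reach the leading operand. This is a hedged sketch, not the post's original code, and it assumes string concatenation is parsed left-associatively (so the leading literal sits at the bottom-left of the tree):

```csharp
// Hedged sketch: instead of typecasting level by level, loop down the
// FirstExpression chain of nested BinaryExpressions to find the leading
// operand of the concatenation, then test it for a wildcard prefix.
public override void ExplicitVisit(LikePredicate node)
{
    ScalarExpression expression = node.SecondExpression;
    BinaryExpression binary = expression as BinaryExpression;
    while (binary != null)
    {
        expression = binary.FirstExpression;
        binary = expression as BinaryExpression;
    }
    StringLiteral literal = expression as StringLiteral;
    if (literal != null && literal.Value.StartsWith("%"))
    {
        Console.WriteLine("Leading wildcard: " + literal.Value);
    }
}
```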

Conclusion

For arbitrary AST shapes, you can see that this quickly becomes a difficult problem to solve with hard-coded typecasts. So while the ScriptDom parser is a great tool, it does require a fundamentally different approach to navigating the AST. Something to keep in mind if you are attempting to use the parser in any industrial-strength application.