January 21, 2015 by Arvind Shyamsundar

Exotic wait types: PREEMPTIVE_OS_ENCRYPTMESSAGE

Some time ago, we were discussing wait types with a group of customers. To demonstrate the waits-and-queues methodology we use, I showed the group the sys.dm_os_wait_stats DMV, and the results were sorted by wait_time_ms in descending order. On the top of that list was the PREEMPTIVE_OS_ENCRYPTMESSAGE wait type. At that moment I was not exactly sure what that wait type was and I promised to look into it later. This post is a quick explanation of the same.

It turns out that if the SQL client and / or server is configured to use SSL for secure communications, the implementation within SQL Server will use an Win32 API hosted within SspiCli.dll. Such wait states – where the SQLOS calls into an external DLL which SQL has no direct control over – are identified with the PREEMPTIVE_* label.

So in short, this wait type is representative of the time it takes to encrypt TDS traffic back to the client. Does it represent a problem if it is on top of the list? Not necessarily, and as with all wait types, it needs to be carefully examined within the context of the specific issue being investigated and co-related with other observations such as performance counters etc. In certain cases, if you strongly suspect that this is representative of a problem, you can always check if you have configured SSL for SQL Server or for the client, and test with that turned off (subject to the boundary security requirements of course!)

Till next time, have a good one!

March 20, 2014September 24, 2020 by Arvind Shyamsundar

Too many files causing RAM pressure? (a.k.a. DynCache to the rescue!)

Readers of this blog might recall a previous post which described one impact of having too many similarly named files in one folder in NTFS. It turns out that the 8.3 naming convention is not the only thing you need to worry about when you have very large amounts of (smaller) files in the same volume.

Today I was called in to assist with a performance issue on a server. The only visible symptom of the problem was excessive RAM utilization on the server. The interesting aspect was that no specific user-mode process was consuming that RAM, so we were wondering where it came from. If this server was running SQL Server, for example, the ‘ghost’ utilization could be due to locked pages (which do not show up in Task Manager) but that was not the case here.

Analysis

So we ran the RAMMap utility, we found that the usage for MetaFile was a substantial percentage of the total RAM usage. From this AskPerf blog post you can see what MetaFile is all about:

“Metafile is part of the system cache and consists of NTFS metadata. NTFS metadata includes the MFT as well as the other various NTFS metadata files (see How NTFS Works for more details, and of course Windows Internals is a great reference). In the MFT each file attribute record takes 1k and each file has at least one attribute record. Add to this the other NTFS metadata files and you can see why the Metafile category can grow quite large on servers with lots of files.”

The next step therefore was to cross-check how large the MFT was in reality. The easy way to do this is to utilize the command given below:

fsutil fsinfo ntfsinfo <drive letter>

A sample output is given below (from my own laptop Smile )

C:>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number :       0x8a40c9ee40c9e0d5
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000003a2007ff
Total Clusters :                  0x00000000074400ff
Free Clusters :                  0x0000000000ab1f84
Total Reserved :                  0x000000000002ef10
Bytes Per Sector :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000030240000
Mft Start Lcn :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x0000000004d58da0
Mft Zone End   :                  0x0000000004d655c0
Resource Manager Identifier :     96CC88FE-5621-11E3-AF31-3C970EA47926

In this output, the “Mft Valid Data Length” gives us an indicator of how many bytes are used by the MFT. In the above case for example it equates to around 770MB:

0x0000000030240000 / (1024*1024) = 770.25 MB

Just as a curiosity, if I run RAMMap in my laptop (Windows 8.1), here is what I see, you can see a rough alignment with the above computed number.

FYI, in the server that I was looking at in the real world, the size of the MFT was actually 1.5 times the amount of RAM on the box Smile

Mitigation

Since the server in question was running Windows Server 2008 R2 SP1, we recommended the usage of the DynCache service sample, which would automatically control the size of the system cache based on system memory notifications.

SQL Server FileStream

Practically, this issue would also apply when you use SQL FileStream to store a very large number of blobs in the NTFS file system. In such cases, here are my recommendations:

Consider setting the SQL Server Database Engine to use ‘locked pages’ with ‘max server memory’ set appropriately.
If the OS is Windows 2008 R2 or below, you may additionally consider using the DynCache service to mitigate the effect that the large MFT will have.

Other notes

If you anticipate huge numbers of files to be stored on the file system, keep in mind that each file record will take up around 1KB. That means 100 million files will take up close to 100GB worth of MFT storage!

On Windows 2008 R2 and below be aware of issues like the one described in KB article 967351 and install the updated version of NTFS.sys accordingly.

In extreme cases, if you want to achieve optimal performance without compromising on system cache memory utilization, be aware that the system RAM sizing must be done accordingly keeping in mind the very large MFT which might result.

References

You may want to read these articles for more information:

February 24, 2014 by Arvind Shyamsundar

Tracking TEMPDB internal object space usage in SQL 2012

It is a documented fact that using the MAX specifier can result in TEMPDB usage in specific cases. From the TEMPDB whitepaper, some of these can use ‘internal object’ space within TEMPDB:

“all of the large object types: text, image, ntext, varchar(max), varbinary(max), and all others.”

Now, a FAQ from our customers is how to drill down to session level usage of TEMPDB when they suspect abnormal growth. Normally, the sys.dm_db_session_space_usage gives us the details we need. But are there any other options to see deeper? We wanted to test this and observe the effects. To do this, we created a simple stored procedure and a corresponding C# console application which passes in a large valued parameter.

Stored procedure code

create procedure testvarcharmax(@s varchar(max))
as
select @@version

C# application code

Here is the simple front-end application code which invokes the ‘testvarcharmax’ procedure:

SqlConnection cn = new SqlConnection("Data Source=.;initial catalog=tempdb;integrated security=true");
cn.Open();

SqlCommand cmd = new SqlCommand();
cmd.Connection = cn;
cmd.CommandType = System.Data.CommandType.StoredProcedure;
cmd.CommandText = "testvarcharmax";
SqlParameter parameter = new SqlParameter("s", System.Data.SqlDbType.VarChar, -1);
parameter.Value = new StringBuilder().Insert(0, "a", 1000000000).ToString();
cmd.Parameters.Add(parameter);
cmd.CommandTimeout = 360;

cmd.ExecuteNonQuery();

Monitoring TEMPDB

After executing the above C# code, my TEMPDB quickly grew by leaps and bounds. As any DBA would understand, the essential ask in such cases is to trace the session ID which is generating this TEMPDB space usage. One option is to use the sys.dm_db_file_space_usage DMV. That one is useful in this case, as it reports the aggregate space for the internal objects:

SELECT SUM(internal_object_reserved_page_count) AS [internal object pages used], (SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in MB] FROM tempdb.sys.dm_db_file_space_usage;

However the above does not provide a session-level detail. To obtain that, you can use the queries on sys.dm_db_session_space_usage and sys.dm_db_task_space_usage as shown below:

select * from tempdb.sys.dm_db_session_space_usage
order by internal_objects_alloc_page_count desc

select * from tempdb.sys.dm_db_task_space_usage
order by internal_objects_alloc_page_count desc

Update: Adam Machanic pointed out that it is much easier and lightweight to use the sys.dm_db_task_space_usage and sys.dm_db_session_space_usage DMVs to track internal object usage to a given task / session. Please use the detailed approach described below only if you have a specific need to investigate lower-level details, such as a potential product issue or such.

Digging deeper

Suppose you suspect a ‘leak’ in TEMPDB objects of some sort, and you need to really peer deep into TEMPDB page allocation / deallocation. To peer more into the internals, KB article 947204 provides an interesting approach using the allocation ring buffer using trace flag 1106. This can be useful in rare cases, but first do note the important disclaimer about this flag:

Note We recommend that you do not use trace flag 1106 on production servers because trace flag 1106 affects the performance of SQL Server.

Note that the query specified therein does not return results on SQL 2012 even after enabling trace flag 1106. It turns out that the query requires a minor change, as the XML schema and the name for the ring buffer has changed in SQL 2012. The new name for the ring buffer is RING_BUFFER_SPACEMGR_TRACE. And here is the modified query, reflecting changes in the XML schema as well:

SELECT top 500
record.value(‘(Record/@id)[1]’, ‘int’) AS record_id,
CONVERT (varchar, DATEADD (ms, -1 * ((inf.cpu_ticks / inf.cpu_ticks) – [timestamp]), GETDATE()), 126) AS EventTime,
[timestamp] ,
record.value(‘(Record/@id)[1]’, ‘int’) AS RingBuffer_Record_Id,
record.value(‘(Record/SpaceMgr/Event)[1]’, ‘int’) AS AllocationEventType,
record.value(‘(Record/SpaceMgr/SpId)[1]’, ‘int’) AS SpId,
record.value(‘(Record/SpaceMgr/EcId)[1]’, ‘int’) AS EcId,
record.value(‘(Record/SpaceMgr/PageId)[1]’, ‘nvarchar(50)’) AS AllocatedPageId,
record.value(‘(Record/SpaceMgr/AuId)[1]’, ‘nvarchar(50)’) AS AllocationUnitId,
record.value(‘(Record/SpaceMgr/LsId)[1]’, ‘nvarchar(50)’) AS LsId
FROM sys.dm_os_sys_info inf CROSS JOIN (
SELECT timestamp, CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = ‘RING_BUFFER_SPACEMGR_TRACE’
AND ( record LIKE ‘%<Event>23</Event>%’ — uniform extent allocation
OR record LIKE ‘%<Event>22</Event>%’ — uniform extent deallocation
OR record LIKE ‘%<Event>24</Event>%’ — mixed extent allocation
OR record LIKE ‘%<Event>25</Event>%’ — mixed extent deallocation
OR record LIKE ‘%<Event>10</Event>%’ — page allocation
OR record LIKE ‘%<Event>11</Event>%’ — page deallocation
)) AS t
ORDER BY record.value(‘(Record/@id)[1]’, ‘int’) ASC

The results of the ring buffer query are too detailed for most real-time scenarios and the overheads associated with the trace flag make it impractical, as compared to the sys.dm_db_session_space_usage DMV. But if you happen to use the approach in KB 947204 on SQL 2012 for some really low level debugging, please take note of the above changes.

Reference

Queries useful for troubleshooting TEMPDB space usage

January 28, 2014 by Arvind Shyamsundar

Indexed views with remote tables–possible?

Once upon a time, I had this question come up during an onsite visit: someone wanted to create a ‘materialized’ copy of the remote data locally as an indexed view. I had never heard that requirement before, but since the motivation was genuine, I decided to validate it.

There is a long list of what is permitted (and what is not) in indexed views. One fundamental requirement is that the view on which the index is created, must be schema-bound. And this fundamental requirement, actually nails the answer straight away: any attempt to create a schema-bound view which references a remote table ends up with the SQL error 2014:

Remote access is not allowed from within a schema-bound object

Now that it is clear that such ‘projection’ is not possible, we have to step back and understand the real problems why this requirement was posed. In most cases, such ideas come up for improving performance of distributed queries (DQ). For all such cases, I suggest you first start by reviewing the excellent talk from Conor on DQ.

But in many cases, especially where multiple remote data sources are involved, probably the best way to optimize the queries is to consolidate the data inside a data mart or warehouse, and then possibly consider the columnstore index (or the new Clustered Columnstore in SQL 2014) feature to optimize the aggregation-type queries which are very typical in those scenarios.

April 24, 2013 by Arvind Shyamsundar

SQL Server and Storage / SAN replication (mirroring) tips

Some customers have adopted block level replication of the storage disks (LUNs) as their DR solution for SQL Server databases. Remote mirroring of storage disks sounds great – transparent to the application, replicates every block as-is; so very little reconfiguration required on the other side, scales fairly linearly regardless of the number of databases and so on. Sounds better than baked bread, huh? Smile However, there are some finer points we would like to share from our experiences on the field.

Support Policy

Our KB articles sometimes read like legal documentation and rightly so – they are in many cases the ‘rulings’ and ‘bye-laws’ by which customers and us reach conclusions in tricky situations. So there are some KB articles which are a must-read in the case of storage.

KB 913945 states that Microsoft does not officially certify specific third party products will work with SQL Server. The most important part of that article is here:

Microsoft support documents may use the phrase "not supported by Microsoft" with regard to third-party products. This phrase means that the first line of support for such a product should be the third-party product vendor rather than Microsoft.

Next we come to KB 910716 which specifically talks about remote mirroring a.k.a. SAN replication. Some excerpts from this document:

This article does not endorse any specific vendor’s solution.
Work with your hardware vendor and your storage vendor to make sure that the remote mirroring solution that you want to use supports the I/O requirements of SQL Server.
If your solution meets the requirements that are discussed in this article, Microsoft Customer Support Services will provide support for issues that are not related to storage devices.
To obtain support from Microsoft Customer Support Services for issues that involve transactional throughput, you must be able to reproduce the issue in a mirror-free environment.
To obtain support for issues that involve a storage device, see your storage vendor.
To obtain support for issues that involve third-party replication technology, see your technology vendor.

Lastly we have KB 943984 which talks about using the storage mirroring to support a so-called geo-cluster. In such cases, remote disk mirroring is used to provide a consistent copy of the clustered storage disks on a remote site. This article has one very important line:

Please contact the data replication vendor for any issues related to accessing data on failover.

By now, you must be feeling very jittery about using such remote mirroring Smile But fear not, there are some basic precautions you can take to be sure.

SQL I/O Requirements

The fundamental issue is that the database and log files are now in two places rather than one. And that means that the basic I/O requirements that SQL Server assumes or requires are to be followed in both places. These requirements are described in the ages-old (but still relevant) SQL I/O Basics Part 1 document. (FYI, the Part 2 document is also a very good read) and are given below:

Stable Media: Any system running SQL Server must provide stable media capabilities for the database log and data files. If the system has something like a non-battery backed cache or perhaps caching enabled on disk, it is not safe for SQL Server installations. Ensure that your system has proper caching attributes to secure the data to stable media.
Write Ordering: SQL Server is designed to maintain the WAL protocol as described earlier in this paper. The underlying environment plays a pivotal role in the maintenance of the protocol. Write ordering must be maintained. With any system, but specifically for remote mirroring implementations, write ordering is critical to maintain a point-in-time view of the data. Many remote mirroring vendors implement generation logic to maintain physical write order even when the write operations are transmitted between remote caches with alternate protocols.
Torn I/O Prevention (Split I/Os): For SQL Server, the 8-KB blocks need to be handled as a single block of data. Systems that split I/Os must be configured to avoid splitting I/O requests into smaller blocks. Some dynamic disk and volume managers can be configured with block and cluster sizes that are smaller than 8 KB or not on 8-KB boundaries. These systems may split the SQL Server I/O request across physical system components. In doing so, they can introduce the possibility of torn pages and they can destroy true write ordering precedence.

How do I test for these?

The easiest way to test if a disk subsystem (standalone or mirrored) meets the above requirements is to simulate SQL Server I/O patterns using the SQLIOSim utility. This utility actually replaces the SQLIOStress utility described in Bob Dorr’s whitepaper mentioned previously.

Write ordering implementations

There are some reference steps you can check for remote mirroring and SQL Server. I’ll keep adding to these as I come across more Smile

EMC SRDF

For example, this EMC paper talks about using something called Consistency Groups when using their SRDF technology to remote mirror a group of LUNs. Consistency Groups are used by EMC to ensure write-ordering, which as you might have read above, is a very important criteria, especially when you have different sets of LUNs for log and data files.

Note: other manufacturers may have their own equivalent to ensure write ordering. I mention EMC’s solution purely as a sample. It is not to be taken in any way as a recommendation or guarantee of any sort.

Hyper-V Replicas

Hyper-V in Windows Server 2012 offers something called Replicas which are an easy way to provide for DR of a VM. SQL Server is supported in such scenarios – see KB 956893 for details. The scenario is supported as long as the EnableWriteOrderPreservationAcrossDisks flag is set.

TEMPDB and remote mirroring / storage replication

One thing to definitely keep in mind when using remote mirroring is to consider where TEMPDB is placed. If you are replicating the LUNs where TEMPDB data / log files are placed, that is calling for big trouble! A lot of bandwidth on your storage replication channel will be wasted moving these bits across, when in reality the TEMPDB will be recreated on startup of the DR instance.

One good thing here is that SQL server 2012 allows TEMPDB data / log files to be placed on local disks even for clustered instances – this is a big help as it helps avoid including TEMPDB in the remote mirroring setup.

Wrap-up cheat sheet

Here is a summary of the key points to keep in mind when implementing storage replication / remote disk mirroring:

Check with the manufacturer if they have addressed the SQL Server storage requirements, specifically the write-ordering implementation
Test with SQLIOSim well before deployment of the actual SQL instances
Always check if your HBA drivers and SAN firmware / software is up to date
Ensure that TEMPDB is not being replicated using disk mirroring across the storage replication channel

That’s it for now. Hope you enjoyed this post! Please leave your feedback / questions in the comments, or use the ‘Email Author’ link on the top right section of the blog to reach me.

March 22, 2013 by Arvind Shyamsundar

Getting worker thread IDs using DMVs

In SQL 2000, the kpid column in sysprocesses was a convenient way to find out the actual worker thread ID (OS level thread identifier) for a given task. How does one do this using the DMVs?

Here is a quick way:

select R.Session_Id, Th.os_thread_id from sys.dm_exec_requests R
join sys.dm_exec_sessions S on R.session_id = S.session_id
join sys.dm_os_tasks T on R.session_id = T.session_id
join sys.dm_os_workers W on T.worker_address = W.worker_address
join sys.dm_os_threads Th on W.thread_address = Th.thread_address
where S.is_user_process = 1
order by Session_id

The above script will get you the session ID and thread ID for any tasks currently running, and will also filter out any non-user processes. BUT… my hand-written T-SQL looks so crowded and ugly. Let us format it with the T-SQL formatting script I shared a while back on my blog:

SELECT R.Session_Id,
Th.os_thread_id
FROM sys.dm_exec_requests AS R
INNER JOIN
sys.dm_exec_sessions AS S
ON R.session_id = S.session_id
INNER JOIN
sys.dm_os_tasks AS T
ON R.session_id = T.session_id
INNER JOIN
sys.dm_os_workers AS W
ON T.worker_address = W.worker_address
INNER JOIN
sys.dm_os_threads AS Th
ON W.thread_address = Th.thread_address
WHERE S.is_user_process = 1;

Looks a lot easier to read! Now, by switching the last is_user_process flag to 0, you can switch to viewing the OS thread IDs of system processes. In the next post, I will show you how to use this knowledge and information to dig even deeper and view the operation of system threads like the lazy writer. Stay tuned!

January 26, 2010 by Arvind Shyamsundar

Priority boost details – and why it’s not recommended

Some times, we see customer has (accidentally or otherwise) enabled the option ‘boost priority’ for SQL Server worker threads. In general Microsoft does not recommend that you set this option. Why?

First a bit of background. When we set the ‘priority boost’ option using sp_configure what is happening is that after restart the SQL engine will call Win32 API SetPriorityClass() and passes in HIGH_PRIORITY_CLASS (if you are debugger savvy, you can set breakpoints on these APIs and check what is happening – that’s what I did, no source code is required to verify this). From MSDN:

HIGH_PRIORITY_CLASS

0x00000080

Process that performs time-critical tasks that must be executed immediately. The threads of the process preempt the threads of normal or idle priority class processes. An example is the Task List, which must respond quickly when called by the user, regardless of the load on the operating system. Use extreme care when using the high-priority class, because a high-priority class application can use nearly all available CPU time.

It then proceeds to call SetThreadPriority() with priority as THREAD_PRIORITY_HIGHEST. For this combination of Process Priority Class and Thread Priority Level, the base priority level of these worker threads is 15. The only ones higher than this in the hierarchy of the OS are any threads which have process priority class set to REALTIME_PRIORITY_CLASS (which should be a very rare case for any application.) this means that many SQL worker threads are running at a priority level which is close to the highest on the system. Hence, they will tend to be selected frequently by kernel dispatcher to execute on the CPU.

So what is the effect?

There is clear precedent in the support teams of priority boost causing unresponsive servers. Sluggish UI / mouse / keyboard movements are other common symptoms if this setting is interfering with the capability of the OS to give (non-SQL) threads their desired quantum on the CPU. On a cluster, having priority boosted SQL threads can cause other critical threads such as the resource monitor’s IsAlive poll thread to timeout, thereby causing unwanted failover. Therefore we do not recommend to set priority boost to 1, especially in clustered instances.

Reference links:

SetPriorityClass: http://msdn.microsoft.com/en-us/library/ms686219(VS.85).aspx

SetThreadPriority: http://msdn.microsoft.com/en-us/library/ms686277(VS.85).aspx

Effective Base Priority: http://msdn.microsoft.com/en-us/library/ms685100(VS.85).aspx

Windows Priority levels: http://www.microsoft.com/mspress/books/sampchap/4354c.aspx and http://www.microsoft.com/mspress/books/sampchap/4354d.aspx

December 31, 2008 by Arvind Shyamsundar

Some notes on database snapshots

Overview

I recently conducted some tests to double check the exact behavior of database snapshots when:

Snapshots are created on volumes which are much smaller than the actual database size
Metadata-only operations such as DROP TABLE are executed
DDL operations execute on the main database and the snapshot runs out of disk space

Here is my testing methodology and the customary summary of observations.

Scenario

Drive C: contains the original data file for database TestSS. Data file size = 1.95GB
TestSS.dbo.T1 is a table which contains 1GB of data
TestSS_1 is a snapshot which was created on TestSS after the 1GB data insert was completed.

Test Case A: Can we create a snapshot on volume smaller than the database size?

The snapshot TestSS_1 was created on volume T: which is 10MB in size.
Size on disk for snapshot at creation = 192KB
The snapshot creation succeeded without any errors.

If you are curious as to what I mean by ‘size on disk’, please refer to this snapshot:

It is a property of the NTFS sparse file that the actual space occupied on disk is much lesser initially than the ‘Size’ which is displayed for that file in Explorer.

Test Case B: Metadata-only operations and snapshot files

Next, I executed a DROP TABLE T1 on TestSS, which returned immediately without any errors. At this time the snapshot did not have a large ‘on disk size’ – just 1.48MB
At this time the data is accessible through the snapshot just fine. If you do a select count(1) from testSS..T1 it fails (as expected). So the main operation (DROP TABLE) worked just fine.

How is this possible? It turns out that certain operations such as TRUNCATE TABLE, DROP TABLE etc. directly operate on the IAM chain and possibly other internal allocation pages to efficiently ‘clean up’ data and metadata. In such cases the copy-on-write to the snapshot is deferred till such time that the actual data page and extent are touched.

Test Case 3: What effect does failure to write to the snapshot have on the main database?

Next, I reverted the database TestSS to the snapshot TestSS_1 and then dropped the TestSS_1 snapshot. How do you revert a database to a snapshot? You use the RESTORE DATABASE … command like this: RESTORE DATABASE TestSS FROM DATABASE_SNAPSHOT = TestSS_1
I recreated TestSS_1 immediately using the CREATE DATABASE … AS SNAPSHOT …
Next, I executed DELETE T1 to force a logged data operation and it executed for some time before finally returning these errors:

Msg 5128, Level 17, State 2, Line 1

Write to sparse file ‘t:Datatestss_1.ss’ failed due to lack of disk space.

Msg 5128, Level 17, State 2, Line 1

Write to sparse file ‘t:Datatestss_1.ss’ failed due to lack of disk space.

Msg 5128, Level 17, State 2, Line 1

Write to sparse file ‘t:Datatestss_1.ss’ failed due to lack of disk space.

Though, interestingly the main DELETE did not fail; all rows from the table T1 have been deleted. It is by-design that any failures to write to the snapshot are not going to affect the primary operation on the database.

Summary

So in conclusion this is what we can learn from the above. Most of it is logical when you think about it, but it is better to be sure than to guess!

Database snapshots can most definitely be created on volumes which are much smaller than the original database size
Metadata only operations initially do not affect the snapshot. Only when the underlying pages are really ‘cleaned up’ by subsequent system processes, will the copy of the affected pages be moved into the database snapshot’s sparse file
Failures to write to the snapshot will never affect the original database on which the DDL is being executed

Additional Reading

Please also refer to the excellent white paper on Database Snapshots Performance Considerations Under IO Intensive Workloads for some more information on snapshots and their effect on performance.
Paul Randal’s blogs about snapshots.
Good information on snapshot internals is at the SQL CAT Team website.

December 19, 2008 by Arvind Shyamsundar

T-SQL Anti-pattern of the day: ‘all-in-one’ queries

Scenario

A common requirement for enquiry queries on an OLTP database is to have search criteria which are very specific (‘get me details for for OrderID = NNNN’) and also the occasional reports which ask for all the orders (‘get me all the orders, no questions asked’.) Here is a sample from AdventureWorks which illustrates the problem:

CREATE PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)
END

What is the meaning of the underlined predicate in the above WHERE clause? It is actually a ‘special case’ where the developer intends to get back all the rows, regardless of the OrderID. This ‘special case’ is triggered by passing in a value of NULL for the @OrderID parameter.

Problem

So while this construct looks good in theory, it lends itself to very poor performance. Take a look at the 2 cases where this procedure is executed.

Case A: with specific OrderID

EXEC RptOrder 43672

Case B: asking for all records

EXEC RptOrder NULL

The plan, it turns out, is the same for both cases and a scan is used! This is despite a seekable index being present on SalesOrderID column for the SalesOrderHeader table:

The reason the optimizer chooses to scan the SalesOrderHeader (in this case it chooses a non-clustered index scan) is because it has no way to determine at compile and optimization time, as to what the specific value of @OrderID would be. Hence it has no way to ‘fold’ the (@OrderID IS NULL) expression and therefore has no option but to look at all the records.

Workarounds

‘IF-ELSE’ Workaround: The straightforward workaround in simple cases like the one above is to separate out the 2 cases into an IF-ELSE block:

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE (SalesOrderID = @OrderID)
    END
    ELSE
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
    END
END

Now, the 2 test cases work as expected. Here are the execution plans:

EXEC RptOrder 43672

EXEC RptOrder NULL

Dynamic SQL Workaround: However, as the number of predicates in the WHERE clause increase, and if all those predicates (or most of them) have such ‘catch-all’ handling then the IF – ELSE construct becomes unviable. In those cases, a dynamic SQL construct should be considered. Of course, when dealing with dynamic SQL, we must consider security first, including the possibility of SQL Injection and also the Execution Context of the dynamic SQL statement. But that is a topic for another post. Right now, here is how we could handle something like that:

— NOTE: This code is highly simplified and does not provide for any screening

— or protection against SQL injection!!! Provided as-is, confers no warranties.

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    DECLARE @sDynamicSQL nvarchar(4000)
    SELECT @sDynamicSQL = ‘SELECT * FROM Sales.SalesOrderHeader ‘

    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT @sDynamicSQL = @sDynamicSQL + ‘ WHERE (SalesOrderID = @OrderID)’
    END

    EXEC sp_executesql @sDynamicSQL, N’@OrderID int’, @OrderID = @OrderID
END

Different Code Paths: The cleanest way of course is to consider having separate procedures for each kind of query. For example we can have a procedure called RptSpecificOrder for the case where we are searching by specific OrderID, and another one called RptAllOrders for the ‘get-me-everything’ case. This does have the advantage of clean isolation, but it does not scale easily when the number of predicates are larger. But is does also have the advantage that if we are querying for specific orders 99% of the time, that code path is simplified and optimized accordingly.

Conclusion

Beware of this T-SQL anti-pattern as it is one of the most common ones we see and it does have a huge (negative) impact on query performance. As you can see, if they are not done with these patterns in mind, application design and reporting requirements can have a detrimental effect on OLTP query execution. Separating reporting and OLTP workloads could be the key to solving these kinds of issues. But if separation is not possible, then clever use of separate code paths and stored procedures could help ensure that the most efficient execution plan is selected for each case. For complex queries, dynamic SQL may offer the simplest way out, but due care has to be taken to ensure that permissions and SQL injection issues are kept in mind when dealing with dynamic SQL statements.

June 14, 2006 by Arvind Shyamsundar

Resources from TechEd sessions

Several attendees have asked for the resources section from our presentations at TechEd India. Enjoy!

Blogs
SQLCAT: http://blogs.msdn.com/sqlcat
WINCAT: http://blogs.msdn.com/wincat
Slava Oks’ blog: http://blogs.msdn.com/slavao

Product Feedback
http://lab.msdn.microsoft.com/productfeedback/

External Links:
NUMA FAQ: http://lse.sourceforge.net/numa/faq

Books:
Inside Microsoft SQL Server 2005: Query Processing and Optimization (Volume 3)
The Guru’s Guide to SQL Server Architecture and Internals (SQL Server 2000)

SQL Server 2005 – Operations
http://www.microsoft.com/technet/prodtechnol/sql/2005/library/operations.mspx

Whitepapers
Troubleshooting Performance Problems in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx)
Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005(http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx)
Statistics Used by the Query Optimizer in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx)
SQL Server 2000 I/O Basics (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx)

Webcasts
Troubleshooting Performance Problems In SQL Server 2005 (http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032275646&EventCategory=5&culture=en-US&CountryCode=US)
Performance Diagnosis in SQL Server 2005 (http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032275586&EventCategory=5&culture=en-US&CountryCode=US)
Supportability features for SQL Server 2005
(http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032275593&EventCategory=5&culture=en-US&CountryCode=US)

PASS 2005 PSS Service Center Labs
http://www.microsoft.com/downloads/details.aspx?FamilyID=9b8098a7-e75f-462a-b296-e80199c9f323&displaylang=en

Arvind Shyamsundar's technical blog

Arvind Shyamsundar is a Principal PM @ MSFT Azure Data, working on Azure SQL. Data geek. Apache Accumulo and Fluo PMC. SQL MCM, ex-Principal PFE (MSFT Services). These are my own opinions and not those of Microsoft.

Tag / SQL Architecture