Exotic wait types: PREEMPTIVE_OS_ENCRYPTMESSAGE

Some time ago, we were discussing wait types with a group of customers. To demonstrate the waits-and-queues methodology we use, I showed the group the sys.dm_os_wait_stats DMV, and the results were sorted by wait_time_ms in descending order. On the top of that list was the PREEMPTIVE_OS_ENCRYPTMESSAGE wait type. At that moment I was not exactly sure what that wait type was and I promised to look into it later. This post is a quick explanation of the same.

It turns out that if the SQL client and / or server is configured to use SSL for secure communications, the implementation within SQL Server will use an Win32 API hosted within SspiCli.dll. Such wait states – where the SQLOS calls into an external DLL which SQL has no direct control over – are identified with the PREEMPTIVE_* label.

So in short, this wait type is representative of the time it takes to encrypt TDS traffic back to the client. Does it represent a problem if it is on top of the list? Not necessarily, and as with all wait types, it needs to be carefully examined within the context of the specific issue being investigated and co-related with other observations such as performance counters etc. In certain cases, if you strongly suspect that this is representative of a problem, you can always check if you have configured SSL for SQL Server or for the client, and test with that turned off (subject to the boundary security requirements of course!)

Till next time, have a good one!

Advertisements

Too many files causing RAM pressure? (a.k.a. DynCache to the rescue!)

Readers of this blog might recall a previous post which described one impact of having too many similarly named files in one folder in NTFS. It turns out that the 8.3 naming convention is not the only thing you need to worry about when you have very large amounts of (smaller) files in the same volume.

Today I was called in to assist with a performance issue on a server. The only visible symptom of the problem was excessive RAM utilization on the server. The interesting aspect was that no specific user-mode process was consuming that RAM, so we were wondering where it came from. If this server was running SQL Server, for example, the ‘ghost’ utilization could be due to locked pages (which do not show up in Task Manager) but that was not the case here.

Analysis

So we ran the RAMMap utility, we found that the usage for MetaFile was a substantial percentage of the total RAM usage. From this AskPerf blog post you can see what MetaFile is all about:

“Metafile is part of the system cache and consists of NTFS metadata. NTFS metadata includes the MFT as well as the other various NTFS metadata files (see How NTFS Works for more details, and of course Windows Internals is a great reference). In the MFT each file attribute record takes 1k and each file has at least one attribute record. Add to this the other NTFS metadata files and you can see why the Metafile category can grow quite large on servers with lots of files.”

The next step therefore was to cross-check how large the MFT was in reality. The easy way to do this is to utilize the command given below:

fsutil fsinfo ntfsinfo <drive letter>

A sample output is given below (from my own laptop Smile)

C:>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number :       0x8a40c9ee40c9e0d5
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000003a2007ff
Total Clusters :                  0x00000000074400ff
Free Clusters  :                  0x0000000000ab1f84
Total Reserved :                  0x000000000002ef10
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000030240000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x0000000004d58da0
Mft Zone End   :                  0x0000000004d655c0
Resource Manager Identifier :     96CC88FE-5621-11E3-AF31-3C970EA47926

In this output, the “Mft Valid Data Length” gives us an indicator of how many bytes are used by the MFT. In the above case for example it equates to around 770MB:

0x0000000030240000 / (1024*1024) = 770.25 MB

Just as a curiosity, if I run RAMMap in my laptop (Windows 8.1), here is what I see, you can see a rough alignment with the above computed number.

image

FYI, in the server that I was looking at in the real world, the size of the MFT was actually 1.5 times the amount of RAM on the box Smile 

Mitigation

Since the server in question was running Windows Server 2008 R2 SP1, we recommended the usage of the DynCache service sample, which would automatically control the size of the system cache based on system memory notifications.

SQL Server FileStream

Practically, this issue would also apply when you use SQL FileStream to store a very large number of blobs in the NTFS file system. In such cases, here are my recommendations:

  1. Consider setting the SQL Server Database Engine to use ‘locked pages’ with ‘max server memory’ set appropriately.
  2. If the OS is Windows 2008 R2 or below, you may additionally consider using the DynCache service to mitigate the effect that the large MFT will have.

Other notes

If you anticipate huge numbers of files to be stored on the file system, keep in mind that each file record will take up around 1KB. That means 100 million files will take up close to 100GB worth of MFT storage!

On Windows 2008 R2 and below be aware of issues like the one described in KB article 967351 and install the updated version of NTFS.sys accordingly.

In extreme cases, if you want to achieve optimal performance without compromising on system cache memory utilization, be aware that the system RAM sizing must be done accordingly keeping in mind the very large MFT which might result.

References

You may want to read these articles for more information:

Tracking TEMPDB internal object space usage in SQL 2012

It is a documented fact that using the MAX specifier can result in TEMPDB usage in specific cases. From the TEMPDB whitepaper, some of these can use ‘internal object’ space within TEMPDB:

“all of the large object types: text, image, ntext, varchar(max), varbinary(max), and all others.”

Now, a FAQ from our customers is how to drill down to session level usage of TEMPDB when they suspect abnormal growth. Normally, the sys.dm_db_session_space_usage gives us the details we need. But are there any other options to see deeper? We wanted to test this and observe the effects. To do this, we created a simple stored procedure and a corresponding C# console application which passes in a large valued parameter.

Stored procedure code

create procedure testvarcharmax(@s varchar(max))
as
select @@version

C# application code

Here is the simple front-end application code which invokes the ‘testvarcharmax’ procedure:

SqlConnection cn = new SqlConnection("Data Source=.;initial catalog=tempdb;integrated security=true");
cn.Open();

SqlCommand cmd = new SqlCommand();
cmd.Connection = cn;
cmd.CommandType = System.Data.CommandType.StoredProcedure;
cmd.CommandText = "testvarcharmax";
SqlParameter parameter = new SqlParameter("s", System.Data.SqlDbType.VarChar, -1);
parameter.Value = new StringBuilder().Insert(0, "a", 1000000000).ToString();
cmd.Parameters.Add(parameter);
cmd.CommandTimeout = 360;

cmd.ExecuteNonQuery();

Monitoring TEMPDB

After executing the above C# code, my TEMPDB quickly grew by leaps and bounds. As any DBA would understand, the essential ask in such cases is to trace the session ID which is generating this TEMPDB space usage. One option is to use the sys.dm_db_file_space_usage DMV. That one is useful in this case, as it reports the aggregate space for the internal objects:

SELECT SUM(internal_object_reserved_page_count) AS [internal object pages used], (SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in MB] FROM tempdb.sys.dm_db_file_space_usage;

However the above does not provide a session-level detail. To obtain that, you can use the queries on sys.dm_db_session_space_usage and sys.dm_db_task_space_usage as shown below:

select * from tempdb.sys.dm_db_session_space_usage
order by internal_objects_alloc_page_count desc

select * from tempdb.sys.dm_db_task_space_usage
order by internal_objects_alloc_page_count desc

Update: Adam Machanic pointed out that it is much easier and lightweight to use the sys.dm_db_task_space_usage and sys.dm_db_session_space_usage DMVs to track internal object usage to a given task / session. Please use the detailed approach described below only if you have a specific need to investigate lower-level details, such as a potential product issue or such.

Digging deeper

Suppose you suspect a ‘leak’ in TEMPDB objects of some sort, and you need to really peer deep into TEMPDB page allocation / deallocation. To peer more into the internals, KB article 947204 provides an interesting approach using the allocation ring buffer using trace flag 1106. This can be useful in rare cases, but first do note the important disclaimer about this flag:

Note We recommend that you do not use trace flag 1106 on production servers because trace flag 1106 affects the performance of SQL Server.

Note that the query specified therein does not return results on SQL 2012 even after enabling trace flag 1106. It turns out that the query requires a minor change, as the XML schema and the name for the ring buffer has changed in SQL 2012. The new name for the ring buffer is RING_BUFFER_SPACEMGR_TRACE. And here is the modified query, reflecting changes in the XML schema as well:

SELECT top 500
record.value(‘(Record/@id)[1]’, ‘int’) AS record_id,
CONVERT (varchar, DATEADD (ms, -1 * ((inf.cpu_ticks / inf.cpu_ticks) – [timestamp]), GETDATE()), 126) AS EventTime,
[timestamp] ,
  record.value(‘(Record/@id)[1]’, ‘int’) AS RingBuffer_Record_Id,
  record.value(‘(Record/SpaceMgr/Event)[1]’, ‘int’) AS AllocationEventType,
  record.value(‘(Record/SpaceMgr/SpId)[1]’, ‘int’) AS SpId,
  record.value(‘(Record/SpaceMgr/EcId)[1]’, ‘int’) AS EcId,
  record.value(‘(Record/SpaceMgr/PageId)[1]’, ‘nvarchar(50)’) AS AllocatedPageId,
  record.value(‘(Record/SpaceMgr/AuId)[1]’, ‘nvarchar(50)’) AS AllocationUnitId,
  record.value(‘(Record/SpaceMgr/LsId)[1]’, ‘nvarchar(50)’) AS LsId
FROM sys.dm_os_sys_info inf CROSS JOIN (
SELECT timestamp, CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = ‘RING_BUFFER_SPACEMGR_TRACE’
AND ( record LIKE ‘%<Event>23</Event>%’ — uniform extent allocation
OR record LIKE ‘%<Event>22</Event>%’ — uniform extent deallocation
OR record LIKE ‘%<Event>24</Event>%’ — mixed extent allocation
OR record LIKE ‘%<Event>25</Event>%’ — mixed extent deallocation
OR record LIKE ‘%<Event>10</Event>%’ — page allocation
OR record LIKE ‘%<Event>11</Event>%’ — page deallocation
)) AS t
    ORDER BY record.value(‘(Record/@id)[1]’, ‘int’) ASC

The results of the ring buffer query are too detailed for most real-time scenarios and the overheads associated with the trace flag make it impractical, as compared to the sys.dm_db_session_space_usage DMV. But if you happen to use the approach in KB 947204 on SQL 2012 for some really low level debugging, please take note of the above changes.

Reference

Queries useful for troubleshooting TEMPDB space usage

Indexed views with remote tables–possible?

Once upon a time, I had this question come up during an onsite visit: someone wanted to create a ‘materialized’ copy of the remote data locally as an indexed view. I had never heard that requirement before, but since the motivation was genuine, I decided to validate it.

There is a long list of what is permitted (and what is not) in indexed views. One fundamental requirement is that the view on which the index is created, must be schema-bound. And this fundamental requirement, actually nails the answer straight away: any attempt to create a schema-bound view which references a remote table ends up with the SQL error 2014:

Remote access is not allowed from within a schema-bound object

Now that it is clear that such ‘projection’ is not possible, we have to step back and understand the real problems why this requirement was posed. In most cases, such ideas come up for improving performance of distributed queries (DQ). For all such cases, I suggest you first start by reviewing the excellent talk from Conor on DQ.

But in many cases, especially where multiple remote data sources are involved, probably the best way to optimize the queries is to consolidate the data inside a data mart or warehouse, and then possibly consider the columnstore index (or the new Clustered Columnstore in SQL 2014) feature to optimize the aggregation-type queries which are very typical in those scenarios.

SQL Server and Storage / SAN replication (mirroring) tips

Some customers have adopted block level replication of the storage disks (LUNs) as their DR solution for SQL Server databases. Remote mirroring of storage disks sounds great – transparent to the application, replicates every block as-is; so very little reconfiguration required on the other side, scales fairly linearly regardless of the number of databases and so on.  Sounds better than baked bread, huh? Smile However, there are some finer points we would like to share from our experiences on the field.

Support Policy

Our KB articles sometimes read like legal documentation and rightly so – they are in many cases the ‘rulings’ and ‘bye-laws’ by which customers and us reach conclusions in tricky situations. So there are some KB articles which are a must-read in the case of storage.

KB 913945 states that Microsoft does not officially certify specific third party products will work with SQL Server. The most important part of that article is here:

Microsoft support documents may use the phrase "not supported by Microsoft" with regard to third-party products. This phrase means that the first line of support for such a product should be the third-party product vendor rather than Microsoft.

Next we come to KB 910716 which specifically talks about remote mirroring a.k.a. SAN replication. Some excerpts from this document:

  • This article does not endorse any specific vendor’s solution.
  • Work with your hardware vendor and your storage vendor to make sure that the remote mirroring solution that you want to use supports the I/O requirements of SQL Server.
  • If your solution meets the requirements that are discussed in this article, Microsoft Customer Support Services will provide support for issues that are not related to storage devices.
  • To obtain support from Microsoft Customer Support Services for issues that involve transactional throughput, you must be able to reproduce the issue in a mirror-free environment.
  • To obtain support for issues that involve a storage device, see your storage vendor.
  • To obtain support for issues that involve third-party replication technology, see your technology vendor.

Lastly we have KB 943984 which talks about using the storage mirroring to support a so-called geo-cluster. In such cases, remote disk mirroring is used to provide a consistent copy of the clustered storage disks on a remote site. This article has one very important line:

Please contact the data replication vendor for any issues related to accessing data on failover.

By now, you must be feeling very jittery about using such remote mirroring Smile But fear not, there are some basic precautions you can take to be sure.

SQL I/O Requirements

The fundamental issue is that the database and log files are now in two places rather than one. And that means that the basic I/O requirements that SQL Server assumes or requires are to be followed in both places. These requirements are described in the ages-old (but still relevant) SQL I/O Basics Part 1 document. (FYI, the Part 2 document is also a very good read) and are given below:

  • Stable Media: Any system running SQL Server must provide stable media capabilities for the database log and data files. If the system has something like a non-battery backed cache or perhaps caching enabled on disk, it is not safe for SQL Server installations. Ensure that your system has proper caching attributes to secure the data to stable media.
  • Write Ordering: SQL Server is designed to maintain the WAL protocol as described earlier in this paper. The underlying environment plays a pivotal role in the maintenance of the protocol. Write ordering must be maintained. With any system, but specifically for remote mirroring implementations, write ordering is critical to maintain a point-in-time view of the data. Many remote mirroring vendors implement generation logic to maintain physical write order even when the write operations are transmitted between remote caches with alternate protocols.
  • Torn I/O Prevention (Split I/Os): For SQL Server, the 8-KB blocks need to be handled as a single block of data. Systems that split I/Os must be configured to avoid splitting I/O requests into smaller blocks. Some dynamic disk and volume managers can be configured with block and cluster sizes that are smaller than 8 KB or not on 8-KB boundaries. These systems may split the SQL Server I/O request across physical system components. In doing so, they can introduce the possibility of torn pages and they can destroy true write ordering precedence.

 

How do I test for these?

The easiest way to test if a disk subsystem (standalone or mirrored) meets the above requirements is to simulate SQL Server I/O patterns using the SQLIOSim utility. This utility actually replaces the SQLIOStress utility described in Bob Dorr’s whitepaper mentioned previously.

Write ordering implementations

There are some reference steps you can check for remote mirroring and SQL Server. I’ll keep adding to these as I come across more Smile

EMC SRDF

For example, this EMC paper talks about using something called Consistency Groups when using their SRDF technology to remote mirror a group of LUNs. Consistency Groups are used by EMC to ensure write-ordering, which as you might have read above, is a very important criteria, especially when you have different sets of LUNs for log and data files.

Note: other manufacturers may have their own equivalent to ensure write ordering. I mention EMC’s solution purely as a sample. It is not to be taken in any way as a recommendation or guarantee of any sort.

Hyper-V Replicas

Hyper-V in Windows Server 2012 offers something called Replicas which are an easy way to provide for DR of a VM. SQL Server is supported in such scenarios – see KB 956893 for details. The scenario is supported as long as the EnableWriteOrderPreservationAcrossDisks flag is set.

TEMPDB and remote mirroring / storage replication

One thing to definitely keep in mind when using remote mirroring is to consider where TEMPDB is placed. If you are replicating the LUNs where TEMPDB data / log files are placed, that is calling for big trouble! A lot of bandwidth on your storage replication channel will be wasted moving these bits across, when in reality the TEMPDB will be recreated on startup of the DR instance.

One good thing here is that SQL server 2012 allows TEMPDB data / log files to be placed on local disks even for clustered instances – this is a big help as it helps avoid including TEMPDB in the remote mirroring setup.

Wrap-up cheat sheet

Here is a summary of the key points to keep in mind when implementing storage replication / remote disk mirroring:

  1. Check with the manufacturer if they have addressed the SQL Server storage requirements, specifically the write-ordering implementation
  2. Test with SQLIOSim well before deployment of the actual SQL instances
  3. Always check if your HBA drivers and SAN firmware / software is up to date
  4. Ensure that TEMPDB is not being replicated using disk mirroring across the storage replication channel

That’s it for now. Hope you enjoyed this post! Please leave your feedback / questions in the comments, or use the ‘Email Author’ link on the top right section of the blog to reach me.

Getting worker thread IDs using DMVs

In SQL 2000, the kpid column in sysprocesses was a convenient way to find out the actual worker thread ID (OS level thread identifier) for a given task. How does one do this using the DMVs?

Here is a quick way:

select R.Session_Id, Th.os_thread_id from sys.dm_exec_requests R
join sys.dm_exec_sessions S on R.session_id = S.session_id
join sys.dm_os_tasks T on R.session_id = T.session_id
join sys.dm_os_workers W on T.worker_address = W.worker_address
join sys.dm_os_threads Th on W.thread_address = Th.thread_address
where S.is_user_process = 1
order by Session_id

The above script will get you the session ID and thread ID for any tasks currently running, and will also filter out any non-user processes. BUT… my hand-written T-SQL looks so crowded and ugly. Let us format it with the T-SQL formatting script I shared a while back on my blog:

SELECT R.Session_Id,
Th.os_thread_id
FROM   sys.dm_exec_requests AS R
INNER JOIN
sys.dm_exec_sessions AS S
ON R.session_id = S.session_id
INNER JOIN
sys.dm_os_tasks AS T
ON R.session_id = T.session_id
INNER JOIN
sys.dm_os_workers AS W
ON T.worker_address = W.worker_address
INNER JOIN
sys.dm_os_threads AS Th
ON W.thread_address = Th.thread_address
WHERE  S.is_user_process = 1;

Looks a lot easier to read! Now, by switching the last is_user_process flag to 0, you can switch to viewing the OS thread IDs of system processes. In the next post, I will show you how to use this knowledge and information to dig even deeper and view the operation of system threads like the lazy writer. Stay tuned!

Priority boost details – and why it’s not recommended

Some times, we see customer has (accidentally or otherwise) enabled the option ‘boost priority’ for SQL Server worker threads. In general Microsoft does not recommend that you set this option. Why?

First a bit of background. When we set the ‘priority boost’ option using sp_configure what is happening is that after restart the SQL engine will call Win32 API SetPriorityClass() and passes in HIGH_PRIORITY_CLASS (if you are debugger savvy, you can set breakpoints on these APIs and check what is happening – that’s what I did, no source code is required to verify this). From MSDN:

HIGH_PRIORITY_CLASS

0x00000080

Process that performs time-critical tasks that must be executed immediately. The threads of the process preempt the threads of normal or idle priority class processes. An example is the Task List, which must respond quickly when called by the user, regardless of the load on the operating system. Use extreme care when using the high-priority class, because a high-priority class application can use nearly all available CPU time.

It then proceeds to call SetThreadPriority() with priority as THREAD_PRIORITY_HIGHEST. For this combination of Process Priority Class and Thread Priority Level, the base priority level of these worker threads is 15. The only ones higher than this in the hierarchy of the OS are any threads which have process priority class set to REALTIME_PRIORITY_CLASS (which should be a very rare case for any application.) this means that many SQL worker threads are running at a priority level which is close to the highest on the system. Hence, they will tend to be selected frequently by kernel dispatcher to execute on the CPU.

So what is the effect?

There is clear precedent in the support teams of priority boost causing unresponsive servers. Sluggish UI / mouse / keyboard movements are other common symptoms if this setting is interfering with the capability of the OS to give (non-SQL) threads their desired quantum on the CPU. On a cluster, having priority boosted SQL threads can cause other critical threads such as the resource monitor’s IsAlive poll thread to timeout, thereby causing unwanted failover. Therefore we do not recommend to set priority boost to 1, especially in clustered instances.

Reference links:

SetPriorityClass: http://msdn.microsoft.com/en-us/library/ms686219(VS.85).aspx

SetThreadPriority: http://msdn.microsoft.com/en-us/library/ms686277(VS.85).aspx

Effective Base Priority: http://msdn.microsoft.com/en-us/library/ms685100(VS.85).aspx

Windows Priority levels: http://www.microsoft.com/mspress/books/sampchap/4354c.aspx and http://www.microsoft.com/mspress/books/sampchap/4354d.aspx