January 21, 2015 by Arvind Shyamsundar

Exotic wait types: PREEMPTIVE_OS_ENCRYPTMESSAGE

Some time ago, we were discussing wait types with a group of customers. To demonstrate the waits-and-queues methodology we use, I showed the group the sys.dm_os_wait_stats DMV, and the results were sorted by wait_time_ms in descending order. On the top of that list was the PREEMPTIVE_OS_ENCRYPTMESSAGE wait type. At that moment I was not exactly sure what that wait type was and I promised to look into it later. This post is a quick explanation of the same.

It turns out that if the SQL client and / or server is configured to use SSL for secure communications, the implementation within SQL Server will use an Win32 API hosted within SspiCli.dll. Such wait states – where the SQLOS calls into an external DLL which SQL has no direct control over – are identified with the PREEMPTIVE_* label.

So in short, this wait type is representative of the time it takes to encrypt TDS traffic back to the client. Does it represent a problem if it is on top of the list? Not necessarily, and as with all wait types, it needs to be carefully examined within the context of the specific issue being investigated and co-related with other observations such as performance counters etc. In certain cases, if you strongly suspect that this is representative of a problem, you can always check if you have configured SSL for SQL Server or for the client, and test with that turned off (subject to the boundary security requirements of course!)

Till next time, have a good one!

January 12, 2015 by Arvind Shyamsundar

Error message “Copying cluster-resource …fssres.dll to C:Windowssystem32fssres.dll failed…” during SQL Server 2014 setup

First and foremost, let me reproduce the complete error message I refer to above: “Copying cluster-resource C:Program FilesMicrosoft SQL ServerMSSQL12.SQL14MSSQLBinnfssres.dll to C:Windowssystem32fssres.dll failed because the destination file exists and is read-only”

Investigation

I received the above error message while setting up a SQL Server 2014 instance side by side with an existing copy of SQL Server 2012. The setup caused the “Database Engine” component to fail and later it had to be removed before re-attempting setup.

The root cause of such setup failures is often quite easy to locate when you look at the setup logs, which are conveniently placed under the “C:Program FilesMicrosoft SQL Server120Setup BootstrapLog” folder on your machine. When I looked at the Detail.txt file from such the problematic server, I could sport the following section:

(01) 2015-01-04 08:09:34 Slp: UpdateClusterResourceAction: Resource type = ‘SQL Server FILESTREAM Share’ ResourceDllName = fssres.dll Source Location = ‘C:Program FilesMicrosoft SQL ServerMSSQL12.SQL14MSSQLBinn’ Target Location = ‘C:Windowssystem32’.
(01) 2015-01-04 08:09:40 Slp: Type ‘SQL Server FILESTREAM Share’ not found. Performing copy directly …
.…
(01) 2015-01-04 08:09:40 Slp: Failed to version-copy file ‘C:Program FilesMicrosoft SQL ServerMSSQL12.SQL14MSSQLBinnfssres.dll’ to ‘C:Windowssystem32fssres.dll’. Exception data is: System.IO.IOException: The process cannot access the file ‘C:Windowssystem32fssres.dll’ because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite)
   at Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceAction.VersionCopy(String source, String target).
….
Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceException: Copying cluster-resource C:Program FilesMicrosoft SQL ServerMSSQL12.SQL14MSSQLBinnfssres.dll to C:Windowssystem32fssres.dll failed because the destination file exists and is read-only. —> System.IO.IOException: The process cannot access the file ‘C:Windowssystem32fssres.dll’ because it is being used by another process.
….
Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceAction.VersionCopy(String source, String target)

From the above, it is quite clear that the FSSRES.DLL already exists previously. Now, when I checked the file version it was clear that the existing version is of version SQL Server 2012:

On executing a TASKLIST /M fssres.dll command it quickly became clear that the process which has this DLL already loaded is the RHS.exe for the existing instance of SQL 2012. That instance had an Availability Group already configured hence the RHS.exe was loading the DLL.

Given that the DLL was already loaded by the RHS.exe, there was no way for SQL setup to update it. That is why the SQL 2014 instance setup failed.

Workaround

The workaround I had to use in the above case was to firstly remove the SQL 2014 Database Engine instance; shutdown the cluster service on the failed node and then re-attempt the SQL 2014 setup. Less than ideal, but till the above issue is handled in the setup, we have very little options. I have logged also a Connect item requesting the Product team to investigate this issue. If you think this issue should be fixed, please vote it up on Connect!

May 13, 2014 by Arvind Shyamsundar

Exotic spinlocks: X_PACKET_LIST and XID_ARRAY

A Twitter conversation sparked this blog post; the question there was about the relevance of the X_PACKET_LIST spinlock. As with most spinlocks, there is no public documentation on this one, and that is primarily because the spinlock is an implementation detail and can change from release to release.

‘Do it yourself’

That said, we can uncover some basic details about this spinlock. As preliminary reading, I suggest that you read Craig Freedman’s post on the way parallelism is implemented in SQL Server. Next, it may help if you can get handy with a debugger and SQL Server public symbols, something on the lines of this SQLCAT blog post. Another useful background read is Paul White’s post where he alludes to some of the architecture elements behind parallelism.

Now, I am going to simplify things within the limits of what we can disclose publicly. FYI, the information I share below can be obtained by anyone with a little bit of WinDbg magic and co-relating to the sys.dm_xe_map_values DMV. For example, if you look at the call stack in the SQLCAT team blog post and later poke around in the debugger with some breakpoints and the x (Examine Symbols) command, you can deduce the following:

An exchange (parallelism) operation has CXPipe instances. Conceptually think of these as the ‘wiring’ between producers and consumers within a parallel exchange (see sqlmin!CXPipe::NewRow and sqlmin!CXPipe::GetRow in the debugger)
These pipes have a CXPacketMgr class which in turn has a CXPacketList. From the name it suggests that this is a data structure into which producers insert (sqlmin!CXPacketList::Append in the debugger) and consumers pull packets from (sqlmin!CXPacketList::RemoveHead in the debugger.)
Any shared data structure must be protected for this kind of concurrent access, and the way we chose to do it in SQL is to use a spinlock – the X_PACKET_LIST spinlock.

Connecting the dots

Now, in Christian’s case (from the Twitter thread) he was also observing relatively high spins and backoffs for XID_ARRAY spinlock. That one is also related to the parallelism implementation. To explain this, there are two other classes of interest: a CXPort class and a CXId (Exchange ID for each worker). For simplicity, think of the CXPort having entries for the CXIds (representing worker threads) in the form of an array. Access to that array is protected using the XID_ARRAY spinlock.

Conclusion

So in conclusion, both these spinlocks have everything to do with parallelism. Experimenting with different values for MAXDOP may be beneficial, and also check if there are excessive amounts of data (LOB data maybe) flowing in the exchange. Beyond that, if you are in doubt or clueless, please do not hesitate to contact Microsoft Customer Support Services (CSS) and log an incident to debug the root cause.

January 23, 2014 by Arvind Shyamsundar

“Could not add cluster access…” during SQL 2012 Failover Cluster Instance setup

Today a lab setup of SQL 2012 SP1 + CU7 (I was using the /UpdateSource flag to have a patched setup + roll in the updated bits in one go) failed sometime before creating the SQL resource in cluster. The error message from the detail.txt file showed:

Could not add cluster access for SID ‘S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738’. Error: There was a failure to call cluster code from a provider. Exception message: The specified path is invalid.

Preliminary Analysis

The GUID (S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738) in this case happened to be the SQL Service SID. How did I know that? You can also do it, if you use the SC SHOWSID command:

C:Windowssystem32>sc showsid mssql$inst1

NAME: mssql$inst1
SERVICE SID: S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738

This issue occurs when the setup routine fails to add the service account’s SID to the list of accounts allowed to access the Windows cluster itself. For example, in my test cluster (in a healthy state, of course) you can see the Service SID is added to the list of authorized accounts in the security descriptor of the Windows cluster itself:

But of course in my failed setup attempt, the SID was not added correctly. The big question is why?

(Another side question some of you may have is why should the service account have this permission on the cluster itself. Well, that’s another topic and another blog post – stay tuned for that!)

Digging deep: Cluster Log

Carrying on from the previous section, one helpful hint is to know that when such changes are made to the cluster’s ACL configuration, the changes are persisted to the cluster database, a copy of which is also persisted to the quorum disk resource (if one was configured.)

Now in my case, deeper troubleshooting was required to find out why the security descriptor could not be written. To find out, I dumped the cluster log (using the PowerShell cmdlet Get-ClusterLog) and found the following messages at the same time that the SQL setup failed:

00001004.00000fe0::2014/01/23-07:20:05.486 INFO [RES] Physical Disk <RealWitness>: Path W:Cluster is not on the disk
00001004.00000fe0::2014/01/23-07:20:05.486 ERR [RHS] Error 161 from ResourceControl for resource RealWitness.
00000fb8.000011fc::2014/01/23-07:20:05.486 WARN [RCM] ResourceControl(STORAGE_IS_PATH_VALID) to RealWitness returned 161.
00000fb8.000011fc::2014/01/23-07:20:05.585 ERR [RCM] rcm::RcmApi::SetQuorumResource: ERROR_BAD_PATHNAME(161)’ because of ‘ValidateQuorumPath( pRes, quorumPath )’

W:Cluster did not make sense initially, because this drive was supposed to be unused by this instance of SQL. I had to think of sequence of operations I had done during this setup…

Root cause found

Looking back, I remembered that I had swapped the disk quorum resource somewhere halfway in between the setup. It was earlier on the W: drive, but now on the Q: drive, which has the resource name RealWitness. So in short, the resource name for the quorum disk was correct, but the drive letter and path on that disk was incorrect. Stale information was used when the setup program tried to update the cluster configuration (which in turn would write to the quorum disk).

Important: Please note that this is the specific root cause for my specific situation. There may be other types of problems when the setup program tries to set the cluster access for the SID. One has to look at the inner exception message (which in my case was ‘specified path is invalid’) to be sure. The true root cause for other cases can typically be found by correlating to the cluster log file.

Solution

The solution, in this case was to remove the SQL bits (from the Control Panel) and nothing else but to re-run the setup program without any changes. It was a cleaner way, and the ‘moral of the story’ is never tweak cluster quorum in the middle of a SQL Setup procedure!

July 24, 2013 by Arvind Shyamsundar

Data Quality Services (DQS) and Failover Cluster instances

One of my customers made the observation that it is not possible to uncheck the Data Quality Services component in the Feature Selection screen in SQL Server 2012 setup. The interesting thing was that she clearly recalled that earlier it was possible to select it (the default was unchecked), and we wondered what changed that it now mandated the selection of the DQS installation:

FYI, she was now running SQL Server 2012 with the Product Update enabled using the command line switches /Action=InstallFailoverCluster /UpdateSource=<path to SP1 EXE>. This means the setup support files are those from SP1.

Analysis

It then came to our attention that in CU1 there was a fix released to ensure DQS compatibility with failover cluster. The issue is documented in KB article 2674817.

Now as per the KB, there is no way to apply the fix (other than a re-install of the FCI Sad smile ) to an already-installed FCI which had the DQS components previously selected (but which were otherwise not working due to the issue described in the KB.) So to prevent users from running into the same problem if they were ever to use DQS after the FCI installation, the product team decided that post CU1 the selection of DQS is mandatory.

The other angle on this is, what if you are never planning to use DQS, is this selection going to add services, databases or files which are unused? The good news is that the impact is not much; the selection of DQS in the above feature list just drops a DQSInstaller.exe into the SQL BINN folder, which then has to be (optionally) executed to actually complete the DQS server installation. So, if you never run the DQSInstaller.exe, it will not create the DQS databases on the FCI.

Conclusion

To summarize, though the feature cannot be unchecked, in reality it does not add too much to the footprint of the installed FCI. Hope this helps!

October 25, 2012 by Arvind Shyamsundar

Aviation BI Project – Part 1: Requirements

Background

One of my hobbies is following the commercial aviation scene. My job requires me to fly around a lot, and that only sparks my curiosity further around how the commercial aviation business works. What better then, that to use this hobby as a motivation to test new features in SQL Server 2012? This hybrid interest, which I call Aviation BI, is the driver behind a series of blog posts, where I will take you through a typical lifecycle of a BI project:

Defining requirements
Solution design and technology selection
Cleansing and importing data into the data warehouse
Querying the data warehouse
Using a multidimensional / tabular model to obtain slice / dice insights
Visualizing those insights in eye-catching fashion

In the first blog post of this series, I will take you through what I (as an end user) see as possible requirements. We will also review the datasets available for consumption.

Requirements

There are some categories of requirements that we have, depending on the perspective we are talking about. A typical retail consumer would want to check some statistics:

Given a particular sector, which airlines serve that sector?
How is the on-time performance of each airline on the sector?
Which airline has been providing the cheapest tickets on the sector?
What is the average arrival delay for a particular airport? Is there a pattern based on time of day?
What is the average load factor (how full is the aircraft) on the sector?

As an airline executive, I might want to look at:

Which are the routes with the highest traffic?
Which aircraft am I routinely having service difficulties with?

As an airport administrator, I might want to look at:

For my airport, is the ramp time (the time taken to taxi from gate to runway) very high compared to other airports?
What are the most common locations from which international passengers arrive to my airport?
Which is the busiest time for arrivals and departures for my airport?

As an aviation analyst I may have the following queries:

Which type of aircraft is used to make trans-Atlantic journeys?
Which operators are still using DC-10 aircraft (DC-10s were largely phased out by major operators in the late 1990s)
Did the Antonov 225 make any appearances in the US recently?
What are the longest non-stop flights to the USA?

And so on… these perspectives are imaginary, but I suppose they are very typical of what people in those roles would be asking to see.

Data sets available

There are some amazing datasets available in the public domain. For our purposes, we will be using the following:

Data set	Data Points	Row count
BTS On-time Performance	Origin, Destination airports Departure delay Arrival delay Taxi times Flight distance Flight time Data available: 1987 onwards Granularity: flight	146,090,545
Airline Origin and Destination Survey (DB1B) Market	Number of coupons for this market¹ Origin, destination airports Ticketing and operating carrier Market Fare Number of passengers Data available: 1993 onwards Granularity: individual ‘market’	346,513,372
Airline Origin and Destination Survey (DB1B) Ticket	Number of coupons in itinerary Origin airport Round trip indicator Miles flown Ticket fare per person Reporting carrier Data available: 1993 onwards Granularity: individual ticket	190,783,532
Airline Origin and Destination Survey (DB1B) Coupon	Origin, destination airports Number of passengers included Fare class Data available: 1993 onwards Granularity: individual coupon	584,874,480
T-100 Segment	Unique carrier code Origin, destination airport Aircraft type Departures scheduled and performed Load factor Seats vs. passengers Freight and mail Distance Ramp to ramp time Air time Data available: 1990 onwards Granularity: aggregated flights (monthly)	7,347,563
T-100 Market ²	Unique carrier code Origin, destination airport Service class Domestic / International indicator Passengers Freight and mail Distance Data available: 1990 onwards Granularity:	5,994,306
FAA Service Difficulty Reporting (SDR)	Data available: 1994 onwards Granularity: each service difficulty report

¹ A ‘market’ in DB1B terminology is a break in journey other than for changing planes.

²The difference between ‘market’ and ‘segment’ in T-100 is explained at this link. The T-100 reporting guide is also useful to understand this dataset.

Next Steps

Over a series of blog posts, I will take you through how we:

Cleanse and import this data into a data warehouse.
Build some analysis services databases on top of the relational data. We will explore both multidimensional and tabular mode databases for this step and see where each fits in.
We will then consume this data – through Reporting Services, Excel and PowerView clients.
We will finally do an incremental loading of ‘new’ data into the warehouse, and follow the incremental processing of the Analysis Services databases and confirm if the reporting pieces refresh the new data.

All of this, towards answering the ‘customer’ requirements which we laid out earlier!

References

For those interested in reading further about airline statistics, here are some references:

So if you have some comments on the scope of this ‘project’, I would LOVE to hear from you! Do leave a comment below if you liked this idea and would like to see more.

Arvind Shyamsundar's technical blog

Arvind Shyamsundar is a Principal PM @ MSFT Azure Data, working on Azure SQL. Data geek. Apache Accumulo and Fluo PMC. SQL MCM, ex-Principal PFE (MSFT Services). These are my own opinions and not those of Microsoft.

Tag / SQL 2012

Exotic wait types: PREEMPTIVE_OS_ENCRYPTMESSAGE

Error message “Copying cluster-resource …fssres.dll to C:Windowssystem32fssres.dll failed…” during SQL Server 2014 setup

Investigation

Workaround

Exotic spinlocks: X_PACKET_LIST and XID_ARRAY

‘Do it yourself’

Connecting the dots

Conclusion

“Could not add cluster access…” during SQL 2012 Failover Cluster Instance setup

Preliminary Analysis

Digging deep: Cluster Log

Root cause found

Solution

Data Quality Services (DQS) and Failover Cluster instances

Analysis

Conclusion

Aviation BI Project – Part 1: Requirements

Background

Requirements

Data sets available

Next Steps

References