Exotic spinlocks: X_PACKET_LIST and XID_ARRAY

A Twitter conversation sparked this blog post; the question there was about the relevance of the X_PACKET_LIST spinlock. As with most spinlocks, there is no public documentation on this one, and that is primarily because the spinlock is an implementation detail and can change from release to release.

‘Do it yourself’

That said, we can uncover some basic details about this spinlock. As preliminary reading, I suggest that you read Craig Freedman’s post on the way parallelism is implemented in SQL Server. Next, it may help if you can get handy with a debugger and SQL Server public symbols, something on the lines of this SQLCAT blog post. Another useful background read is Paul White’s post where he alludes to some of the architecture elements behind parallelism.

Now, I am going to simplify things within the limits of what we can disclose publicly. FYI, the information I share below can be obtained by anyone with a little bit of WinDbg magic and co-relating to the sys.dm_xe_map_values DMV. For example, if you look at the call stack in the SQLCAT team blog post and later poke around in the debugger with some breakpoints and the x (Examine Symbols) command, you can deduce the following:

  • An exchange (parallelism) operation has CXPipe instances. Conceptually think of these as the ‘wiring’ between producers and consumers within a parallel exchange (see sqlmin!CXPipe::NewRow and sqlmin!CXPipe::GetRow in the debugger)
  • These pipes have a CXPacketMgr class which in turn has a CXPacketList. From the name it suggests that this is a data structure into which producers insert (sqlmin!CXPacketList::Append in the debugger) and consumers pull packets from (sqlmin!CXPacketList::RemoveHead in the debugger.)
  • Any shared data structure must be protected for this kind of concurrent access, and the way we chose to do it in SQL is to use a spinlock – the X_PACKET_LIST spinlock.

Connecting the dots

Now, in Christian’s case (from the Twitter thread) he was also observing relatively high spins and backoffs for XID_ARRAY spinlock. That one is also related to the parallelism implementation. To explain this, there are two other classes of interest: a CXPort class and a CXId (Exchange ID for each worker). For simplicity, think of the CXPort having entries for the CXIds (representing worker threads) in the form of an array. Access to that array is protected using the XID_ARRAY spinlock.

Conclusion

So in conclusion, both these spinlocks have everything to do with parallelism. Experimenting with different values for MAXDOP may be beneficial, and also check if there are excessive amounts of data (LOB data maybe) flowing in the exchange. Beyond that, if you are in doubt or clueless, please do not hesitate to contact Microsoft Customer Support Services (CSS) and log an incident to debug the root cause.

SQL 2012 Setup Rules – The ‘Missing Reference’

Troubleshooting SQL Server setup can be a ‘dark science’ at times. There is overall documentation on where to look, and some interesting articles on the Support website. But other times, you might be left clueless on the root cause of failures, or why you are receiving a specific error message.

Situation

For example, one of my customers was adding a node (running Windows 2012 and SQL 2012 SP1) to an existing cluster. They were receiving a specific message when adding the node: “You have selected a feature that is not supported on Windows Server Core. The supported features on Windows Server Core are: Database Engine Services, SQL Server Replication, Full-Text and Semantic Extractions for Search, Analysis Services, Client Tools Connectivity, Integration Services, and SQL Client Connectivity SDK.”

With a bit of probing we understood what was going on: the second node (this was part of a test setup) was setup as a Windows Core setup (no GUI) while the first node was a full edition. Now, we do NOT recommend such mixed setups but given that this was a test setup, we wanted to unblock them and skip the rule. But which rule? There is no documentation on the rule name (which is what the SkipRules parameter takes.)

The Ask

With a bit of looking in the log files, we found that the relevant rule name is ServerCoreBlockUnsupportedFeaturesCheck. But the question from my customer was two-fold:

  • Can we get a list of all the rules, proactively so that we can ensure some basic things are out of the way before we even start running setup?
  • In the worst case scenario, if we do need to skip a rule, which rule name should that be?

It turns out that in SQL 2008 R2, we have a good list of all the rules, neatly categorized into their groups. However, we do not have an updated version of that for SQL Server 2012.

The Answer

If we look at the C:Program Files (x86)Microsoft SQL Server110Setup BootstrapLog<timestamp>DatastorePackage.xml file you will find that it is a rich source of information on all the rules and their rule groups which would run during setup. Not all rules are always applicable; some are specific to scenarios like Failover Cluster installation, Upgrade etc.

But that in itself is not enough. For looking at more descriptive information on the rules, you have to look at the SystemConfigurationCheck_Report.htm file described in KB article 955396. The rule ‘name’ mentioned therein would be the one you would, for example, pass to /SkipRules.

I’ve also listed all the rules evaluated in SQL 2012 in an Excel sheet – I hope that this list will make up for the lack of official documentation (SQL 2008 R2 was the last time this kind of list was made available as I mentioned previously.) The spreadsheet is quite self-explanatory, and I’ve included a screen shot of an extract here:

image

Hope this helps you somehow! If you can share your experiences or have additional questions please do let me know!

Too many files causing RAM pressure? (a.k.a. DynCache to the rescue!)

Readers of this blog might recall a previous post which described one impact of having too many similarly named files in one folder in NTFS. It turns out that the 8.3 naming convention is not the only thing you need to worry about when you have very large amounts of (smaller) files in the same volume.

Today I was called in to assist with a performance issue on a server. The only visible symptom of the problem was excessive RAM utilization on the server. The interesting aspect was that no specific user-mode process was consuming that RAM, so we were wondering where it came from. If this server was running SQL Server, for example, the ‘ghost’ utilization could be due to locked pages (which do not show up in Task Manager) but that was not the case here.

Analysis

So we ran the RAMMap utility, we found that the usage for MetaFile was a substantial percentage of the total RAM usage. From this AskPerf blog post you can see what MetaFile is all about:

“Metafile is part of the system cache and consists of NTFS metadata. NTFS metadata includes the MFT as well as the other various NTFS metadata files (see How NTFS Works for more details, and of course Windows Internals is a great reference). In the MFT each file attribute record takes 1k and each file has at least one attribute record. Add to this the other NTFS metadata files and you can see why the Metafile category can grow quite large on servers with lots of files.”

The next step therefore was to cross-check how large the MFT was in reality. The easy way to do this is to utilize the command given below:

fsutil fsinfo ntfsinfo <drive letter>

A sample output is given below (from my own laptop Smile)

C:>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number :       0x8a40c9ee40c9e0d5
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000003a2007ff
Total Clusters :                  0x00000000074400ff
Free Clusters  :                  0x0000000000ab1f84
Total Reserved :                  0x000000000002ef10
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000030240000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x0000000004d58da0
Mft Zone End   :                  0x0000000004d655c0
Resource Manager Identifier :     96CC88FE-5621-11E3-AF31-3C970EA47926

In this output, the “Mft Valid Data Length” gives us an indicator of how many bytes are used by the MFT. In the above case for example it equates to around 770MB:

0x0000000030240000 / (1024*1024) = 770.25 MB

Just as a curiosity, if I run RAMMap in my laptop (Windows 8.1), here is what I see, you can see a rough alignment with the above computed number.

image

FYI, in the server that I was looking at in the real world, the size of the MFT was actually 1.5 times the amount of RAM on the box Smile

Mitigation

Since the server in question was running Windows Server 2008 R2 SP1, we recommended the usage of the DynCache service sample, which would automatically control the size of the system cache based on system memory notifications.

SQL Server FileStream

Practically, this issue would also apply when you use SQL FileStream to store a very large number of blobs in the NTFS file system. In such cases, here are my recommendations:

  1. Consider setting the SQL Server Database Engine to use ‘locked pages’ with ‘max server memory’ set appropriately.
  2. If the OS is Windows 2008 R2 or below, you may additionally consider using the DynCache service to mitigate the effect that the large MFT will have.

Other notes

If you anticipate huge numbers of files to be stored on the file system, keep in mind that each file record will take up around 1KB. That means 100 million files will take up close to 100GB worth of MFT storage!

On Windows 2008 R2 and below be aware of issues like the one described in KB article 967351 and install the updated version of NTFS.sys accordingly.

In extreme cases, if you want to achieve optimal performance without compromising on system cache memory utilization, be aware that the system RAM sizing must be done accordingly keeping in mind the very large MFT which might result.

References

You may want to read these articles for more information:

Database size limitation in SQL Express

I would generally not blog about this kind of topic, except that my customer recently talked to me about it. According to his tests, he seemed to find that the total size limitation across all databases was 10GB. That did not sound correct to me, because it is quite clearly documented that the 10GB limit is per database.

I quickly tested and found out that the documentation is correct. It is only when an individual database is attempted to be created with a data file beyond the size limit of 10GB, that you get the error:

Msg 1827, Level 16, State 4, Server SOMECOMPUTERSQLEXPRESS, Line 1
CREATE DATABASE or ALTER DATABASE failed because the resulting cumulative database size would exceed your licensed limit of 10240 MB per database.

To summarize:

  1. 10GB is the per-database, data-file size limit
  2. If you have more than one data file in your SQL Express database, the size limit is applied to the total size across all such data files
  3. Log files are not subject to this limit

Tracking TEMPDB internal object space usage in SQL 2012

It is a documented fact that using the MAX specifier can result in TEMPDB usage in specific cases. From the TEMPDB whitepaper, some of these can use ‘internal object’ space within TEMPDB:

“all of the large object types: text, image, ntext, varchar(max), varbinary(max), and all others.”

Now, a FAQ from our customers is how to drill down to session level usage of TEMPDB when they suspect abnormal growth. Normally, the sys.dm_db_session_space_usage gives us the details we need. But are there any other options to see deeper? We wanted to test this and observe the effects. To do this, we created a simple stored procedure and a corresponding C# console application which passes in a large valued parameter.

Stored procedure code

create procedure testvarcharmax(@s varchar(max))
as
select @@version

C# application code

Here is the simple front-end application code which invokes the ‘testvarcharmax’ procedure:

SqlConnection cn = new SqlConnection("Data Source=.;initial catalog=tempdb;integrated security=true");
cn.Open();

SqlCommand cmd = new SqlCommand();
cmd.Connection = cn;
cmd.CommandType = System.Data.CommandType.StoredProcedure;
cmd.CommandText = "testvarcharmax";
SqlParameter parameter = new SqlParameter("s", System.Data.SqlDbType.VarChar, -1);
parameter.Value = new StringBuilder().Insert(0, "a", 1000000000).ToString();
cmd.Parameters.Add(parameter);
cmd.CommandTimeout = 360;

cmd.ExecuteNonQuery();

Monitoring TEMPDB

After executing the above C# code, my TEMPDB quickly grew by leaps and bounds. As any DBA would understand, the essential ask in such cases is to trace the session ID which is generating this TEMPDB space usage. One option is to use the sys.dm_db_file_space_usage DMV. That one is useful in this case, as it reports the aggregate space for the internal objects:

SELECT SUM(internal_object_reserved_page_count) AS [internal object pages used], (SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in MB] FROM tempdb.sys.dm_db_file_space_usage;

However the above does not provide a session-level detail. To obtain that, you can use the queries on sys.dm_db_session_space_usage and sys.dm_db_task_space_usage as shown below:

select * from tempdb.sys.dm_db_session_space_usage
order by internal_objects_alloc_page_count desc

select * from tempdb.sys.dm_db_task_space_usage
order by internal_objects_alloc_page_count desc

Update: Adam Machanic pointed out that it is much easier and lightweight to use the sys.dm_db_task_space_usage and sys.dm_db_session_space_usage DMVs to track internal object usage to a given task / session. Please use the detailed approach described below only if you have a specific need to investigate lower-level details, such as a potential product issue or such.

Digging deeper

Suppose you suspect a ‘leak’ in TEMPDB objects of some sort, and you need to really peer deep into TEMPDB page allocation / deallocation. To peer more into the internals, KB article 947204 provides an interesting approach using the allocation ring buffer using trace flag 1106. This can be useful in rare cases, but first do note the important disclaimer about this flag:

Note We recommend that you do not use trace flag 1106 on production servers because trace flag 1106 affects the performance of SQL Server.

Note that the query specified therein does not return results on SQL 2012 even after enabling trace flag 1106. It turns out that the query requires a minor change, as the XML schema and the name for the ring buffer has changed in SQL 2012. The new name for the ring buffer is RING_BUFFER_SPACEMGR_TRACE. And here is the modified query, reflecting changes in the XML schema as well:

SELECT top 500
record.value(‘(Record/@id)[1]’, ‘int’) AS record_id,
CONVERT (varchar, DATEADD (ms, -1 * ((inf.cpu_ticks / inf.cpu_ticks) – [timestamp]), GETDATE()), 126) AS EventTime,
[timestamp] ,
  record.value(‘(Record/@id)[1]’, ‘int’) AS RingBuffer_Record_Id,
  record.value(‘(Record/SpaceMgr/Event)[1]’, ‘int’) AS AllocationEventType,
  record.value(‘(Record/SpaceMgr/SpId)[1]’, ‘int’) AS SpId,
  record.value(‘(Record/SpaceMgr/EcId)[1]’, ‘int’) AS EcId,
  record.value(‘(Record/SpaceMgr/PageId)[1]’, ‘nvarchar(50)’) AS AllocatedPageId,
  record.value(‘(Record/SpaceMgr/AuId)[1]’, ‘nvarchar(50)’) AS AllocationUnitId,
  record.value(‘(Record/SpaceMgr/LsId)[1]’, ‘nvarchar(50)’) AS LsId
FROM sys.dm_os_sys_info inf CROSS JOIN (
SELECT timestamp, CONVERT (xml, record) AS record
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = ‘RING_BUFFER_SPACEMGR_TRACE’
AND ( record LIKE ‘%<Event>23</Event>%’ — uniform extent allocation
OR record LIKE ‘%<Event>22</Event>%’ — uniform extent deallocation
OR record LIKE ‘%<Event>24</Event>%’ — mixed extent allocation
OR record LIKE ‘%<Event>25</Event>%’ — mixed extent deallocation
OR record LIKE ‘%<Event>10</Event>%’ — page allocation
OR record LIKE ‘%<Event>11</Event>%’ — page deallocation
)) AS t
    ORDER BY record.value(‘(Record/@id)[1]’, ‘int’) ASC

The results of the ring buffer query are too detailed for most real-time scenarios and the overheads associated with the trace flag make it impractical, as compared to the sys.dm_db_session_space_usage DMV. But if you happen to use the approach in KB 947204 on SQL 2012 for some really low level debugging, please take note of the above changes.

Reference

Queries useful for troubleshooting TEMPDB space usage

Managed Service Accounts (MSA) and SQL 2012: Practical Tips

One of the most common dilemmas for SQL Server administrators is whether they should use AD based domain users as the service accounts, or can they leverage the inbuilt accounts like Network Service etc. If your SQL Server instance is never going to participate in any cross-server contexts (such as availability Groups, Linked Servers, Log Shipping etc.) then you may very well be happy to use one of the inbuilt accounts or even better, in SQL 2012, the special ‘Virtual Account’ feature.

While these options relieve you from the overhead of periodically changing and syncing the service account password, they do impose the machine boundary and cause problems when you want to ‘jump’ across instances. So the classic solution has been to grit one’s teeth and ask for a domain user which will then be configured as the service account. However this brings the hassle of periodic password maintenance, and more importantly that causes downtime.

OR you can configure the SQL 2012 standalone instance to utilize the new Managed Service Accounts feature in Windows 2008 R2 and above. To do this, you follow the steps below.

Setup the MSA in Active Directory

First, create a new MSA in the AD using the PowerShell cmdlet. To do this, there are some simple considerations documented here, but the most important ones are to be an domain administrator, to be on Windows 2008 R2 or above and have the right PowerShell modules installed (Remote Server Administration Tools has a AD PowerShell module which you must install.)

Import-Module ActiveDirectory

New-ADServiceAccount –Name TestSQLMSA -Enabled $true

Next, associate the above MSA with the computer you wish to use it on. Note that a MSA in itself can only be used on one destination computer at a time. In this case, I will specify my lab computer named W2K8R2CN3.

Add-ADComputerServiceAccount -Identity W2K8R2CN3 -ServiceAccount TestSQLMSA

Install the MSA on the target server

Once this is done, you switch to the W2K8R2CN3 computer (which just happens to be a Windows 2008 R2 Core installation) and ‘install’ the MSA on the computer. Now, if you are on Windows 2008 R2 Core installation, to use the ActiveDirectory cmdlets you must effectively install the RSAT-AD-PowerShell feature, but directly using DISM:

DISM /online /enable-feature /featurename=ActiveDirectory-PowerShell

Once that is done, you can easily associate the MSA on the target computer:

Install-ADServiceAccount TestSQLMSA

Important: To do this correctly, you must be a domain administrator. If you run the Install-ADServiceAccount cmdlet and you are not a domain admin, it silently exits, but later when you try to change the service account and start SQL, you will receive an error:

“Error 1069: The service did not start due to a logon failure” and HRESULT 0x8007042d.

If you get those errors, have a domain admin logon to the target server and re-run Install-ADServiceAccount for you.

Change the SQL Service account

Last but not the least: you use SQL Configuration Manager to execute the service account change. An important note is that when specifying the ‘user name’ for the MSA, you must fully qualify it with domain name, and have a trailing $ sign as well. If you do not specify the $ you will receive the error message:

“Invalid parameter [0x80041008]”

As an example, in my test setup, here is how Configuration Manager looks:

image

Supportability questions

MSA and older SQL versions: Though you may be able to configure and / or use MSA with SQL 2008 R2 or older versions, it is officially not tested. If you notice, the documentation for SQL 2008 R2 service accounts is noticeably silent on this topic.

Group Managed Service Accounts (GMSA): As of Feb 2014, Group Managed Service Accounts are NOT officially supported with SQL 2012 Failover Clustered Instances. This is under review by the Product Group for future releases.

References

Please leave your questions, comments below! I’d be very happy to hear from you. Till next time, bye!

What happens next? SQL with Windows login / multiple groups…

This is a very common case: take a Windows user who belongs to multiple groups. Next, grant more than one of those groups as logins to SQL Server. The question then is: which permissions will that session inherit?

The answer it turns out is that the permissions will be additive and most restrictive. When a Windows login user authenticates to SQL Server, the login token is populated with ALL the groups that this user is associated with. You can view this information through the sys.login_token view. The information in this can be joined with sys.server_principals to get more information. At the database level, you can view the user token using the sys.user_token view.

Additive: Example

Here is a simple example of what I am talking about. Windows user contosomultigroup belongs to two groups, as can be seen here in the output of whoami /groups:

CONTOSOgroup2                   Group            S-1-5-21-1304351064-1365540280-1243399584-1119 Mandatory group, Enabled by default, Enabled group
CONTOSOgroup1                   Group            S-1-5-21-1304351064-1365540280-1243399584-1118 Mandatory group, Enabled by default, Enabled group

We then grant these groups a login and map those logins to a test database:

USE [master]
GO

CREATE LOGIN [CONTOSOgroup1] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

CREATE LOGIN [CONTOSOgroup2] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

USE [testdb]
GO

CREATE USER [CONTOSOgroup1] FOR LOGIN [CONTOSOgroup1]
GO

CREATE USER [CONTOSOgroup2] FOR LOGIN [CONTOSOgroup2]
GO

Here is the information for the login token:

principal_id    sid    name    type    usage
2    0x02    public    SERVER ROLE    GRANT OR DENY
268    0x01050000000000051500000058D1BE4DB87D6451A0C51C4A5E040000    CONTOSOgroup1    WINDOWS GROUP    GRANT OR DENY
269    0x01050000000000051500000058D1BE4DB87D6451A0C51C4A5F040000    CONTOSOgroup2    WINDOWS GROUP    GRANT OR DENY

As you can see, the login token includes both groups. Next, let us view the user token within the TestDB database:

principal_id    sid    name    type    usage
0    0x01050000000000090400000083741B006749C04BA943C02702F2A762    public    ROLE    GRANT OR DENY
5    0x01050000000000051500000058D1BE4DB87D6451A0C51C4A5E040000    CONTOSOgroup1    WINDOWS GROUP    GRANT OR DENY
6    0x01050000000000051500000058D1BE4DB87D6451A0C51C4A5F040000    CONTOSOgroup2    WINDOWS GROUP    GRANT OR DENY

This should confirm what we talked about earlier – that the membership is additive.

Most Restrictive: Example

If we now proceed to deny login permissions to GROUP1:

DENY CONNECT SQL TO [CONTOSOgroup1]
GO

Then our login for CONTOSOmultigroup fails:

Login failed for user ‘CONTOSOmultigroup’. (Microsoft SQL Server, Error: 18456)

Conclusion

The SQL Server engine security model is very powerful and flexible. At the same time it can be confusing in some scenarios. Hopefully, in this post I have cleared up an FAQ item around multiple group memberships for a Windows user; and how the effective permissions are both additive and also most restrictive.

That’s it for now! See you soon!

Indexed views with remote tables–possible?

Once upon a time, I had this question come up during an onsite visit: someone wanted to create a ‘materialized’ copy of the remote data locally as an indexed view. I had never heard that requirement before, but since the motivation was genuine, I decided to validate it.

There is a long list of what is permitted (and what is not) in indexed views. One fundamental requirement is that the view on which the index is created, must be schema-bound. And this fundamental requirement, actually nails the answer straight away: any attempt to create a schema-bound view which references a remote table ends up with the SQL error 2014:

Remote access is not allowed from within a schema-bound object

Now that it is clear that such ‘projection’ is not possible, we have to step back and understand the real problems why this requirement was posed. In most cases, such ideas come up for improving performance of distributed queries (DQ). For all such cases, I suggest you first start by reviewing the excellent talk from Conor on DQ.

But in many cases, especially where multiple remote data sources are involved, probably the best way to optimize the queries is to consolidate the data inside a data mart or warehouse, and then possibly consider the columnstore index (or the new Clustered Columnstore in SQL 2014) feature to optimize the aggregation-type queries which are very typical in those scenarios.

“Could not add cluster access…” during SQL 2012 Failover Cluster Instance setup

Today a lab setup of SQL 2012 SP1 + CU7 (I was using the /UpdateSource flag to have a patched setup + roll in the updated bits in one go) failed sometime before creating the SQL resource in cluster. The error message from the detail.txt file showed:

Could not add cluster access for SID ‘S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738’. Error: There was a failure to call cluster code from a provider. Exception message: The specified path is invalid.

Preliminary Analysis

The GUID (S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738) in this case happened to be the SQL Service SID. How did I know that? You can also do it, if you use the SC SHOWSID command:

C:Windowssystem32>sc showsid mssql$inst1

NAME: mssql$inst1
SERVICE SID: S-1-5-80-2358576813-3362504073-1364548504-537691503-200193738

This issue occurs when the setup routine fails to add the service account’s SID to the list of accounts allowed to access the Windows cluster itself. For example, in my test cluster (in a healthy state, of course) you can see the Service SID is added to the list of authorized accounts in the security descriptor of the Windows cluster itself:

image

But of course in my failed setup attempt, the SID was not added correctly. The big question is why?

(Another side question some of you may have is why should the service account have this permission on the cluster itself. Well, that’s another topic and another blog post – stay tuned for that!)

Digging deep: Cluster Log

Carrying on from the previous section, one helpful hint is to know that when such changes are made to the cluster’s ACL configuration, the changes are persisted to the cluster database, a copy of which is also persisted to the quorum disk resource (if one was configured.)

Now in my case, deeper troubleshooting was required to find out why the security descriptor could not be written. To find out, I dumped the cluster log (using the PowerShell cmdlet Get-ClusterLog) and found the following messages at the same time that the SQL setup failed:

00001004.00000fe0::2014/01/23-07:20:05.486 INFO  [RES] Physical Disk <RealWitness>: Path W:Cluster is not on the disk
00001004.00000fe0::2014/01/23-07:20:05.486 ERR   [RHS] Error 161 from ResourceControl for resource RealWitness.
00000fb8.000011fc::2014/01/23-07:20:05.486 WARN  [RCM] ResourceControl(STORAGE_IS_PATH_VALID) to RealWitness returned 161.
00000fb8.000011fc::2014/01/23-07:20:05.585 ERR   [RCM] rcm::RcmApi::SetQuorumResource: ERROR_BAD_PATHNAME(161)’ because of ‘ValidateQuorumPath( pRes, quorumPath )’

W:Cluster did not make sense initially, because this drive was supposed to be unused by this instance of SQL. I had to think of sequence of operations I had done during this setup…

Root cause found

Looking back, I remembered that I had swapped the disk quorum resource somewhere halfway in between the setup. It was earlier on the W: drive, but now on the Q: drive, which has the resource name RealWitness. So in short, the resource name for the quorum disk was correct, but the drive letter and path on that disk was incorrect. Stale information was used when the setup program tried to update the cluster configuration (which in turn would write to the quorum disk).

Important: Please note that this is the specific root cause for my specific situation. There may be other types of problems when the setup program tries to set the cluster access for the SID. One has to look at the inner exception message (which in my case was ‘specified path is invalid’) to be sure. The true root cause for other cases can typically be found by correlating to the cluster log file.

Solution

The solution, in this case was to remove the SQL bits (from the Control Panel) and nothing else but to re-run the setup program without any changes. It was a cleaner way, and the ‘moral of the story’ is never tweak cluster quorum in the middle of a SQL Setup procedure!

Exotic spinlocks: XTS_MGR

My friend and colleague – Fabricio Catae – had blogged about spinlocks previously. In one of the comments, a visitor has asked him about the XTS_MGR spinlock. There is no public documentation about this spinlock, because it is largely implementation specific detail.

However, a general explanation is that this spinlock is taken out in many functions within SQL Server, most of which are responsible for version store operations (when you enable Read Committed Snapshot for example). So if you ever face contention for this spinlock, one thing to consider is the effect of highly concurrent and extensive DML workloads, which generally work the version store much harder. Other than that, it is best to contact Microsoft CSS if you suspect that this contention is causing problems on your instance of SQL.

More information on the version store can be found here:

If you have more questions or have seen other spinlocks which you are curious about, do leave a comment. I’ll see what we can explain regarding those!