Posts by Arvind Shyamsundar

I work to enable customers, partners and colleagues to excel at what they do by harnessing the power of data, advanced analytics and cloud services. I work as a Principal Program Manager within the Azure Data division at Microsoft Redmond, focusing on Azure SQL. I work with customers who are adopting our services and products in the most demanding workloads, and translate learnings from those engagements into product feedback and planning input so that we can make Azure SQL even better. I am proud to be one of the first Microsoft Certified Masters (MCM) for SQL Server 2008.

Query of the day: finding SQL Server queries with large memory grants

Quick tip from me today: I recently had to check which T-SQL query (or queries) in the system were using up some monster 30GB+ memory grants. Luckily, the sys.dm_exec_query_memory_grants DMV facilitates this. Here is the query I finally used to figure out what was happening:

SELECT r.session_id
    ,mg.granted_memory_kb
    ,mg.requested_memory_kb
    ,mg.ideal_memory_kb
    ,mg.request_time
    ,mg.grant_time
    ,mg.query_cost
    ,mg.dop
    ,(
        SELECT SUBSTRING(TEXT, statement_start_offset / 2 + 1, (
                    CASE
                        WHEN statement_end_offset = -1
                            THEN LEN(CONVERT(NVARCHAR(MAX), TEXT)) * 2
                        ELSE statement_end_offset
                        END - statement_start_offset
                    ) / 2)
        FROM sys.dm_exec_sql_text(r.sql_handle)
        ) AS query_text
    ,qp.query_plan
FROM sys.dm_exec_query_memory_grants AS mg
INNER JOIN sys.dm_exec_requests r ON mg.session_id = r.session_id
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
ORDER BY mg.required_memory_kb DESC;

In case you are wondering what memory grants are all about, you should start from this great blog post from Jay Choe, who is part of the SQL Server development team. Jay also posts some useful variants of queries to find out (for example) cached query plans with memory grants. It is a great read.
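
In a similar vein, here is a hedged sketch (not from Jay's post) of one such variant: scanning the plan cache for plans that record a non-trivial memory grant. It assumes the SQL Server 2012+ showplan schema, which exposes a MemoryGrantInfo element; the 1024 KB threshold is arbitrary and purely for illustration.

-- Sketch: find cached plans whose compiled plan carries memory grant information.
-- Assumes the SQL 2012+ showplan schema (MemoryGrantInfo); adjust the threshold as needed.
SELECT TOP (20)
    cp.usecounts,
    cp.objtype,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('
    declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    //p:MemoryGrantInfo[@SerialDesiredMemory > 1024]') = 1
ORDER BY cp.usecounts DESC;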

Happy query tuning!

SQL Server and ‘Instant File Initialization’ Under the Hood – Part 1

Recently a colleague of mine posed a very interesting question: does the SQL Server ‘Buffer Pool Extension’ feature in SQL 2014 use the ‘instant file initialization’ optimization (or not)? While answering that question I found some useful information which I believe will help many of us. So here we go… firstly, we need to understand what ‘instant file initialization’ really is all about, from the Windows perspective.

Background

At the OS level every file has three important attributes which are recorded in the metadata of the NTFS file system:

  • Physical file size
  • Allocation file size
  • Valid data size

In this post, we are mostly concerned with the Physical and Valid Data sizes. More details are available at this MSDN page, but for simplicity, let me put it this way:

  • When you create a file with the CreateFile API, it starts with a 0 byte length
  • One way to ‘grow’ the file is of course to sequentially write bytes to it.
  • But if you want to ‘pre-size’ the file to a specific size, then you may not want to explicitly write data upfront.
  • In those cases the OS provides the SetEndOfFile() API to ‘resize’ the file, but as you will see below, there are still some things which will hold up the thread when the first write operation is done to the pre-sized file.

Let’s work through this step-by-step. A bit of programming knowledge will help, though it should be fairly easy to figure out what’s going on by reading the comments inline in the code!

Growing a file: C++ example

Here is a simple program which will demonstrate how you can grow a file to 3GB without having to write individual bytes till the 3GB mark:

#include <Windows.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    // create a file first. it will start as an empty file of course
    HANDLE myFile = ::CreateFile(L"l:\\temp\\ifi.dat",
        GENERIC_WRITE,
        0,
        NULL,
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (INVALID_HANDLE_VALUE == myFile)
    {
        return -1;
    }

    // let's now make the file 3GB in size
    LARGE_INTEGER newpos;
    newpos.QuadPart = (LONGLONG) 3 * 1024 * 1024 * 1024;

    LARGE_INTEGER newfp;

    // navigate to the new 'end of the file'
    ::SetFilePointerEx(myFile,
        newpos,
        &newfp,
        FILE_BEGIN);

    // 'seal' the new EOF location
    ::SetEndOfFile(myFile);

    // now navigate to EOF - 1024 bytes.
    newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 - 1024;
    ::SetFilePointerEx(myFile, newpos, &newfp, FILE_BEGIN);

    DWORD dwwritten = 0;

    // try to write 5 bytes at the (3GB - 1024) location
    ::WriteFile(myFile,
        "hello",
        5,
        &dwwritten,
        NULL);

    // close the handle before exiting
    ::CloseHandle(myFile);

    return 0;
}

When you execute the above code, you will see that even though we used the SetEndOfFile() API to move the EOF marker without explicitly writing anything, there is some work being done by the OS underneath our code to ‘zero’ out the contents of the clusters allocated to us. This is done for data privacy reasons, and since it is physical I/O, it does take a while. You may want to refer to the documentation for the SetFilePointerEx function:

Note that it is not an error to set the file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, leaving the intervening bytes uninitialized.

Snooping in with Process Monitor

You can actually look at the proof of what is happening underneath the hood by using Process Monitor from the Sysinternals suite. Here is a complete call stack of the application. Notice the call in the kernel to zero out data (CcZeroDataOnDisk). Notice that these are not our API calls. We simply called WriteFile() and that triggered off these underlying ‘zeroing’ writes.

[Screenshot: Process Monitor call stack showing the kernel call to CcZeroDataOnDisk]

In the same ProcMon trace you will also notice a bunch of I/O operations (corresponding to the above stack) just after I triggered my 5 bytes I/O:

[Screenshot: Process Monitor trace showing the zeroing I/O issued right after the 5-byte write]

The key takeaway from this walkthrough is that calling SetEndOfFile() does not affect the ‘valid data length’ of the file stream. In that case, the OS plays it safe by zeroing out the allocations from the previous valid data length (which in our case was actually 0) up to the location of the write (which in our case is 1024 bytes before the physical end of the file). This zeroing is what causes the thread to block.

Growing a file – the ‘fast’ way

Instant file initialization, as we know it in SQL Server, really reduces to an API call in Windows. To see that, we tweak the above sample and add in the ‘secret sauce’, which is a call to the SetFileValidData() API:

// ‘seal’ the new EOF location
::SetEndOfFile(myFile);

// now ‘cleverly’ set the valid data length to 3GB
if (0 == ::SetFileValidData(myFile, newpos.QuadPart))
{
    printf("Unable to use IFI, error %d", GetLastError());
}
else
{
    printf("IFI was used!!!");
}

// now navigate to EOF - 1024 bytes.
newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 - 1024;

You will then see that the same code executes almost instantly. The reason is that the OS no longer needs to zero any bytes underneath the hood, because the valid data length (as set by the above API call) is now equal to the file size. This can be seen in Process Monitor as well:

[Screenshot: Process Monitor trace showing no zeroing writes when SetFileValidData() is used]

Dangers of SetFileValidData()

The important thing to note is that SetFileValidData() is a dangerous API in a way, because it can potentially expose underlying fragments of previously deleted data. Much has been said about this, and you can check out Raymond Chen's blog post on the topic. The MSDN page for this API is also very clear about the caveats:

You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly. In most other situations, there is usually no performance gain to using SetFileValidData, and sometimes there can be a performance penalty.

What next?

Of course, if you are like me, you are probably wondering where all of this is heading. Remember, we are trying to explore the background of the ‘instant file initialization’ optimization that SQL Server can leverage to quickly size new and grown chunks of data files. As the documentation and our team’s blog post explain in detail, this setting can be very useful in certain cases and is in fact recommended for deployments on Microsoft Azure IaaS VMs.
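
If you want a preview of the SQL Server side of this, here is a minimal sketch; it assumes a test instance where you are free to enable trace flags 3004 and 3605, which cause file zeroing activity to be written to the SQL Server error log.

-- On a TEST instance only: report zeroing operations in the error log.
DBCC TRACEON (3004, 3605, -1);

-- Create a throwaway database. The data file will skip the 'Zeroing ... .mdf'
-- messages only if instant file initialization is available to the service
-- account ('Perform volume maintenance tasks'); the log file is always zeroed.
CREATE DATABASE IFI_Test;

-- Look for 'Zeroing' messages in the current error log, then clean up.
EXEC sys.xp_readerrorlog 0, 1, N'Zeroing';
DROP DATABASE IFI_Test;
DBCC TRACEOFF (3004, 3605, -1);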

Next time, I will correlate this information we learnt above to how SQL Server leverages it in the process of creating new data files or growing existing ones. Till then, goodbye!

The strange case of the bloated C:\Windows\CSC folder

A few days ago, when I was working at the office on the corporate network, I had to access a network share containing some installation files. At the time of starting this process, I remember distinctly that my disk free space was around 25GB or thereabouts. A few hours later, I found that my disk free space was down to almost 1GB! This despite the fact that I never actually installed anything – I had just visited the network share to inspect file details.

A scan of the system with TreeSize Free (there are many other such tools, like SequoiaView, which I have used as well) revealed around 25GB in the C:\Windows\CSC folder itself. At first I wondered whether this folder was some kind of C# compiler related folder, but that was quickly ruled out.

The answer is that this folder is used by the Offline Files feature in Windows. The way to check on the status and configuration for Offline Files in Windows 8.1 is quite simple. Open the Control Panel, and do a search for the keyword ‘Offline’. You will see the entry under ‘Sync Center’:

[Screenshot: Control Panel search for ‘Offline’ showing the entry under Sync Center]

Click on ‘Manage offline files’ and therein you can check if Offline Files is enabled:

[Screenshot: Manage Offline Files dialog showing whether Offline Files is enabled]

Then you can click on ‘Disk Usage’ to check how much space is actually used by Offline Files. Some of these files are temporary in nature, and as you can see below, that was the case in my scenario. My colleague Sujay Desai gave me this link to help understand what the temporary files really are.

[Screenshot: Offline Files Disk Usage tab showing the space consumed, including temporary files]

A good way to control the disk space consumed by Offline Files is to adjust the disk space limit for the feature. But I first chose to delete the ‘Temporary files’. That typically takes a few minutes, and Windows reports once the deletion is complete:

[Screenshot: confirmation that the temporary files have been deleted]

Now, if you are a travelling person, you must also be aware of a proactive feature wherein, on a slow connection, Windows will automatically start caching files using Offline Files. This is not configurable from the UI and can only be controlled by using a Group Policy.

[Screenshot: Group Policy setting controlling automatic caching on slow connections]

There are many valid and useful cases where Offline Files can be helpful, but in my case I have no use for it. So, I decided to explicitly turn this Offline files feature off. You will be prompted for a reboot of the system after this is done:

[Screenshot: disabling Offline Files, with the prompt to reboot the system]

Disclaimer: please only disable Offline Files if you know what you are doing and if it is causing space issues on your boot volume. If you have made offline changes to your files, make sure you sync them back before making any changes, because otherwise you may lose those changes. So please be VERY CAREFUL when you do this.

The end state on my system is as follows: Offline Files is disabled and the cache has been emptied:

[Screenshot: Offline Files disabled and the cache emptied]

And of course, my free disk space is back to normal!

Hope you liked this tip, and if you liked it, please leave a comment and / or rate the blog post. I would appreciate that very much!

The mysterious ‘MD’ lock type, and why you should stop using sp_lock

Today during some discussions with customers, there was a question about some locks being held by a session. Here is an example reproduced below:

begin tran
select * from Person.Person
where LastName = 'Singh'

exec sp_lock @@spid

Here is the output:

spid   dbid    ObjId        IndId   Type   Resource         Mode    Status
52     11      0            0       DB                      S       GRANT
52     11      0            0       MD     14(10000:0:0)    Sch-S   GRANT
52     11      0            0       MD     14(10001:0:0)    Sch-S   GRANT
52     1       1467152272   0       TAB                     IS      GRANT
52     32767   -571204656   0       TAB                     Sch-S   GRANT

The two ‘MD’ rows in the output were the point of discussion. It was not very apparent what those locks were attributed to. This is where the power of the newer DMV, sys.dm_tran_locks, becomes apparent:

select resource_type, resource_subtype, resource_description, request_mode from sys.dm_tran_locks
where request_session_id = @@spid

Here is the output:

resource_type   resource_subtype   resource_description        request_mode
METADATA        XML_COLLECTION     xml_collection_id = 65536   Sch-S
METADATA        XML_COLLECTION     xml_collection_id = 65537   Sch-S

Aha! So this made much more sense. These are metadata locks on XML schema collections. When you look at the Person.Person table, there are indeed two XML columns to which XML schema collections are bound:

[AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
[Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,

When you further reconcile the xml_collection_id values with the sys.dm_tran_locks output, the case is sealed:

select xml_collection_id, name from  sys.xml_schema_collections
where xml_collection_id in (65536, 65537)

Here is the output:

xml_collection_id    name
65536    AdditionalContactInfoSchemaCollection
65537    IndividualSurveySchemaCollection
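
As an aside, if you are still in the habit of running sp_lock, here is a minimal sketch of a replacement based on sys.dm_tran_locks. It resolves friendly names only for OBJECT locks (PAGE and KEY locks report HoBt IDs in resource_associated_entity_id, which you would need to map via sys.partitions), so treat it as a starting point rather than a drop-in substitute.

select tl.request_session_id,
    tl.resource_type,
    tl.resource_subtype,
    tl.resource_database_id,
    tl.resource_description,
    tl.request_mode,
    tl.request_status,
    -- resolve a friendly name for OBJECT locks only
    object_name(convert(int, case when tl.resource_type = 'OBJECT'
            then tl.resource_associated_entity_id end),
        tl.resource_database_id) as locked_object_name
from sys.dm_tran_locks as tl
where tl.request_session_id = @@spid;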

So, what other types of resources can we expect to see in the sys.dm_tran_locks DMV? If you do some poking around in my favorite DMV, sys.dm_xe_map_values, you will find the answer:

select map_value from sys.dm_xe_map_values
where name = 'lock_resource_type'
order by map_key

Here is the output:

UNKNOWN_LOCK_RESOURCE
NULL_RESOURCE
DATABASE
FILE
UNUSED1
OBJECT
PAGE
KEY
EXTENT
RID
APPLICATION
METADATA
HOBT
ALLOCATION_UNIT
OIB
ROWGROUP
LAST_RESOURCE

Note: the above output was produced from a SQL 2014 instance, so you may not find all the values in older versions of SQL. Most of the above are easy to understand (for example, Object, Page or Key.)

NOW – there are some others in the above list which are not so easily understood. If you want to hear more, please leave a comment and indicate what exactly you want to know more about! I’ll do my best to explain within the boundaries of what we can share publicly.

Azure Vidyapeeth webinar: Introduction to Azure Machine Learning

Today I had the privilege to speak at the Microsoft India Azure Vidyapeeth webinar series – it was an introductory session on Azure Machine Learning – which is Microsoft’s cloud-first offering in the Machine Learning space.

Those who attended live, thank you for your interest; those who wish to view the recording can do so at their leisure by visiting the above link and registering for the ‘Introduction to Azure Machine Learning’ session. You will then receive an email with directions for downloading the slides and for viewing the webinar.

At this moment I also wanted to share some key links for those interested to get started with Azure ML:

  • Azure ML Studio – here, on the right side, you can avail of the ‘free’ tier for Azure ML, where you can explore all the features without having to purchase a full Azure subscription; you do not even need a credit card.
  • The Azure ML team blog is an invaluable resource to stay abreast of what’s happening on the service front. They are constantly adding new features and fixes, and the blog is where key announcements are made.
  • Once you get started and play around with Azure ML, you can visit the MSDN forum to get community support. Paid support is available as well.
  • The Azure ML documentation is actually very useful and rich. I strongly recommend taking a look!

That’s it for now; in case you have any follow up questions to my presentation feel free to leave comments – I will take a look and get back!

Beware: Compatibility Level 90 is no longer available in SQL 2014

This is a quick, perhaps Level 200 post – a rarity on my blog – but I am seeing this so often that I think it is important to get this out there.

Those who are running databases with compatibility level 90 on SQL Server 2008 / 2012 need to be aware that level 90 is no longer supported in SQL Server 2014. The SSMS 2014 UI is a bit confusing in this respect, because it does show you the option to set the level to 90:

[Screenshot: SSMS 2014 database properties dialog still listing compatibility level 90 in the dropdown]

But when you try to actually change this to 90, it fails:

Msg 15048, Level 16, State 3, Line 1
Valid values of the database compatibility level are 100, 110, or 120.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

This information is indeed documented in Books Online: “When a database is upgraded to SQL Server 2014 from any earlier version of SQL Server, the database retains its existing compatibility level if it is at least 100. Upgrading a database with a compatibility level of 90 sets the database to compatibility level 100. “
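
For reference, here is a quick way to check the current compatibility level of your databases and to change it once your testing is complete; the database name below is just a placeholder.

-- list compatibility levels for all databases on the instance
SELECT name, compatibility_level
FROM sys.databases
ORDER BY name;

-- after testing, move a database (placeholder name) to the SQL 2014 level
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 120;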

The call to action for everyone who plans to upgrade to SQL Server 2014 and is running a database at compatibility level 90 is to thoroughly test with compatibility level 120 (or perhaps 110, if you feel the changes in the cardinality estimator warrant more testing for your workload) before you upgrade. Both functional and stress testing must be done. Once the testing is signed off, you can confidently do the upgrade in production with the higher compatibility level – 120 (recommended, subject to testing outcome) or 110.

My team of Premier Field Engineers within Microsoft Services can also be of help to your upgrade efforts and help detect and mitigate risks related to upgrading to the current version. Do engage us if you need help!

Microsoft Mumbai Tech Community meeting on 31 Jan 2015

A community is a social unit that shares common values, and what better binding force than technology? We are very excited to announce the launch of the Microsoft Mumbai Tech Community. Our goals for this community are to:

 

  1. Share knowledge from our experts here at Microsoft
  2. Create awareness about new product and service offerings and features
  3. Build a cohesive community

In line with this aspiration, we are kicking off the Microsoft Mumbai Tech Community with 2 expert sessions. If you are in Mumbai, India, please let my colleague Mahendraprasad know (details below) that you will be attending!

 

Date: 31 Jan 2015
Timing: 10 AM – 1 PM
Address: Microsoft Corporation India Pvt. Ltd.
Training rooms Oak / Juniper
4th Floor, ‘Windsor’,
off CST Road, Kalina,
Santacruz (East)
Mumbai, 400098
Contact: MADUBEY at microsoft dot com

Here is what our team is planning to talk about:

 

Presenter: Narendra Angane
Schedule: 10:00 AM till 11:15 AM
Topic: Introduction to Power BI
Abstract:

Power BI for Office 365 is a self-service business intelligence (BI) solution delivered through Excel and Office 365 that provides information workers with data analysis and visualization capabilities to identify deeper business insights from their data. We will see how the following components can be used for data analysis and visualization:

  • Power Query
  • Power Map
  • Power View
  • Q&A

Presenter: Mahendraprasad Dubey
Schedule: 11:30 AM till 12:45 PM
Topic: Evangelizing Data Collection Strategy
Abstract:

Data collection has long been a nightmare. Several times we either cannot collect data because of the performance impact on the server, or we miss the data that needs to be analyzed. Data collection has now been made much simpler. Let’s discover the different ways of collecting data:

  • XEvent Metadata
  • Extended Events Tutorial
  • Finding Your Events
  • Actions and Targets
  • Demo
  • Q&A

We are looking forward to hosting you at this event. Do RSVP your participation to MADUBEY at microsoft dot com.

Exotic wait types: PREEMPTIVE_OS_ENCRYPTMESSAGE

Some time ago, we were discussing wait types with a group of customers. To demonstrate the waits-and-queues methodology we use, I showed the group the sys.dm_os_wait_stats DMV, with the results sorted by wait_time_ms in descending order. At the top of that list was the PREEMPTIVE_OS_ENCRYPTMESSAGE wait type. At that moment I was not exactly sure what that wait type was, and I promised to look into it later. This post is a quick explanation of it.
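
For reference, the query I ran was along these lines (a minimal sketch; the TOP clause and ordering are just what I typically use):

SELECT TOP (20)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;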

It turns out that if the SQL client and / or server is configured to use SSL for secure communications, the implementation within SQL Server calls the SSPI EncryptMessage Win32 API hosted within SspiCli.dll. Such wait states – where SQLOS calls into an external DLL over which SQL Server has no direct control – are identified with the PREEMPTIVE_* label.

So in short, this wait type represents the time it takes to encrypt TDS traffic back to the client. Does it represent a problem if it is at the top of the list? Not necessarily; as with all wait types, it needs to be carefully examined within the context of the specific issue being investigated, and correlated with other observations such as performance counters. In certain cases, if you strongly suspect that this wait is representative of a problem, you can always check whether you have configured SSL for SQL Server or for the client, and test with that turned off (subject to the boundary security requirements, of course!)
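
One quick way to check whether connections are actually being encrypted is to look at sys.dm_exec_connections; a minimal sketch:

SELECT session_id,
    encrypt_option,   -- TRUE if the connection is encrypted
    auth_scheme,
    net_transport
FROM sys.dm_exec_connections;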

Till next time, have a good one!

Error message “Copying cluster-resource …fssres.dll to C:\Windows\system32\fssres.dll failed…” during SQL Server 2014 setup

First and foremost, let me reproduce the complete error message I refer to above: “Copying cluster-resource C:\Program Files\Microsoft SQL Server\MSSQL12.SQL14\MSSQL\Binn\fssres.dll to C:\Windows\system32\fssres.dll failed because the destination file exists and is read-only”

Investigation

I received the above error message while setting up a SQL Server 2014 instance side by side with an existing copy of SQL Server 2012. The error caused the “Database Engine” component of setup to fail, and the failed instance later had to be removed before re-attempting setup.

The root cause of such setup failures is often quite easy to locate when you look at the setup logs, which are conveniently placed under the “C:\Program Files\Microsoft SQL Server\120\Setup Bootstrap\Log” folder on your machine. When I looked at the Detail.txt file from the problematic server, I could spot the following section:

(01) 2015-01-04 08:09:34 Slp: UpdateClusterResourceAction: Resource type = ‘SQL Server FILESTREAM Share’ ResourceDllName = fssres.dll Source Location = ‘C:\Program Files\Microsoft SQL Server\MSSQL12.SQL14\MSSQL\Binn’ Target Location = ‘C:\Windows\system32’.
(01) 2015-01-04 08:09:40 Slp: Type ‘SQL Server FILESTREAM Share’ not found. Performing copy directly …
…
(01) 2015-01-04 08:09:40 Slp: Failed to version-copy file ‘C:\Program Files\Microsoft SQL Server\MSSQL12.SQL14\MSSQL\Binn\fssres.dll’ to ‘C:\Windows\system32\fssres.dll’. Exception data is: System.IO.IOException: The process cannot access the file ‘C:\Windows\system32\fssres.dll’ because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite)
   at Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceAction.VersionCopy(String source, String target).
…
Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceException: Copying cluster-resource C:\Program Files\Microsoft SQL Server\MSSQL12.SQL14\MSSQL\Binn\fssres.dll to C:\Windows\system32\fssres.dll failed because the destination file exists and is read-only. ---> System.IO.IOException: The process cannot access the file ‘C:\Windows\system32\fssres.dll’ because it is being used by another process.
…
Microsoft.SqlServer.Configuration.Cluster.UpdateClusterResourceAction.VersionCopy(String source, String target)

From the above, it is quite clear that fssres.dll already existed in the target location. When I checked the file version, it was clear that the existing copy was from SQL Server 2012:

[Screenshot: file properties showing the existing fssres.dll carries the SQL Server 2012 version]

On executing a TASKLIST /M fssres.dll command, it quickly became clear that the process which already had this DLL loaded was RHS.exe for the existing instance of SQL 2012. That instance already had an Availability Group configured, hence RHS.exe had loaded the DLL.

Given that the DLL was already loaded by the RHS.exe, there was no way for SQL setup to update it. That is why the SQL 2014 instance setup failed.

Workaround

The workaround I had to use in the above case was to first remove the SQL 2014 Database Engine instance, shut down the cluster service on the failed node, and then re-attempt the SQL 2014 setup. Less than ideal, but till the above issue is handled in setup, we have few other options. I have also logged a Connect item requesting the Product team to investigate this issue. If you think this issue should be fixed, please vote it up on Connect!

Using PowerShell to attach PST files to Outlook

This one is a really quick note. Yet another example to show off the power of the PowerShell pipeline and its interoperability with the COM world.

Recently I had to add 30+ existing PST files to an instance of Outlook. I hate using the mouse and keyboard for repetitive tasks (File –> Open Outlook Data File –> browse to the path –> click OK), so I set out to do the same through some form of scripting.

Luckily, PowerShell can create instances of COM objects, and Outlook provides a COM library to automate Outlook. It was a matter of a few minutes then to look around on Bing and compose the script:

# load the Outlook interop assembly and start (or attach to) Outlook
Add-Type -Assembly "Microsoft.Office.Interop.Outlook" | Out-Null
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNameSpace("MAPI")

# attach every PST found under the given folder (adjust the path as appropriate)
dir "c:\mypath\*.pst" | % { $namespace.AddStore($_.FullName) }

One interesting thing with this is that you may need to run it with the correct version of PowerShell (the 32-bit version is at c:\windows\syswow64\windowspowershell\v1.0\powershell.exe) to match the bitness of Outlook installed on your machine.

Also, if you do plan to use the above script, make sure you replace the path with the correct one. Do note that if the PST file path is incorrectly specified, Outlook will create a new PST at that location, so be careful!

The AddStore method is documented here. Do note, though, that this COM interface does not permit the addition of password-protected PST files. In that case you may want to explore the third-party Redemption library.

Disclaimer

Note: Please note third-party links are provided as-is and Microsoft does not offer any guarantees or warranties regarding the content on the third party site.

This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment.  THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.

This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.