Error 0xC1900101 – 0x20017 upgrading to Windows 10 build 10162

We are all excited about the upcoming release of Windows 10 on the 29th of July. In the meantime, you can install a preview version from the Windows Insider web page, or if you’d like to wait, you can reserve your free upgrade by following the steps here.

At the beginning…

At Microsoft, we are at the forefront of the ‘self-hosting’ ballgame. Some time back, I had installed Windows 10 preview build 10074. This week, with the newer builds being released in rapid succession, I decided to switch the Preview Build frequency to ‘Windows Insider Fast’ so that I could get hold of the latest (at the time) 10162 build:

image

Everything seemed to work alright with the update to 10162…

image

… till the point where it rebooted the computer:

image

 

Ahem…

Instead of seeing the usual ‘Upgrading Windows, sit tight..’ screen, this is what I saw:

clip_image002

Well. the keyboard layout screen was certainly not expected! But when I selected US keyboard, it asked me whether I wanted to Troubleshoot or shutdown; which is again quite strange:

clip_image004

So I played along and tried to troubleshoot, at which stage I was basically left with options such as Reset the PC:

clip_image006

And even Reset PC failed with a helpful error message that the ‘drive where Windows is installed is locked’:

clip_image010

Left with no option – post reboot, I was greeted with the infamous error 0xC1900101 – 0x20017, which has plagued some users in the Windows 8.1 upgrade as well:

image

 

Some clues

At this time, I also noticed a very strange issue: any USB mass storage device I would plug in to the laptop was not assigned a drive letter. It would show up in Disk Management, but when I would try to give it a drive letter, Disk Management would popup a cryptic message: ‘The operation failed to complete because the Disk Management console view is not up-to-date. Refresh the view by using the refresh task. If the problem persists close the Disk Management console, then restart Disk Management or restart the computer’.

Well, this was getting stranger and stranger! But somehow, it appeared that the two issues were inter-related. Internally, I was told that the error code 0xC1900101 – 0x20017 was also related to USB drivers. And the above disk management error was in the past attributed to ‘ghost’ drives.

Hide and seek

With that clue in mind, I took at look at my Device Manager screen and enabled the ‘Show Hidden Devices’ under the View menu. In my case, I found some devices under Storage Volumes which only showed up when Hidden Devices were enabled. And those were clearly not attributed to any devices I had installed on my laptop (i.e. I had nothing related to Roxio Saib in reality on my laptop):

image

So I decided to delete these hidden Roxio devices (in another case where I had seen an identical problem the devices showed up as Unknown Device under Storage Volumes, in which case I proceeded to delete those Unknown Devices under Storage Volumes).

WARNING: Using Device Manager incorrectly can cause serious problems that may require you to reinstall Windows. Microsoft cannot guarantee that problems resulting from deleting incorrect devices can be solved. Use these steps at your own risk. Please note again that in this walkthrough, we have only deleted Hidden, Unknown devices which were listed under the Storage Volumes category.

clip_image002[5]

There were two of these Roxio Saib devices (which clearly did not apply to my laptop) and I proceeded to clean both instances up, followed by a reboot:

image

 

We’re there!

After the reboot, I re-tried the Update:

image

At the end of this, and post restart, lo and behold! The expected ‘Upgrading Windows’ screen appeared:

clip_image002[7]

And after around 20 minutes, the upgrade to 10162 was done successfully!

clip_image004[5]

For those of you who had the misfortune to run into the error code 0xc1900101 – 0x20017 when upgrading to Windows 10, I hope this walkthrough helps you get unblocked! And in case you are still unsure or not confident of what to do, please contact Microsoft Customer Services and Support to get more direct help on your issue.

Advertisements

The strange case of the large LiveKernelReports folder

Some time back, I ran into a bit of a space crunch on the C: drive of my laptop which runs Windows 8.1. On digging a bit, I found a 2GB+ file at C:\Windows\LiveKernelReports\WinsockAFD-20150114-1722.dmp. Now, this was the first time I had seen a folder called LiveKernelReports and definitely the first time that I had seen a dump file with the name WinsockAFD*.dmp.

Important note: if you are not a developer and came here trying to figure out what to do with the files in this folder, please proceed directly to the ‘So What?’ section below.

Inside the Dump File

The first thing I did (of course Smile) was to open up the dump file under the WinDbg debugger. In kernel mode dumps, the !analyze –v command generally gives good analysis results, so I decided to start from there (full output of !analyze is at the end of this post as an Appendix).

Firstly, the bugcheck code was 0x156. If you are a developer, and you have the Windows 8.1 SDK, you will see this file C:\Program Files (x86)\Windows Kits\8.1\Include\sharedbugcodes.h which has the bugcheck names. 0x156 is WINSOCK_DETECTED_HUNG_CLOSESOCKET_LIVEDUMP.

Second, this bugcheck, unlike most of the ones we know, did not ‘crash’ or ‘blue screen’ the system.

Live Kernel Dumps

All of this is great, but what’s really happening here? How come I got kernel dumps without the system ‘crashing’? Well, the answer is that in Windows 8.1 the Windows development team added some great reliability diagnostics in the form of ‘Live Kernel Dump Reporting’. With this feature, certain Windows components can request a ‘live dump’ to be gathered. In my above case, both a minidump (~ 278KB) and a ‘larger’ dump (~ 2GB) were gathered when the AFD (Ancillary Function Driver for WinSock) runtime detected that a socket did not close ‘in time’ (see bold sections in the Appendix for more information.)

The Windows Error Reporting feature will then use the minidump to help the Windows development team figure out if this is a ‘trending’ issue, prioritize it and then hopefully fix it if it is due to an issue with Windows. The ‘larger’ dump which I mentioned above is not normally uploaded unless the development team ‘asks’ for it again via the Windows Error Reporting and Action Center mechanisms (to ultimately give the end user control on what gets submitted.)

So What?

That is the million dollar question Smile As an end user, you may be wondering what to do with these types of dump files. The advice I can give you is: if the dump files are causing you to go very low on disk space, you can probably move the dump file off to cheaper storage, like an external HDD. BUT if you are repeatedly getting these dump files, it may be advisable to check for any third party drivers, especially anti-virus products or any other network related software. Sometimes older versions of such software may not ‘play well’ with Windows 8.1 and may be causing a stalled network operation, in turn leading to these dump files.

If you are an IT Pro and seeing these dump files on server class machines and / or on multiple PCs, you would do well to contact our CSS (Customer Service and Support) staff who can guide you further on why these dump files are occurring and what should be the course of action.

In Closing

I hope this helps understand this system folder and why it plays an important role in improving the reliability of Windows. If you are interested in this topic, I highly recommend this talk from Andrew Richards and Graham McIntyre, who are both on the Windows Reliability team. They explain how the OCA / WER mechanism works. Amazing stuff, check it out!

Appendix: !analyze –v output

0: kd> !analyze -v
…

WINSOCK_DETECTED_HUNG_CLOSESOCKET_LIVEDUMP (156)
Winsock detected a hung transport endpoint close request.
Arguments:
…

DEFAULT_BUCKET_ID:  WINBLUE_LIVE_KERNEL_DUMP

BUGCHECK_STR:  0x156

…

STACK_TEXT:
…

ffffd001`28e46660 fffff803`bdddd64d : ffffffff`800026bc 00000000`00000000 ffffc001`1f52ec00 00000000`00000000 : nt!DbgkpWerCaptureLiveFullDump+0x11f
ffffd001`28e466c0 fffff801`21b7e3b4 : 00000000`00000001 ffffd001`28e46889 00000000`00000048 ffffe000`3e9afda0 : nt!DbgkWerCaptureLiveKernelDump+0x1cd
ffffd001`28e46710 fffff801`21b7b4ff : ffffe000`3e9afda0 00000000`0000afd2 ffffe000`3e9afd00 00000000`00000002 : afd!AfdCaptureLiveKernelDumpForHungCloseRequest+0xa8
ffffd001`28e46770 fffff801`21b89cad : ffffe000`3e9afda0 ffffd001`28e46889 00000000`0000afd2 ffffd001`28e46808 : afd!AfdCloseTransportEndpoint+0x64ef
ffffd001`28e467d0 fffff801`21b89674 : 00000000`00000001 ffffe000`42d71010 00000000`00000000 ffffe000`3e9afda0 : afd!AfdCleanupCore+0x14d
ffffd001`28e468f0 fffff803`bdc47349 : ffffe000`42d71010 ffffe000`3d3fd080 00000000`00000000

…

Inline Index Definition in SQL Server 2016 CTP2

Firstly, I am sure you are as excited as I am about the next version of SQL Server, and you can try the public preview (a.k.a. Community Technology Preview 2 – CTP2) by visiting the evaluation page and install it NOW! Today morning, at the start of a  new week and I found something new in SQL 2016 CTP2 which has not been widely discussed as yet: inline index definition enhancements.

As a regulatory disclaimer though I must mention that the information here may be subject to potential change in future releases beyond CTP2.

 

Current State

SQL Server 2014 introduced T-SQL grammar support allowing us to declare index definitions inline with the CREATE TABLE itself. While this was mainly driven by the semantics of the in-memory OLTP tables, the elegance of defining indexes along with the table is compelling. However there were some drawbacks:

  • Inline definition of columnstore indexes was not allowed.
  • Inline indexes could not have a filter predicate associated.

I’m glad to see that both of these have scenarios have been addressed in SQL Server 2016 CTP2. Here are some examples of the new syntax which work in SQL Server 2016 CTP2:

Inline Filtered Index

create table t1
(
    c1 int,
    index IX1 (c1) where c1 > 0
)
go

Inline Columnstore Index

— inline clustered columnstore index definition
create table t2
(
    c1 int,
    c2 int,
    c3 nvarchar(500),
    index CCSI1 clustered columnstore
)

— inline nonclustered columnstore index definition
create table t3
(
    c1 int,
    c2 int,
    c3 nvarchar(500),
    index NCCSI1 nonclustered columnstore (c1, c2)
)

The CREATE TABLE help in SQL 2016 CTP2 Books Online has been updated to reflect these enhancements.

I’ll be posting other such nuggets on an ongoing basis, so stay tuned!

SQL Server and ‘Instant File Initialization’ Under the Hood – Part 3

Welcome back! As promised last time around, here’s part 3 of the ‘Instant File Initialization’ (a.k.a. ‘IFI’) optimization for SQL Server series. If you missed the first two parts you should definitely take some time to read them first before resuming this one, because the previous posts cover a lot of things which would be assumed in this post:

With that background, this post will show you how IFI works (or rather – did not – in specific release!) in conjunction with the Buffer Pool Extension feature in SQL 2014 and above.

Buffer Pool Extension Overview

A few weeks ago, one of my colleagues asked this question internally: ‘does IFI (Instant File Initialization) have impact on creating the BPE?’. That question was the inspiration for this entire series of posts, so I thank him for that. In order to answer the question we first need to understand conceptually how the Buffer Pool Extension (BPE) feature works. The Books Online topic for BPE is a good starting point, but here is my summary:

  • Think of the BPE as offering a ‘Level 2’ cache over the primary ‘Level 1’ i.e. classic Buffer Pool. If the DB page cannot be found in either L1 cache or L2 cache only then will it spill over to regular physical read from the data file.
  • The BPE cache mechanism is actually based on a file which is preferably hosted in very fast storage, such as a SSD. For example, in an Azure D-Series VM context, the D: drive is an excellent place holder for the BPE and / or the TEMPDB – see this article from the SQL Server Product Team for some details.
  • The BPE file is created when you ALTER SERVER CONFIGURATION command to enable BPE, or on SQL startup (if BPE was already configured.)
  • The size of the BPE file can be up to a multiple of the ‘max server memory’ configured (the limit varies by SQL Edition) but we do not generally recommend more than 4x the max server memory setting. The reason I mentioning this here is to consider that the BPE file may be a very large file and depending on which buffer page we are saving into the BPE file, the offset of that file write operation may be quite large.
  • Finally, the BPE file is deleted on SQL Server shutdown (and hence re-created on startup.)

BPE Internals

As with other operations in SQL, the writes into the BPE are optimized using the WriteFileGather() API. And based on which buffer page was being written to the file, the offset into the BPE file itself can be quite large. If we run a Process Monitor trace during the BPE file operations, we will notice that in SQL 2014 RTM there are a number of Synchronous Paging I/O (the second highlighted line in the below snipping) following a regular write operation to the BPE (which in the below screenshot is the first call to WriteFile at a offset of 196771840:

httpsmsdnshared.blob.core.windows.netmediaMSDNBlogsFSprod.evol.blogs.msdn.comCommunityServer.Blogs.Components.WeblogFiles0000006464metablogapi7444.image_1E74B602.png

But as you learnt in the first two parts, writing into ‘random’ locations inside a file will cause the OS to silently ‘zero out’ the allocations from the previous valid data length to the new location, and indeed in the case of BPE writes as well, you will see the tell-tale signs of this:

httpsmsdnshared.blob.core.windows.netmediaMSDNBlogsFSprod.evol.blogs.msdn.comCommunityServer.Blogs.Components.WeblogFiles0000006464metablogapi3755.image_2BDAC908.png

Notice the calls to CcZeroDataOnDisk above, which represents the zero-stamping at a Windows level. Now this is synchronous and will cause the top-level WriteFileGather() to block till the allocations are zeroed up to the current data length. What this means is that the SQL task which caused the buffer fetch in the first place will be blocked a bit more than you would like.

Salvation!

From the above, it certainly looks like there is a potential gap / improvement possibility in SQL 2014 RTM because the calls to write into the BPE would be effectively synchronous and slow down operations. Thankfully, our development team has acted on the feedback from customers and has introduced a call to SetFileValidData() in SQL 2014 Service Pack 1. Specifically, this issue was the one fixed by using the SetFileValidData() API in the BPE initialization code!

So if you now capture a Process Monitor trace during BPE initialization in SQL 2014 SP1, here is what you will see:

httpsmsdnshared.blob.core.windows.netmediaMSDNBlogsFSprod.evol.blogs.msdn.comCommunityServer.Blogs.Components.WeblogFiles0000006464metablogapi2021.image_0E6877FC.png

Since the valid data length is being set proactively to the entire file size itself, initial writes to the BPE file at any random (high) offsets are no longer blocking due to underlying zeroing. This leads to a significant improvement for some customers.

Edit 28 May 2015: Now that SQL 2016 CTP2 is officially available, I’m glad to report that the above improvement is also present in SQL 2016 code base!

So with that, you now know one more way in which the IFI optimization is being used within SQL Server engine. There is one more place which we can talk about, but I’d like to challenge our readers to share their guesses on what that might be – please post your guesses as comments, and I will come back to you shortly with that information as well!

SQL Server and ‘Instant File Initialization’ Under the Hood – Part 2

This is part 2 of my series on ‘Instant File Initialization’ and how that ‘brand name’ actually works under the covers. This post will take a look at what really happens when a database file is created and how the ‘Instant File Initialization’ optimization really helps from a SQL Server perspective. Before you proceed, it is highly recommended that you read Part 1 of this series; so if you have missed the first part, I highly recommend you start there!

Before we begin, a big ‘thank you’ to Bob Dorr, who offered some valuable insight on this topic and also authored an excellent white paper on the overall SQL I/O topic. As well as a shout out for Bob Ward’s excellent ‘Inside SQL I/O’ talk at SQL PASS Conference 2014. Links to both of their works are at the end of this blog post.

In the Beginning…

Let’s start simple: anyone who has worked with SQL Server knows that if you specify a very large file size for the data file, it takes a while (at least with the default setup) to finish this. You also probably know that this is because of ‘zeroing out’ of the underlying allocations.

Now, the million dollar question: when a database is created, ‘conceptually’ there is nothing inside it – right? Smile So why would we need to do the zeroing at this time? Recall from Part 1 of the series, that the first WriteFile() call triggered off the underlying zeroing at an OS level. So, though the data file is basically ‘empty’, maybe SQL is writing into some random file locations and causing this?

Now, why would SQL Server write into ‘random’ places at DB creation?. The answer is that SQL still needs to perform some ‘metadata’ setup on the file or on the new grown section of the file. This ‘metadata’ is basically the internal allocation related pages namely GAM, SGAM and PFS pages, which are scattered at predictable intervals throughout the length of the file.

GAM / PFS Initialization

Now, if you are like me, you would want to verify or see this in the debugger, and indeed some quick poking around with WinDbg will reveal the intricacies of why we are doing this random I/O immediately after resizing or creating the file (and therefore why the zeroing of clusters will normally happen unless you enabled the conditions to use ‘instant file initialization’.)

Firstly, you can poke around in the debugger (note that I used only public symbols for the below walkthrough – you can get started with WinDbg and SQL Server here) and if you get a bit savvy with the debugger you can uncover things like the below:

0:111> x sqlmin!Init*Pages
00007ff8`da328f90 sqlmin!InitGAMIntervalPages (<no parameter info>)
00007ff8`da329190 sqlmin!InitDBAllocPages (<no parameter info>)
00007ff8`da3286a0 sqlmin!InitPFSPages (<no parameter info>)

If you set a few breakpoints you will see the action around PFS and GAM initialization (you will see a lot more PFS than GAM pages because the interval tracked by GAM pages are much larger than PFS). Here is a sample for PFS pages initialization:

sqlmin!InitPFSPages
sqlmin!InitDBAllocPages
sqlmin!FileMgr::CreateNewFile
sqlmin!AsynchronousDiskAction::ExecuteDeferredAction
sqlmin!AsynchronousDiskWorker::ThreadRoutine
sqlmin!SubprocEntrypoint
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask

Please keep this aspect in mind because we will revisit this later.

Case 1: Without ‘Instant File Initialization’

Now, imagine this: if SQL were to directly start writing to ‘random’ locations corresponding to the above GAM, PFS pages, then consider (and if you read Part 1 carefully) we would expect the corresponding WriteFile() operations to cause the OS to issue underlying CcZeroDataOnDisk calls to zero out. This would be inefficient, so in SQL what we do is to proactively issue 8MB chunked I/O writes to zero out the file. You can easily verify this if you run a filtered Process Monitor trace, which I did do and the same is summarized below:

image

If you dig a bit deeper, specifically use the Stack view inside of Process Monitor for one of the WriteFile() calls shown above, you can see all the details down to the WriteFileGather() routine which does the I/O in chunks of 8MB to zero out the file proactively:

image

Notice that there are no calls by the kernel to CcZeroDataOnDisk. So we are in a way doing what the OS did in the earlier case, perhaps a bit more aggressively due to the larger I/O sizes (8MB.)

Now you can imagine why it takes a long time to zero out a large file. If you attended Bob Ward’s excellent ‘Inside SQL I/O’ session at SQL PASS 2014 he actually does some calculations to show you how long it would take to zero out a large data file. For example, if you have a 10GB data file and you have 150MB/sec serial I/O throughput on the drive, you can estimate roughly 70 seconds to do the zero initialization. That can be a really long time, especially if you get an autogrow of that size!

Seed question: if you scroll through the ProcMon trace to the last of the 8MB WriteFile operations (which are the zeroing ones) then you will notice that there are some 8KB writes which follow. Why? The answer follows at the end of ‘Case 2’ walkthrough below!

image

Case 2: With ‘Instant File Initialization’

Now, assume that the SQL Service account has been allocated the SeManageVolumePrivilege (which allows the successful use of the SetFileValidData API I mentioned in the previous post) then SQL will attempt to use this ‘optimization’ to avoid the zeroing overhead. We captured a sample trace using Process Monitor while SQL was creating a 5GB data file. Here is a screenshot of how the Process Monitor logs look like with Instant File Initialization optimization enabled successfully:

image

You can see the reference to SetValidDataLengthInformationFile (highlighted) followed by a series of 8KB writes. In the debugger, you will see the following call stack which proves that we do indeed call the SetFileValidData() API from the FCB::InitializeSpace() call:

KERNELBASE!SetFileValidData
sqlmin!FCB::InitializeSpace
sqlmin!FileMgr::CreateNewFile
sqlmin!AsynchronousDiskAction::ExecuteDeferredAction
sqlmin!AsynchronousDiskWorker::ThreadRoutine

Now we answer the previous question we seeded at the end of the Case 1 section: why do we still get the 8KB writes? If you recall from the ‘GAM / PFS Initialization’ section previously then this should be crystal clear! Here is a call stack of one of the 8KB writes:

image

As you can see above, this is for a PFS page initialization. So this explains the 8KB writes after the file was created.

Case 3: Sparse File Creation (Database Snapshot)

Next, let’s look at one of the special cases: Database snapshots in SQL Server are implemented using NTFS ‘sparse file’ functionality. Now, in the case of a sparse file, we do not use either of the two mechanisms mentioned above, and instead use a special mechanism to do the ‘zero initialization’. Why? Read on!

If you read the ‘Instant File Initialization’ (IFI) section in the SQL I/O Basics Chapter 2 white paper, you will see this sentence:

The algorithm used by SQL Server is more aggressive than the NTFS zero initialization (DeviceIoControl, FSCTL_SET_ZERO_DATA)

From MSDN it is clear that there is an optimization to set a range in a sparse file as all zeros without physically extending the file size:

If you use the WriteFile function to write zeros (0) to a sparse file, the file system allocates disk space for the data that you are writing. If you use the FSCTL_SET_ZERO_DATA control code to write zeros (0) to a sparse file and the zero (0) region is large enough, the file system may not allocate disk space.

AHA! So I hope that explains why we cannot use the conventional ‘zero stamping’ or the SetFileValidData mechanism for sparse files. But let’s see this for ourselves! Let’s start by creating a DB snapshot, but before I executed the below I also put a breakpoint in WinDbg on kernelbase!DeviceIoControl().

— Create the database snapshot
CREATE DATABASE ZN_test ON
( NAME = ZN, FILENAME =
‘l:tempZN_test.ss’ )
AS SNAPSHOT OF ZN;
GO

Here is the corresponding Process Monitor trace:

image

From WinDbg we can get the call stack. You can see that FCB::ZeroFile() calls the DeviceIoControl in this case:

KERNELBASE!DeviceIoControl
KERNEL32!DeviceIoControlImplementation
sqlmin!FCB::ZeroFile
sqlmin!FCB::InitializeSpace
sqlmin!FileMgr::CreateNewFile

Wow! So I hope you get a feel for how many optimizations we have in place for SQL Server from an I/O perspective.

Case 4: Log File Initialization

Last but not the least, let us study the case for the transaction log file. Interestingly (and as is known and documented in many places) the log file is always zero-initialized. Here is a ProcMon trace (which was taken when IFI was already leveraged for the data file creation):

image

The above operations are largely related to zeroing out the entire file and then formatting the Virtual Log Files within the initial chunk. The log file (2MB in size) was zero-initialized in one shot in the above case. It took 30 milliseconds to do that on my system. Obviously more real world sizes would take proportionately more time to finish.

FYI – you can see the progress of the log fixups by using undocumented trace flag 3004.

What Next?

So that’s it, I hope you enjoyed this spelunking into the internals of the OS and SQL. Next up, we will see how this optimization applies (or does not apply) to other key components within SQL. For further reading, the following resources are excellent resources on the topic of SQL I/O internals:

Query of the day: finding SQL Server queries with large memory grants

Quick tip from me today: I recently had to check on which T-SQL query / queries in the system were using up some monster 30GB+ query grants. Luckily the sys.dm_exec_query_memory_grants DMV facilitates this. Here is the query I finally used to figure out what was happening:

SELECT r.session_id
    ,mg.granted_memory_kb
    ,mg.requested_memory_kb
    ,mg.ideal_memory_kb
    ,mg.request_time
    ,mg.grant_time
    ,mg.query_cost
    ,mg.dop
    ,(
        SELECT SUBSTRING(TEXT, statement_start_offset / 2 + 1, (
                    CASE
                        WHEN statement_end_offset = – 1
                            THEN LEN(CONVERT(NVARCHAR(MAX), TEXT)) * 2
                        ELSE statement_end_offset
                        END – statement_start_offset
                    ) / 2)
        FROM sys.dm_exec_sql_text(r.sql_handle)
        ) AS query_text
    ,qp.query_plan
FROM sys.dm_exec_query_memory_grants AS mg
INNER JOIN sys.dm_exec_requests r ON mg.session_id = r.session_id
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
ORDER BY mg.required_memory_kb DESC;

In case you are wondering what memory grants are all about, you should start from this great blog post from Jay Choe, who is part of the SQL Server development team. Jay also posts some useful variants of queries to find out (for example) cached query plans with memory grants. It is a great read.

Happy query tuning!

SQL Server and ‘Instant File Initialization’ Under the Hood – Part 1

Recently a colleague of mine popped up a very interesting question around whether the SQL Server ‘Buffer Pool Extension’ feature in SQL 2014 uses the ‘instant file initialization’ optimization (or not). While answering that question I found some useful information which I believe will help many of us. So here we go… firstly, we need to understand what ‘instant file initialization’ is really all about, from the Windows perspective.

Background

At the OS level every file has three important attributes which are recorded in the metadata of the NTFS file system:

  • Physical file size
  • Allocation file size
  • Valid data size

In this post, we are mostly concerned with Physical and Valid Data sizes. More details are available at the this MSDN page but for simplicity, let me put it this way:

  • When you create a file with the CreateFile API, it starts with a 0 byte length
  • One way to ‘grow’ the file is of course to sequentially write bytes to it.
  • But if you want to ‘pre-size’ the file to a specific size, then you may not want to explicitly write data upfront.
  • In those cases the OS provides a SetEndOfFile() API to ‘resize’ the file, but as you will see below, there are still some things which will hold up the thread when the first write operation is done to the pre-sized file

Let’s work through this step-by-step. A bit of programming knowledge will help, though it should be fairly easy to figure out what’s going on by reading the comments inline in the code! Smile 

Growing a file: C++ example

Here is a simple program which will demonstrate how you can grow a file to 3GB without having to write individual bytes till the 3GB mark:

#include <Windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
    // create a file first. it will start as an empty file of course
    HANDLE myFile = ::CreateFile(L"l:\temp\ifi.dat",
        GENERIC_WRITE,
        0,
        NULL,
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (INVALID_HANDLE_VALUE == myFile)
    {
        return -1;
    }

    // let’s now make the file 3GB in size
    LARGE_INTEGER newpos;
    newpos.QuadPart = (LONGLONG) 3 * 1024 * 1024 * 1024;

    LARGE_INTEGER newfp;

    // navigate to the new ‘end of the file’
    ::SetFilePointerEx(myFile,
        newpos,
        &newfp,
        FILE_BEGIN);

    // ‘seal’ the new EOF location
    ::SetEndOfFile(myFile);

    // now navigate to the EOF – 1024 bytes.
    newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 – 1024;
    ::SetFilePointerEx(myFile, newpos, &newfp, FILE_BEGIN);

    DWORD dwwritten = 0;

    // try to write 5 bytes to the 3GB-1024th location
    ::WriteFile(myFile,
        "hello",
        5,
        &dwwritten,
        NULL);

    return 0;
}

When we execute the above code, you will see that though we used the SetEndOfFile() API to locate the EOF marker without us explicitly writing anything, there is some work being done by the OS underneath our code to ‘zero’ out the contents of the clusters allocated to us. This is done for data privacy reasons and since it is physical I/O, it does take a while. You may want to refer the documentation for the SetFilePointerEx function:

Note that it is not an error to set the file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, leaving the intervening bytes uninitialized.

Snooping in with Process Monitor

You can actually look at the proof of what is happening underneath the hood by using Process Monitor from the Sysinternals suite. Here is a complete call stack of the application. Notice the call in the kernel to zero out data (CcZeroDataOnDisk). Notice that these are not our API calls. We simply called WriteFile() and that triggered off these underlying ‘zeroing’ writes.

image

In the same ProcMon trace you will also notice a bunch of I/O operations (corresponding to the above stack) just after I triggered my 5 bytes I/O:

image

The key takeaway from this walkthrough is that when we called SetEndOfFile(), we do not affect the ‘valid data length’ of that file stream. In that case, the OS will play it safe by zeroing out the allocations from the previous valid file length (which in our case above was actually 0) leading up to the location of the write (which in our case is 1024 bytes prior to the physical end of the file.) This operation is what causes the thread to block.

Growing a file – the ‘fast’ way

Instant File Initialization as we know it in SQL Server really reduces to an API call in Windows. To see that, we tweak the above sample and add in the ‘secret sauce’ which is the call to SetFileValidData() API:

// ‘seal’ the new EOF location
::SetEndOfFile(myFile);

// now ‘cleverly’ set the valid data length to 3GB
if (0 == ::SetFileValidData(myFile, newpos.QuadPart))
{
    printf("Unable to use IFI, error %d", GetLastError());
}
else
{
    printf("IFI was used!!!");
}

// now navigate to the EOF – 1024 bytes.
newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 – 1024;

You will then see that the same code executes almost instantly. The reason for this is because the OS will no longer need to zero any bytes underneath the hood, because the valid data length (as set by the above API call) == file size. This can be seen in Process Monitor as well:

image

Dangers of SetFileValidData()

The important thing to note is that SetFileValidData() is a dangerous API in a way, because it can potentially expose underlying fragments of data. Much has been said about this, and you can check out Raymond’s blog post on this topic. The MSDN page for this API is also very clear on the caveats:

You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly. In most other situations, there is usually no performance gain to using SetFileValidData, and sometimes there can be a performance penalty.

What next?

Of course, if you are like me, you are probably wondering what this all equates to. Remember, we are trying to explore some of the basis and background on the ‘instant file initialization’ optimization that SQL Server can leverage to quickly size new and grown chunks for data files. As the documentation and our team’s blog post explain in detail, this setting can be very useful in certain cases and is in fact recommended for deployments on Microsoft Azure IaaS VMs.

Next time, I will correlate this information we learnt above to how SQL Server leverages it in the process of creating new data files or growing existing ones. Till then, goodbye!