This is Part 2 of my series on ‘Instant File Initialization’ and how that ‘brand name’ actually works under the covers. This post looks at what really happens when a database file is created and how the ‘Instant File Initialization’ optimization helps from a SQL Server perspective. If you missed Part 1 of this series, I highly recommend you start there before proceeding!
Before we begin, a big ‘thank you’ to Bob Dorr, who offered some valuable insight on this topic and also authored an excellent white paper on the overall SQL I/O topic, and a shout-out to Bob Ward for his excellent ‘Inside SQL I/O’ talk at SQL PASS Conference 2014. Links to both of their works are at the end of this blog post.
In the Beginning…
Let’s start simple: anyone who has worked with SQL Server knows that if you specify a very large size for the data file, creating it takes a while (at least with the default setup). You also probably know that this is because of the ‘zeroing out’ of the underlying allocations.
Now, the million dollar question: when a database is created, ‘conceptually’ there is nothing inside it, right? So why would we need to do the zeroing at this time? Recall from Part 1 of the series that the first WriteFile() call triggered the underlying zeroing at the OS level. So, though the data file is basically ‘empty’, maybe SQL is writing into some random file locations and causing this?
Now, why would SQL Server write into ‘random’ places at DB creation? The answer is that SQL still needs to perform some ‘metadata’ setup on the file, or on the newly grown section of the file. This ‘metadata’ is basically the internal allocation-related pages, namely the GAM, SGAM and PFS pages, which are scattered at predictable intervals throughout the length of the file.
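To make ‘predictable intervals’ a bit more concrete, here is a minimal Python sketch that lists where these pages would fall in a file of a given size. The interval constants (a PFS page every 8,088 pages, a GAM/SGAM page every 511,232 pages, with the first ones on pages 1, 2 and 3) are the commonly documented values and are my assumption here; they are not taken from the debugger session in this post, and the function name is purely illustrative.

```python
PAGE_SIZE = 8 * 1024      # SQL Server data page size in bytes
PFS_INTERVAL = 8088       # assumed: pages tracked by one PFS page (roughly 64MB)
GAM_INTERVAL = 511232     # assumed: pages tracked by one GAM/SGAM page (roughly 4GB)

def allocation_pages(file_size_bytes):
    """Return the page IDs of the PFS, GAM and SGAM pages in a file of this size."""
    total_pages = file_size_bytes // PAGE_SIZE
    # The first PFS/GAM/SGAM pages sit on pages 1, 2 and 3; later ones repeat
    # at the start of each tracking interval.
    pfs = [1] + list(range(PFS_INTERVAL, total_pages, PFS_INTERVAL))
    gam = [2] + list(range(GAM_INTERVAL, total_pages, GAM_INTERVAL))
    sgam = [3] + list(range(GAM_INTERVAL + 1, total_pages, GAM_INTERVAL))
    return {"PFS": pfs, "GAM": gam, "SGAM": sgam}

pages = allocation_pages(5 * 1024**3)  # a 5GB data file
for kind, ids in pages.items():
    print(kind, len(ids), "pages, first few:", ids[:3])
```

Multiplying a page ID by 8,192 gives the byte offset of that page, which is why these writes look ‘random’ when viewed as file offsets in a trace.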
GAM / PFS Initialization
Now, if you are like me, you would want to verify or see this in the debugger. Indeed, some quick poking around with WinDbg reveals why we do this random I/O immediately after resizing or creating the file (and therefore why the zeroing of clusters will normally happen unless you have enabled the conditions to use ‘instant file initialization’).
Firstly, you can poke around in the debugger (note that I used only public symbols for the walkthrough below; you can get started with WinDbg and SQL Server here), and if you get a bit savvy with the debugger you can uncover things like the following:
0:111> x sqlmin!Init*Pages
00007ff8`da328f90 sqlmin!InitGAMIntervalPages (<no parameter info>)
00007ff8`da329190 sqlmin!InitDBAllocPages (<no parameter info>)
00007ff8`da3286a0 sqlmin!InitPFSPages (<no parameter info>)
If you set a few breakpoints you will see the action around PFS and GAM initialization (you will see a lot more PFS than GAM pages because the interval tracked by a GAM page is much larger than the interval tracked by a PFS page). Here is a sample for PFS page initialization:
Please keep this aspect in mind because we will revisit this later.
Case 1: Without ‘Instant File Initialization’
Now, imagine this: if SQL were to directly start writing to the ‘random’ locations corresponding to the above GAM and PFS pages, then (if you read Part 1 carefully) we would expect the corresponding WriteFile() operations to cause the OS to issue underlying CcZeroDataOnDisk calls to zero out the file. This would be inefficient, so what SQL does instead is proactively issue chunked 8MB writes to zero out the file. You can easily verify this by running a filtered Process Monitor trace, which I did; the result is summarized below:
If you dig a bit deeper, specifically use the Stack view inside of Process Monitor for one of the WriteFile() calls shown above, you can see all the details down to the WriteFileGather() routine which does the I/O in chunks of 8MB to zero out the file proactively:
Notice that there are no calls by the kernel to CcZeroDataOnDisk. So we are in a way doing what the OS did in the earlier case, perhaps a bit more aggressively due to the larger I/O size (8MB).
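The general idea of proactive, chunked zeroing can be sketched in a few lines of Python. This is only an illustration of the pattern seen in the trace (sequential 8MB writes of zeros); it is not how SQL Server implements it, since SQL goes through WriteFileGather() rather than simple buffered writes. The function name and the file name are hypothetical.

```python
import os
import tempfile

CHUNK = 8 * 1024 * 1024  # 8MB, the write size seen in the Process Monitor trace

def zero_fill(path, size_bytes, chunk=CHUNK):
    """Create a file of size_bytes by writing zero-filled chunks sequentially.

    Returns the number of write calls issued.
    """
    zeros = bytes(chunk)  # a reusable buffer of zeros
    writes = 0
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)  # the last chunk may be shorter
            f.write(zeros[:n])
            remaining -= n
            writes += 1
    return writes

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.mdf")  # hypothetical file name
    writes = zero_fill(path, 20 * 1024 * 1024)  # a small 20MB file for the demo
    size = os.path.getsize(path)
    print(writes, "writes,", size, "bytes")
```

Because every byte is written front to back, the valid data length advances with each write and the OS never has a gap of its own to zero.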
Now you can imagine why it takes a long time to zero out a large file. In his excellent ‘Inside SQL I/O’ session at SQL PASS 2014, Bob Ward does some calculations to show how long it would take to zero out a large data file. For example, if you have a 10GB data file and 150MB/sec of sequential I/O throughput on the drive, you can estimate roughly 70 seconds for the zero initialization. That can be a really long time, especially if you get an autogrow of that size!
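The arithmetic behind that estimate is simple enough to sketch. The helper below is just my back-of-the-envelope version, assuming 1GB = 1024MB and a constant sustained throughput (real drives will vary); the function name is illustrative.

```python
def zeroing_seconds(file_size_gb, throughput_mb_per_sec):
    """Estimate the time to zero a file, assuming constant sequential throughput."""
    return file_size_gb * 1024 / throughput_mb_per_sec

estimate = zeroing_seconds(10, 150)  # 10GB file at 150MB/sec
print(f"{estimate:.1f} seconds")
```

Plug in your own autogrow size and measured throughput to get a feel for how long a growth event could stall.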
Seed question: if you scroll through the ProcMon trace to the last of the 8MB WriteFile operations (which are the zeroing ones) then you will notice that there are some 8KB writes which follow. Why? The answer follows at the end of ‘Case 2’ walkthrough below!
Case 2: With ‘Instant File Initialization’
Now, assume that the SQL Server service account has been granted SeManageVolumePrivilege (which allows the successful use of the SetFileValidData API I mentioned in the previous post). SQL will then attempt to use this ‘optimization’ to avoid the zeroing overhead. We captured a sample trace using Process Monitor while SQL was creating a 5GB data file. Here is a screenshot of what the Process Monitor log looks like with the Instant File Initialization optimization successfully enabled:
You can see the reference to SetValidDataLengthInformationFile (highlighted) followed by a series of 8KB writes. In the debugger, you will see the following call stack which proves that we do indeed call the SetFileValidData() API from the FCB::InitializeSpace() call:
Now we answer the question we seeded at the end of the Case 1 section: why do we still get the 8KB writes? If you recall the ‘GAM / PFS Initialization’ section above, this should be crystal clear! Here is a call stack of one of the 8KB writes:
As you can see above, this is for a PFS page initialization. So this explains the 8KB writes after the file was created.
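To get a feel for how little is written in this case, here is a rough Python estimate for the 5GB file. It counts only PFS, GAM and SGAM pages, using the commonly documented intervals (8,088 pages for PFS, 511,232 pages for GAM/SGAM), which are my assumption rather than something read off the trace. SQL writes other pages too (the file header, for example), so treat the result as a lower bound.

```python
PAGE_SIZE = 8 * 1024
PFS_INTERVAL = 8088      # assumed PFS tracking interval, in pages
GAM_INTERVAL = 511232    # assumed GAM/SGAM tracking interval, in pages

def count_pages(first_repeat, interval, total_pages):
    """Count one allocation page near the start of the file plus one per later interval."""
    return 1 + len(range(first_repeat, total_pages, interval))

file_size = 5 * 1024**3
total_pages = file_size // PAGE_SIZE
alloc_pages = (
    count_pages(PFS_INTERVAL, PFS_INTERVAL, total_pages)        # PFS pages
    + count_pages(GAM_INTERVAL, GAM_INTERVAL, total_pages)      # GAM pages
    + count_pages(GAM_INTERVAL + 1, GAM_INTERVAL, total_pages)  # SGAM pages
)
bytes_with_ifi = alloc_pages * PAGE_SIZE
print(alloc_pages, "allocation pages,", bytes_with_ifi // 1024, "KB written,",
      "versus", file_size // 1024**2, "MB zeroed without IFI")
```

Under these assumptions, that is well under 1MB of 8KB writes compared with 5GB of zeroing in Case 1.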
Case 3: Sparse File Creation (Database Snapshot)
Next, let’s look at one of the special cases: Database snapshots in SQL Server are implemented using NTFS ‘sparse file’ functionality. Now, in the case of a sparse file, we do not use either of the two mechanisms mentioned above, and instead use a special mechanism to do the ‘zero initialization’. Why? Read on!
If you read the ‘Instant File Initialization’ (IFI) section in the SQL I/O Basics Chapter 2 white paper, you will see this sentence:
The algorithm used by SQL Server is more aggressive than the NTFS zero initialization (DeviceIoControl, FSCTL_SET_ZERO_DATA)
From MSDN it is clear that there is an optimization to set a range in a sparse file as all zeros without physically extending the file size:
If you use the WriteFile function to write zeros (0) to a sparse file, the file system allocates disk space for the data that you are writing. If you use the FSCTL_SET_ZERO_DATA control code to write zeros (0) to a sparse file and the zero (0) region is large enough, the file system may not allocate disk space.
AHA! So I hope that explains why we cannot use the conventional ‘zero stamping’ or the SetFileValidData mechanism for sparse files. But let’s see this for ourselves! Let’s start by creating a DB snapshot; before executing the statement below, I also put a breakpoint in WinDbg on kernelbase!DeviceIoControl().
-- Create the database snapshot
CREATE DATABASE ZN_test ON
( NAME = ZN, FILENAME =
AS SNAPSHOT OF ZN;
Here is the corresponding Process Monitor trace:
From WinDbg we can get the call stack. You can see that FCB::ZeroFile() calls the DeviceIoControl in this case:
Wow! So I hope you get a feel for how many optimizations we have in place for SQL Server from an I/O perspective.
Case 4: Log File Initialization
Last but not least, let us study the case of the transaction log file. Interestingly (and as is known and documented in many places) the log file is always zero-initialized. Here is a ProcMon trace (which was taken when IFI was already leveraged for the data file creation):
The above operations are largely related to zeroing out the entire file and then formatting the Virtual Log Files within the initial chunk. The log file (2MB in size) was zero-initialized in one shot in the above case. It took 30 milliseconds to do that on my system. Obviously, more realistic sizes would take proportionately more time to finish.
FYI – you can see the progress of the log fixups by using undocumented trace flag 3004.
So that’s it; I hope you enjoyed this spelunking into the internals of the OS and SQL. Next up, we will see how this optimization applies (or does not apply) to other key components within SQL. For further reading, the following are excellent resources on the topic of SQL I/O internals: