SQL Server and ‘Instant File Initialization’ Under the Hood – Part 1

Recently a colleague raised a very interesting question: does the SQL Server ‘Buffer Pool Extension’ feature in SQL Server 2014 use the ‘instant file initialization’ optimization or not? While answering that question I found some useful information which I believe will help many of us. So here we go… first, we need to understand what ‘instant file initialization’ is really all about, from the Windows perspective.

Background

At the OS level every file has three important attributes which are recorded in the metadata of the NTFS file system:

Physical file size
Allocation file size
Valid data size
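
To keep the three sizes straight, here is a toy C++ model. It is not a Windows API: the struct, the function names and the 4 KB cluster size are all assumptions made purely for illustration. It captures the ordering you would typically expect for an ordinary (non-sparse, uncompressed) file: valid data size <= physical size <= allocation size, with the allocation rounded up to whole clusters.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the three sizes tracked per file (names are illustrative only).
struct FileSizes {
    std::uint64_t physical;    // logical end-of-file position
    std::uint64_t allocation;  // disk space reserved, in whole clusters
    std::uint64_t valid_data;  // bytes from the start known to hold real data
};

// Round a byte count up to the next multiple of the cluster size.
constexpr std::uint64_t round_up_to_cluster(std::uint64_t bytes,
                                            std::uint64_t cluster_size) {
    return (bytes + cluster_size - 1) / cluster_size * cluster_size;
}

// Sizes for a file pre-sized to `physical` bytes, of which the first
// `written` bytes have actually been written. The 4096-byte default
// cluster size is an assumption, not something queried from the volume.
inline FileSizes presized_file(std::uint64_t physical,
                               std::uint64_t written,
                               std::uint64_t cluster_size = 4096) {
    assert(written <= physical);
    return FileSizes{physical, round_up_to_cluster(physical, cluster_size), written};
}

// The ordering we expect for an ordinary file.
constexpr bool sizes_are_consistent(const FileSizes& f) {
    return f.valid_data <= f.physical && f.physical <= f.allocation;
}
```

For example, presized_file(10000, 0) describes a file that reports 10,000 bytes, occupies three 4 KB clusters, and has no valid data yet.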

In this post, we are mostly concerned with the Physical and Valid Data sizes. More details are available at this MSDN page, but for simplicity, let me put it this way:

When you create a file with the CreateFile API, it starts with a length of 0 bytes.
One way to ‘grow’ the file is of course to sequentially write bytes to it.
But if you want to ‘pre-size’ the file to a specific size, then you may not want to explicitly write data upfront.
In those cases the OS provides the SetEndOfFile() API to ‘resize’ the file. As you will see below, though, there is still some work which holds up the thread when the first write is made to the pre-sized file.
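
As an aside, the same ‘pre-size without writing’ idea is available in portable C++17 through std::filesystem::resize_file. The sketch below is only an analogue of the SetEndOfFile() approach, not the Win32 call itself, and the helper name create_presized_file is made up for this post.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Create an empty file, then set its size in one call instead of writing bytes.
// Returns the size the file system reports afterwards.
inline std::uintmax_t create_presized_file(const fs::path& path,
                                           std::uintmax_t size_in_bytes) {
    {
        // Opening with trunc creates the file (or empties an existing one).
        std::ofstream create(path, std::ios::binary | std::ios::trunc);
        assert(create.is_open());
    }   // stream is closed here
    assert(fs::file_size(path) == 0);   // a new file starts at 0 bytes

    // Extend the file to the requested size without writing any data ourselves.
    fs::resize_file(path, size_in_bytes);
    return fs::file_size(path);
}
```

Calling create_presized_file with a scratch path and 1024 * 1024 should report a 1 MB file, even though our code never wrote a single byte to it.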

Let’s work through this step-by-step. A bit of programming knowledge will help, though it should be fairly easy to figure out what’s going on by reading the comments inline in the code!

Growing a file: C++ example

Here is a simple program which will demonstrate how you can grow a file to 3GB without having to write individual bytes till the 3GB mark:

#include <Windows.h>
#include <tchar.h>   // declares _tmain and _TCHAR

int _tmain(int argc, _TCHAR* argv[])
{
    // create a file first. it will start as an empty file of course
    // note the doubled backslashes: a single one would start an escape sequence
    HANDLE myFile = ::CreateFile(L"l:\\temp\\ifi.dat",
        GENERIC_WRITE,
        0,
        NULL,
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        NULL);

    if (INVALID_HANDLE_VALUE == myFile)
    {
        return -1;
    }

    // let’s now make the file 3GB in size
    LARGE_INTEGER newpos;
    newpos.QuadPart = (LONGLONG) 3 * 1024 * 1024 * 1024;

    LARGE_INTEGER newfp;

    // navigate to the new ‘end of the file’
    ::SetFilePointerEx(myFile,
        newpos,
        &newfp,
        FILE_BEGIN);

    // ‘seal’ the new EOF location
    ::SetEndOfFile(myFile);

    // now navigate to the EOF - 1024 bytes.
    newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 - 1024;
    ::SetFilePointerEx(myFile, newpos, &newfp, FILE_BEGIN);

    DWORD dwwritten = 0;

    // try to write 5 bytes to the 3GB-1024th location
    ::WriteFile(myFile,
        "hello",
        5,
        &dwwritten,
        NULL);

    // release the file handle before exiting
    ::CloseHandle(myFile);
    return 0;
}

When we execute the above code, you will see that although we used the SetEndOfFile() API to move the EOF marker without explicitly writing anything, the OS still does some work underneath our code when the write is issued: it ‘zeroes’ out the contents of the clusters allocated to us. This is done for data privacy reasons, and since it is physical I/O, it does take a while. You may want to refer to the documentation for the SetFilePointerEx function:

Note that it is not an error to set the file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, leaving the intervening bytes uninitialized.
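
The behaviour described in that note (moving the pointer past the end is harmless, and it is the write that grows the file) can be tried out with standard C++ streams. This is a portable sketch rather than the Win32 calls used above, and write_past_end is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Seek past the end of a brand-new file, write a few bytes there, and
// return the resulting file size.
inline std::uintmax_t write_past_end(const fs::path& path,
                                     std::streamoff offset,
                                     const std::string& payload) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        assert(out.is_open());

        // Moving the write position alone does not change the file size...
        out.seekp(offset, std::ios::beg);

        // ...the write is what extends the file to offset + payload size.
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        out.flush();
        assert(out.good());
    }   // stream is closed here
    return fs::file_size(path);
}
```

Writing "hello" at offset 4096 of a new file should leave it 4101 bytes long: the file pointer position plus the size of the buffer written.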

Snooping in with Process Monitor

You can actually look at the proof of what is happening underneath the hood by using Process Monitor from the Sysinternals suite. Here is a complete call stack of the application. Notice the call in the kernel to zero out data (CcZeroDataOnDisk). Notice that these are not our API calls. We simply called WriteFile() and that triggered off these underlying ‘zeroing’ writes.

In the same ProcMon trace you will also notice a bunch of I/O operations (corresponding to the above stack) just after I triggered my 5-byte write:

The key takeaway from this walkthrough is that calling SetEndOfFile() does not affect the ‘valid data length’ of the file stream. So when a write lands beyond the valid data length, the OS plays it safe by zeroing out the allocations from the previous valid data length (which in our case above was actually 0) up to the location of the write (which in our case is 1024 bytes before the physical end of the file). This zeroing is what causes the thread to block.
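
To put a number on that, here is a small arithmetic sketch of the zeroing range. It is a toy model of the rule described above, not what the file system actually executes, and both function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: how many bytes must be zero-filled before a write can land
// at `write_offset`, given the file's current valid data length.
constexpr std::uint64_t bytes_to_zero(std::uint64_t valid_data_length,
                                      std::uint64_t write_offset) {
    return write_offset > valid_data_length
               ? write_offset - valid_data_length
               : 0;
}

// Valid data length once the write has completed.
constexpr std::uint64_t valid_data_after_write(std::uint64_t valid_data_length,
                                               std::uint64_t write_offset,
                                               std::uint64_t write_length) {
    return (write_offset + write_length > valid_data_length)
               ? write_offset + write_length
               : valid_data_length;
}
```

For the sample program, bytes_to_zero(0, 3GB - 1024) comes to 3,221,224,448 bytes, i.e. just under 3GB of zeroing to support a 5-byte write. If the valid data length already covered the write offset, the same function would return 0.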

Growing a file – the ‘fast’ way

Instant File Initialization as we know it in SQL Server really reduces to an API call in Windows. To see that, we tweak the above sample and add in the ‘secret sauce’, which is a call to the SetFileValidData() API:

    // ‘seal’ the new EOF location
    ::SetEndOfFile(myFile);

    // now ‘cleverly’ set the valid data length to 3GB
    // (newpos still holds the 3GB offset here; printf needs <stdio.h>)
    if (0 == ::SetFileValidData(myFile, newpos.QuadPart))
    {
        // the call fails if the caller lacks the SE_MANAGE_VOLUME_NAME privilege
        printf("Unable to use IFI, error %lu\n", GetLastError());
    }
    else
    {
        printf("IFI was used!!!\n");
    }

    // now navigate to the EOF - 1024 bytes.
    newpos.QuadPart = (LONGLONG)3 * 1024 * 1024 * 1024 - 1024;

You will then see that the same code executes almost instantly. This is because the OS no longer needs to zero any bytes under the hood: the valid data length (as set by the above API call) now equals the file size. This can be seen in Process Monitor as well:

Dangers of SetFileValidData()

The important thing to note is that SetFileValidData() is a dangerous API in a way, because it can expose whatever data previously occupied the underlying clusters on disk. Much has been said about this, and you can check out Raymond’s blog post on this topic. The MSDN page for this API is also very clear on the caveats:

You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly. In most other situations, there is usually no performance gain to using SetFileValidData, and sometimes there can be a performance penalty.
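
Because of this exposure risk, Windows only lets callers that hold the SE_MANAGE_VOLUME_NAME privilege call SetFileValidData(), and the privilege has to be enabled in the process token first. The sketch below shows one way that might be done; the helper name enable_manage_volume_privilege is my own, the non-Windows branch is just a stub so the snippet compiles anywhere, and on Windows it needs to be linked against Advapi32.lib. Treat it as an illustration rather than production code.

```cpp
#ifdef _WIN32
#include <Windows.h>
#endif

// Try to enable SeManageVolumePrivilege for the current process token.
// Returns true only if the privilege ends up enabled.
inline bool enable_manage_volume_privilege() {
#ifdef _WIN32
    HANDLE token = nullptr;
    if (!::OpenProcessToken(::GetCurrentProcess(),
                            TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY,
                            &token)) {
        return false;
    }

    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

    bool enabled = false;
    // "SeManageVolumePrivilege" is the string behind SE_MANAGE_VOLUME_NAME.
    if (::LookupPrivilegeValueW(nullptr, L"SeManageVolumePrivilege",
                                &tp.Privileges[0].Luid)) {
        // AdjustTokenPrivileges can report success without granting anything
        // (ERROR_NOT_ALL_ASSIGNED), so the last-error value is checked too.
        if (::AdjustTokenPrivileges(token, FALSE, &tp, sizeof(tp),
                                    nullptr, nullptr)) {
            enabled = (::GetLastError() == ERROR_SUCCESS);
        }
    }

    ::CloseHandle(token);
    return enabled;
#else
    return false;   // stub: the privilege only exists on Windows
#endif
}
```

If this returns false for an account that has not been granted the privilege, the SetFileValidData() call in the sample above would be expected to fail and print the ‘Unable to use IFI’ message.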

What next?

Of course, if you are like me, you are probably wondering what this all adds up to. Remember, we are exploring the background of the ‘instant file initialization’ optimization that SQL Server can leverage to quickly size new data files and newly grown chunks of existing ones. As the documentation and our team’s blog post explain in detail, this setting can be very useful in certain cases and is in fact recommended for deployments on Microsoft Azure IaaS VMs.

Next time, I will relate what we learnt above to how SQL Server leverages it when creating new data files or growing existing ones. Till then, goodbye!

Arvind Shyamsundar's technical blog

Arvind Shyamsundar is a Principal PM @ MSFT Azure Data, working on Azure SQL. Data geek. Apache Accumulo and Fluo PMC. SQL MCM, ex-Principal PFE (MSFT Services). These are my own opinions and not those of Microsoft.
