Getting TaskUnzip to work with SQL Server Integration Services (SSIS) 2012

Background

It is a very common requirement in SSIS packages to extract the contents of a compressed ZIP file and then process that file in a data flow. TaskUnzip appears to be quite a popular way to achieve this, so as part of the BI Aviation project I decided to use it for extracting the data from the ZIP files available in those datasets.

Using TaskUnzip

In my testing I used build 1.3.0.1 available for SQL 2008 (SSIS_TaskUnZip_1.3.0.1_SQL_2008.zip). I suspected it may not work in SQL 2012 as-is, but decided to give it a try. The setup instructions are included in the ZIP file, so I ran Install_SQL_2008.cmd to install the library. That failed, because the paths and service names in it reference SQL 2008. So here is the corrected installation script:

net stop "SQL Server Integration Services 11.0"
gacutil.exe /u "ICSharpCode.SharpZipLib"
gacutil.exe /u TaskUnZip
copy "ICSharpCode.SharpZipLib.dll" "%ProgramFiles%Microsoft SQL Server110DTSTasks" /Y
copy "TaskUnZip.*" "%ProgramFiles%Microsoft SQL Server110DTSTasks" /Y
gacutil.exe /i "%ProgramFiles%Microsoft SQL Server110DTSTasksICSharpCode.SharpZipLib.dll"
gacutil.exe /i "%ProgramFiles%Microsoft SQL Server110DTSTasksTaskUnZip.dll"
net start "SQL Server Integration Services 11.0"
pause

The fixes are to change the references from 10.0 to 11.0 and from 100 to 110, to suit the SQL 2012 version. You have to make similar corrections in the x86 version of the CMD script, because designers like the VS 2010-based SSIS package editor are still 32-bit:

net stop "SQL Server Integration Services 11.0"
gacutil.exe /u "ICSharpCode.SharpZipLib"
gacutil.exe /u TaskUnZip
copy "ICSharpCode.SharpZipLib.dll" "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasks" /Y
copy "TaskUnZip.*" "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasks" /Y
gacutil.exe /i "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasksICSharpCode.SharpZipLib.dll"
gacutil.exe /i "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasksTaskUnZip.dll"
net start "SQL Server Integration Services 11.0"
pause

At this stage, you would expect the TaskUnzip task to show up in the SSIS Toolbox. But if you have a fresh installation of SQL 2012, it will not!

Rebuilding TaskUnzip for SQL 2012

Start by downloading the sources (I used this link and clicked the ‘download’ link at the top) and opening the project in Visual Studio 2010 SP1. When you rebuild it, you get the following errors:

The type or namespace name ‘Dts’ does not exist in the namespace ‘Microsoft.SqlServer’ (are you missing an assembly reference?)
The type or namespace name ‘Task’ could not be found (are you missing a using directive or an assembly reference?)

(errors truncated for brevity). This is due to the namespace changes in DTS in SQL 2012. You will also find that the references to the older DLLs are broken:

Broken assembly references

We also need to change the target to the .NET Framework 4.0 runtime (read more about this here):

Changing the target framework

Then remove the old references and add Microsoft.SqlServer.Dts.Design and Microsoft.SqlServer.ManagedDTS. (Though the namespace names are the same, the DLLs behind them are different, hence you need to add the references afresh.)

After this is done, the DLL build should succeed.

Deploying the recompiled DLL

Next, when you uninstall the old DLL and install the recompiled one, you might get the following error:

C:\2008>gacutil.exe /i "C:\Program Files\Microsoft SQL Server\110\DTS\Tasks\TaskUnZip.dll"
Microsoft (R) .NET Global Assembly Cache Utility.  Version 3.5.21022.8
Copyright (c) Microsoft Corporation.  All rights reserved.

Failure adding assembly to the cache:   This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.

This is because the gacutil.exe which ships with the TaskUnzip library ZIP file is actually for framework version 3.5. To fix this you need a gacutil for v4.0. You can find an updated gacutil at “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools” (this path is on your VS.NET box). Copy the files from that folder to your deployment server, overwriting the version which shipped with TaskUnzip. More details on this step are here.

Conclusion

That’s it! With the recompiled DLL installed, you can add the task to the toolbox and then into your Control Flows in SSIS. We will be using this component to import our BI Aviation data, so stay tuned for more details on how that is done!

And in closing, please take a minute to leave your questions or comments in the space provided at the end of this post. I do appreciate your feedback!


Disclaimer: the mention of 3rd party or community products in these blog posts in no way constitutes a recommendation or advice. These are my personal opinion and do not reflect any of my employer’s opinions.

Aviation BI Project – Part 1: Requirements

Background

One of my hobbies is following the commercial aviation scene. My job requires me to fly around a lot, and that only sparks my curiosity further around how the commercial aviation business works. What better way, then, than to use this hobby as motivation to test new features in SQL Server 2012? This hybrid interest, which I call Aviation BI, is the driver behind a series of blog posts, where I will take you through a typical lifecycle of a BI project:

  • Defining requirements
  • Solution design and technology selection
  • Cleansing and importing data into the data warehouse
  • Querying the data warehouse
  • Using a multidimensional / tabular model to obtain slice / dice insights
  • Visualizing those insights in eye-catching fashion

In the first blog post of this series, I will take you through what I (as an end user) see as possible requirements. We will also review the datasets available for consumption.

Requirements

Our requirements fall into a few categories, depending on the perspective we are talking about. A typical retail consumer would want to check some statistics:

  1. Given a particular sector, which airlines serve that sector?
  2. What is the on-time performance of each airline on the sector?
  3. Which airline has been providing the cheapest tickets on the sector?
  4. What is the average arrival delay for a particular airport? Is there a pattern based on time of day?
  5. What is the average load factor (how full is the aircraft) on the sector?

As an airline executive, I might want to look at:

  1. Which are the routes with the highest traffic?
  2. Which aircraft am I routinely having service difficulties with?

As an airport administrator, I might want to look at:

  1. For my airport, is the ramp time (the time taken to taxi from gate to runway) very high compared to other airports?
  2. What are the most common locations from which international passengers arrive to my airport?
  3. Which is the busiest time for arrivals and departures for my airport?

As an aviation analyst I may have the following queries:

  1. Which type of aircraft is used to make trans-Atlantic journeys?
  2. Which operators are still using DC-10 aircraft? (DC-10s were largely phased out by major operators in the late 1990s.)
  3. Did the Antonov 225 make any appearances in the US recently?
  4. What are the longest non-stop flights to the USA?

And so on… These perspectives are imaginary, but I suppose they are very typical of what people in those roles would be asking to see.

Data sets available

There are some amazing datasets available in the public domain. For our purposes, we will be using the following:

BTS On-time Performance (146,090,545 rows)
  • Data points: origin and destination airports, departure delay, arrival delay, taxi times, flight distance, flight time
  • Data available: 1987 onwards
  • Granularity: flight

Airline Origin and Destination Survey (DB1B) Market (346,513,372 rows)
  • Data points: number of coupons for this market¹, origin and destination airports, ticketing and operating carrier, market fare, number of passengers
  • Data available: 1993 onwards
  • Granularity: individual ‘market’

Airline Origin and Destination Survey (DB1B) Ticket (190,783,532 rows)
  • Data points: number of coupons in itinerary, origin airport, round trip indicator, miles flown, ticket fare per person, reporting carrier
  • Data available: 1993 onwards
  • Granularity: individual ticket

Airline Origin and Destination Survey (DB1B) Coupon (584,874,480 rows)
  • Data points: origin and destination airports, number of passengers included, fare class
  • Data available: 1993 onwards
  • Granularity: individual coupon

T-100 Segment (7,347,563 rows)
  • Data points: unique carrier code, origin and destination airports, aircraft type, departures scheduled and performed, load factor, seats vs. passengers, freight and mail, distance, ramp to ramp time, air time
  • Data available: 1990 onwards
  • Granularity: aggregated flights (monthly)

T-100 Market² (5,994,306 rows)
  • Data points: unique carrier code, origin and destination airports, service class, domestic/international indicator, passengers, freight and mail, distance
  • Data available: 1990 onwards

FAA Service Difficulty Reporting (SDR)
  • Data available: 1994 onwards
  • Granularity: each service difficulty report

1 A ‘market’ in DB1B terminology is a break in journey other than for changing planes.

2 The difference between ‘market’ and ‘segment’ in T-100 is explained at this link. The T-100 reporting guide is also useful to understand this dataset.


Next Steps

Over a series of blog posts, I will take you through how we:

  1. Cleanse and import this data into a data warehouse.
  2. Build Analysis Services databases on top of the relational data, exploring both multidimensional and tabular mode databases and seeing where each fits in.
  3. Consume this data through Reporting Services, Excel and Power View clients.
  4. Perform an incremental load of ‘new’ data into the warehouse, follow it with incremental processing of the Analysis Services databases, and confirm that the reporting pieces refresh with the new data.

All of this, towards answering the ‘customer’ requirements which we laid out earlier!

References

For those interested in reading further about airline statistics, here are some references:

So if you have some comments on the scope of this ‘project’, I would LOVE to hear from you! Do leave a comment below if you liked this idea and would like to see more.

Where can I download SQL 2012 Upgrade Advisor?

This post is a really quick one… Let’s say you are planning to check your SQL Server and T-SQL code for any potential breaking changes in SQL 2012, so you use Upgrade Advisor. And let’s say you want to download it, rather than get it from the installation media (the MSI can be found under the Servers\redist\Upgrade Advisor path).

If you monkey around with your favorite search engine (mine is Bing!) you may not find a download link which directly says ‘SQL 2012 Upgrade Advisor’. That is because the download for SQL UA actually lives under the SQL 2012 Feature Pack. Once you get to that page, look for SQLUA.MSI and select the one matching the architecture of the machine where you are going to run the tool (x86 or x64).

You may also need to install Windows Installer 4.5 and the .NET Framework 4.0 if they are not already installed. Then install the SQLUA.MSI file… and you are done!

Virtual PC / Virtual Server 2005 to Hyper-V: VM Additions issues

Background

I have some old VPCs which I used to run on Virtual PC 2007 SP1. In Virtual PC, we used ‘VM Additions’, which would install some drivers to make the VPC run smoothly in the virtualized environment; for example, they enabled smooth usage of the mouse across host and guest. There are multiple versions of these additions, some dating back to the old Virtual Server 2005 days.

Problem

Now, I have Windows 8, which brings along with it Hyper-V. So when I moved my (rather old) Windows 2003 VPC image to the Hyper-V platform, and I tried to use my mouse to click within the guest OS, this is what I got:

Mouse not captured in Remote Desktop session

Generally, this is due to the lack of the right type of additions for the virtualization platform. Now, Hyper-V has its own Integration Services which perform a role similar to that played by the previous VPC additions (which used to be installed from VMadditions.ISO). So when I moved my old VPC’s VHD file and created a VM under Hyper-V, I expected that a simple ‘Install Integration Services’ click would suffice. Normally, this works for VHDs previously used inside of Virtual PC, but in my case, when I did this, it failed with the following ‘Virtual machine additions detected’ error:

Virtual Machine Additions Detected

OK, so I had to uninstall the Virtual PC additions first. Let’s try to Add/Remove them. But this also failed, with the cryptic error ‘Tahoma8 You can install Virtual Machine Additions only on a virtual machine that is running a supported guest operating system’. Well, at that time, Windows 2003 was indeed a supported guest, so why is this error reported on an uninstall?

Virtual Machine Additions uninstall error

The existing VM additions version I have on the guest is 13.552:

Virtual Machine Additions version

Ideal solution

The ideal case is, you still have Virtual PC somewhere, using which you can uninstall the VM additions and then move the VHD back to Hyper-V. But in my case, I no longer have Virtual PC anywhere on my computers.

Workaround

After a bit of digging around, I found a workaround. Use this at your OWN RISK: it worked for me, but please do not blame me if it does not work for you.

Locating the MSI

First, we will locate the MSI installer which was used to install the VM Additions. To do this you have to be pretty good with your keyboard skills, since the mouse is not yet usable inside the guest.

  • First, put the guest into ‘Full Screen’ mode. This will allow you to use the Windows keyboard shortcuts on the guest.
  • Start up Windows Explorer on the guest (Windows key + E)
  • Press Alt-D to go to the location bar. Type in “C:\Windows\Installer” (this folder is hidden, so you cannot normally see it)

Now, there are 2 ways to get to know which MSI is the VM additions MSI. Here is the first way:

  • In the list that you get, look at each MSI file and check its properties. To do that, use the context menu key (normally to the right of the spacebar)
  • Use Tab and shift-Tab to navigate to the Summary tab of the properties window. Look at the ‘Subject’ field.
  • Quick tip: the VM additions MSI file for 13.552 is around 821KB in the Installer folder.

Looking at Summary tab in MSI properties

The second way is to use the Details screen, and use the ‘Choose Details’ Windows Explorer option to select the Subject field.

Customizing your Explorer view of MSI

Once you press enter, you can choose the Subject column. Make sure you use the ‘Move Up’ button to move this Subject next to the Name.

Customizing your Explorer view of MSI

Next, use the Tab key to navigate to the details pane and locate the file which has the subject of Virtual Machine Additions:

Viewing the MSI product name

Obtaining the ORCA tool

Next, we will edit that MSI file to remove the validations it uses to check for a supported guest OS. To do this, we will use the Orca utility which ships with the Platform SDK. Please see this KB article for more background. In my case I used the Windows 2003 Platform SDK for simplicity.

(Note: you can also consider a third-party MSI editor like InstEd to edit the MSI.)

Platform SDK installation to get the ORCA tool

 

  • Let’s say you installed the Platform SDK 2003 to a standalone folder like C:\win2k3sdk\Bin
  • You will find the Orca.MSI file there
  • Create an ISO image containing this Orca MSI file. You can use any freely available ISO image creator to do this.

Editing the MSI
  • Mount the ISO image on the guest OS.
  • Then install the Orca tool inside the guest OS.
  • Then load the VM Addition MSI file (which we identified previously) into the ORCA tool.
  • Using your top-notch keyboard skills, locate the CustomAction table in the left pane, and tab to the right pane. There, locate the custom action called ‘Error_CheckForRunning…’. Delete it.

Using ORCA to delete the failing custom actions

  • Locate the other CustomAction called ‘CA_CheckForRunningIns…’. Delete it as well.
  • Similarly, locate the InstallExecuteSequence on the left pane, and delete the 2 entries there as well: ‘CA_CheckForRunningIns’ and ‘Error_CheckForRunning’.
  • Save the MSI (Ctrl-S).

Removing the old VM additions

Finally, we will replace that MSI file and then uninstall the old VM Additions.

Removing the VM additions

Finally… success!

Success! VM Additions removed!

 

Once this step is done (and you follow it with a reboot) you are ready to install the Hyper-V integration services!

I hope you found this workaround useful. Please leave a comment if it saved you some time!

A curious case: CLR/COM Interop leak

 

Background

A customer asked me a question last week: in the CLR-COM interop case, who is responsible for freeing a string returned from COM? My understanding was that the interop layer set up by .NET would automatically take care of this. However, I thought a mock test would not do any harm. To my surprise, this test actually opened up the proverbial ‘can of worms’, and I thought I would share the experience with you!

Prerequisite reading

There is a lot of detailed debugging we will cover here, and it assumes some basics. For those who are not familiar with them, I recommend you first review the following.

 

Test setup: COM Server

We have an in-process ATL COM server which exposes an interface and a single method, SayHello. Here is the IDL interface for this method:

interface Imyclass : IDispatch{
    [id(1)] HRESULT SayHello([out] BSTR* hellostr);
};

And here is the implementation:

STDMETHODIMP Cmyclass::SayHello(BSTR* hellostr)
{
   *hellostr = SysAllocString(L"Hello World");

   return S_OK;
}

As you can see, this is no rocket science: we are allocating a BSTR inside the component and returning it to the caller. Therefore, as per COM conventions, the caller should free it.

Test setup: C# client

We have a managed client written as a C# console application. A reference is added to the COM library, cominteropLib and that allows us to create instances of myclass:

static void Main(string[] args)
{
    Console.ReadKey();

    string sout = "";
    myclass cls = new myclass();

    int i = 0;
    while (true)
    {
        cls.SayHello(ref sout);

        if (i % 100000 == 0)
        {
            System.Threading.Thread.Sleep(1000);
            Console.Write(sout);
        }

        i++;
    }
}

Observations

So when we run the C# application and plot Virtual Bytes, Private Bytes and # Bytes in all Heaps (the managed heap counter) for this process, here is the observed trend:

Memory counters over time

As you can see, Private Bytes increases steadily, but the managed byte count barely registers (almost 0 at this scale of the graph). Generally this indicates the leak is in unmanaged allocations, as explained in Using Perfmon to Diagnose Application Issues – 02.


Tracking the unmanaged leak

The best way to track down unmanaged leaks is to use DebugDiag 1.2. We did that and ran the Memory Analysis (native code only) to find out the call stacks with the most outstanding unmanaged allocations. Here are the relevant sections from the DebugDiag report:

 

mscorlib.ni.dll is responsible for 73.24 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions: mscorlib_ni+2c70aa: 73.24 MBytes worth of outstanding allocations.

 

Memory manager statistics by allocation size
OLE automation BSTR memory manager :  73.24 MBytes
Heap memory manager : 622.98 KBytes

So from this, it is clear that we are somehow leaking BSTRs. Let us save the call stack of the leaked allocation, and come back to it later (only the top few frames are reproduced for brevity.)

Call stack sample 1

Address   0x00650048
Allocation Time   00:00:10 since tracking started
Allocation Size   24 Bytes

Function   Source   Destination
mscorlib_ni+2c70aa      
clr+5d80      
mscorlib_ni+2c70aa      
clr+5d80      
mscorlib_ni+24bc9d      
clr+2ec3      0xAC07C8
clr+21db      
clr!LogHelp_TerminateOnAssert+301ec      
clr!LogHelp_TerminateOnAssert+74a40      
0x1DC030      
ConsoleApplication1+281e      
ConsoleApplication1+281e      
ConsoleApplication1+281d      
ConsoleApplication1+281b      

We will come back to this later.

Manually tracking the leak

Normally, we would just use the above call stack and drill down into the root cause. But I wanted to show you some other debugging methods which can be very useful. First, we will check if SysFreeString() is being called at all. Next, we will also dump the address of the BSTR allocated by SysAllocString() inside the COM component. We will compare the two and check whether we do release the BSTR which the COM component is allocating.

Checking SysAllocString / SysFreeString pairs

We will use conditional breakpoints in WinDbg to get this debug output. Here are the breakpoints:

  • bp `cominterop!myclass.cpp:21` ".echo 'SysAllocString'; dd poi(hellostr) L1;g"

  • bp OLEAUT32!SysFreeString+0x7 ".echo 'SysFreeString'; dd ebp+8 L1; g"

A quick explanation of these breakpoints is in order. The first one is a normal location breakpoint set on line 21 of myclass.cpp; we can easily do this because we have the source code of the COM component. In the case of the second (SysFreeString) breakpoint, we are going by the disassembly and deducing that [ebp+8] is the input parameter. Here is the disassembly of oleaut32!SysFreeString for your reference:

0:004> u oleaut32!SysFreeString
OLEAUT32!SysFreeString:
76333e40 8bff            mov     edi,edi
76333e42 55              push    ebp
76333e43 8bec            mov     ebp,esp
76333e45 51              push    ecx
76333e46 56              push    esi
76333e47 8b7508          mov     esi,dword ptr [ebp+8]    <— This is the second breakpoint. We will dump [ebp+8] here.
76333e4a 85f6            test    esi,esi
76333e4c 743d            je      OLEAUT32!SysFreeString+0x6c (76333e8b)

Both breakpoints contain commands to be executed when the breakpoint is hit. For the breakpoint in myclass, we will dump the address of the BSTR being returned by SysAllocString. For the SysFreeString we will dump the contents of location ebp+8 (so that we get the BSTR address being passed in.)

Here is the sample output from these breakpoints (snipped and reformatted to suit this blog post)

'SysAllocString' 0029f090  0045705c
'SysFreeString' 0029efdc  0045705c
'SysAllocString' 0029f090  00457094
'SysFreeString' 0029efdc  00457094
'SysAllocString' 0029f090  004570cc
'SysFreeString' 0029efdc  004570cc
'SysAllocString' 0029f090  00457104
'SysFreeString' 0029efdc  00457104
'SysAllocString' 0029f090  0045713c
'SysFreeString' 0029efdc  0045713c

As you can see from the above, each BSTR address returned by the COM component (the second value on each line) is subsequently passed to SysFreeString, so that BSTR is very clearly being freed. Next, let us see who is actually freeing those BSTRs (note: the ‘!clrstack’ and kb commands are executed manually when we hit the breakpoint on SysFreeString+0x7).

0:000> !clrstack;kb
OS Thread Id: 0xad8 (0)
Child SP IP       Call Site
0029efe4 76333e47 [InlinedCallFrame: 0029efe4] Microsoft.Win32.Win32Native.SysFreeString(IntPtr)
0029efe0 6ac0c7d5 System.StubHelpers.BSTRMarshaler.ClearNative(IntPtr)
0029f020 002a033e DomainBoundILStubClass.IL_STUB_CLRtoCOM(System.String ByRef)
0029f028 002a0121 [InlinedCallFrame: 0029f028] cominteropLib.Imyclass.SayHello(System.String ByRef)
0029f0c8 002a0121 ConsoleApplication1.Program.Main(System.String[]) [c:\Temp\cominterop\ConsoleApplication1\Program.cs @ 22]
0029f330 6c5421db [GCFrame: 0029f330]

The above is the managed call stack, which seems to suggest that the CLR-COM interop stub is automatically freeing the BSTR returned by our COM component. FYI, you can look at the implementation of the StubHelpers.BSTRMarshaler class in the .NET Reference Source. FWIW, the native call stack is given below, just for your reference (there is not too much additional value in it):

ChildEBP RetAddr  Args to Child             
0029efd4 6ac0c7d5 0045713c 7058594b 6c545d80 OLEAUT32!SysFreeString+0x7
WARNING: Stack unwind information not available. Following frames may be wrong.
0029f040 6abf7774 0029f0c0 00000000 000d387c mscorlib_ni+0x26c7d5
0029f104 6c5421db 0029f148 0007281d 0029f190 mscorlib_ni+0x257774
0029f114 6c564a2a 0029f1e0 00000000 0029f1b0 clr+0x21db
0029f190 6c564bcc 0029f1e0 00000000 0029f1b0 clr!CoUninitializeEE+0x6862
0029f2c8 6c564c01 000dc030 0029f394 0029f354 clr!CoUninitializeEE+0x6a04
0029f2e4 6c564c21 000dc030 0029f394 0029f354 clr!CoUninitializeEE+0x6a39
0029f2fc 6c62ce82 0029f354 70d6a23c 00000000 clr!CoUninitializeEE+0x6a59
0029f460 6c62cf90 000d3810 00000001 0029f49c clr!GetCLRFunction+0xc08
0029f6c8 6c62cda4 00000000 70d6adf0 00000000 clr!GetCLRFunction+0xd16
0029fbac 6c62d199 00070000 00000000 70d6aa5c clr!GetCLRFunction+0xb2a
0029fc00 6c62d09a 00070000 70d6aa10 00000000 clr!GetCLRFunction+0xf1f
0029fc4c 6c6aaf00 70d6aad8 00000000 7737903b clr!GetCLRFunction+0xe20
0029fc84 6e2555ab 6c6aaee4 0029fca0 6fb47f16 clr!CorExeMain+0x1c
0029fc90 6fb47f16 00000000 6e250000 0029fcb4 mscoreei!CorExeMain+0x38
0029fca0 6fb44de3 00000000 7737d0e9 7ffda000 mscoree!CreateConfigStream+0x13f
0029fcb4 778919bb 7ffda000 771e5a5e 00000000 mscoree!CorExeMain+0x8
0029fcf4 7789198e 6fb44ddb 7ffda000 ffffffff ntdll!__RtlUserThreadStart+0x23
0029fd0c 00000000 6fb44ddb 7ffda000 00000000 ntdll!_RtlUserThreadStart+0x1b

Clueless?

So, what do we have so far? We know from DebugDiag that somehow we are leaking BSTR memory. But on the other hand from the live debug trace we have also found that the BSTR being returned from the COM component is indeed being freed by the CLR-COM interop layer. So what gives? Who is the other source of the leaked BSTRs?

To answer that, we will go back to our DebugDiag output. It said the function mscorlib_ni+2c70aa was responsible for the native leak. Let us set a breakpoint there and check what that address really means in the managed code world:

0:000> bp mscorlib_ni+2c70aa
0:000> g
Breakpoint 2 hit
eax=0045713c ebx=00000000 ecx=0041c970 edx=00000006 esi=003e8680 edi=00000000
eip=6ac670aa esp=0029efbc ebp=0029effc iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
mscorlib_ni+0x2c70aa:
6ac670aa c6460801        mov     byte ptr [esi+8],1         ds:0023:003e8688=00

Once the above breakpoint is hit, we can view the managed call stack. This time we will view it with local and parameter information (-l and -p).

0:000> !clrstack -l -p
OS Thread Id: 0xad8 (0)
Child SP IP       Call Site
0029efbc 6ac670aa DomainNeutralILStubClass.IL_STUB_PInvoke(Byte[], UInt32)
    PARAMETERS:
        <no data>
        <no data>

0029efc0 6abeb1e7 [InlinedCallFrame: 0029efc0] Microsoft.Win32.Win32Native.SysAllocStringByteLen(Byte[], UInt32)
0029f004 6abeb1e7 System.StubHelpers.BSTRMarshaler.ConvertToNative(System.String, IntPtr)
    PARAMETERS:
        strManaged (<CLR reg>) = 0x01e620a0
        pNativeBuffer = <no data>
    LOCALS:
        <no data>
        <CLR reg> = 0x00000000
        0x0029f008 = 0x00000016
        <no data>
        <no data>
        <no data>
        <no data>

0029f024 002a020e DomainBoundILStubClass.IL_STUB_CLRtoCOM(System.String ByRef)
    PARAMETERS:
        this = <no data>
        <no data>

0029f028 002a0121 [InlinedCallFrame: 0029f028] cominteropLib.Imyclass.SayHello(System.String ByRef)
0029f0c8 002a0121 ConsoleApplication1.Program.Main(System.String[])*** WARNING: Unable to verify checksum for ConsoleApplication1.exe
  [c:\Temp\cominterop\ConsoleApplication1\Program.cs @ 22]
    PARAMETERS:
        args (0x0029f0fc) = 0x01e3af40
    LOCALS:
        0x0029f0f8 = 0x01e620a0
        0x0029f0d0 = 0x01e3bf30
        0x0029f0f4 = 0x000010ee
        0x0029f0f0 = 0x00000001

0029f330 6c5421db [GCFrame: 0029f330]   

 

What I want to highlight here is that the local variable at stack address 0x0029f0f8 (with an actual object address of 0x01e620a0) is our local string sout:

0:000> !do 0x01e620a0
Name:        System.String
MethodTable: 6acbf9ac
EEClass:     6a9f8bb0
Size:        36(0x24) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      Hello World
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
6acc2978  40000ed        4         System.Int32  1 instance       11 m_stringLength
6acc1dc8  40000ee        8          System.Char  1 instance       48 m_firstChar
6acbf9ac  40000ef        8        System.String  0   shared   static Empty
    >> Domain:Value  003b0c10:01e31228 <<

This local variable is what is finally passed into BSTRMarshaler.ConvertToNative (the strManaged parameter is 0x01e620a0, which is the local sout string itself.)

Eureka!

So this means that while calling the COM component, the CLR-COM interop layer is ‘helping’ us by wrapping our .NET string (the variable called sout) in a BSTR before invoking the component. Evidently, that BSTR is not getting freed. On inspecting the source code, it becomes obvious that we are missing a statement to set sout = null after the call to the COM component. On fixing the code, the ‘leak’ goes away. Here is a comparison of the IL code before and after the fix.

Before the fix

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       56 (0x38)
  .maxstack  2
  .locals init ([0] string sout,
           [1] class cominteropLib.myclass cls,
           [2] bool CS$4$0000)
  IL_0000:  nop
  IL_0001:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
  IL_0006:  pop
  IL_0007:  ldstr      ""
  IL_000c:  stloc.0
  IL_000d:  ldstr      "8F69F29F-97C3-4228-9D03-6499E88C6F38"
  IL_0012:  newobj     instance void [mscorlib]System.Guid::.ctor(string)
  IL_0017:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromCLSID(valuetype [mscorlib]System.Guid)
  IL_001c:  call       object [mscorlib]System.Activator::CreateInstance(class [mscorlib]System.Type)
  IL_0021:  castclass  cominteropLib.myclass
  IL_0026:  stloc.1
  IL_0027:  br.s       IL_0034
  IL_0029:  nop
  IL_002a:  ldloc.1
  IL_002b:  ldloca.s   sout
  IL_002d:  callvirt   instance void cominteropLib.Imyclass::SayHello(string&)
  IL_0032:  nop
  IL_0033:  nop
  IL_0034:  ldc.i4.1
  IL_0035:  stloc.2
  IL_0036:  br.s       IL_0029
} // end of method Program::Main

After the fix

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       58 (0x3a)
  .maxstack  2
  .locals init ([0] string sout,
           [1] class cominteropLib.myclass cls,
           [2] bool CS$4$0000)
  IL_0000:  nop
  IL_0001:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
  IL_0006:  pop
  IL_0007:  ldstr      ""
  IL_000c:  stloc.0
  IL_000d:  ldstr      "8F69F29F-97C3-4228-9D03-6499E88C6F38"
  IL_0012:  newobj     instance void [mscorlib]System.Guid::.ctor(string)
  IL_0017:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromCLSID(valuetype [mscorlib]System.Guid)
  IL_001c:  call       object [mscorlib]System.Activator::CreateInstance(class [mscorlib]System.Type)
  IL_0021:  castclass  cominteropLib.myclass
  IL_0026:  stloc.1
  IL_0027:  br.s       IL_0036
  IL_0029:  nop
  IL_002a:  ldloc.1
  IL_002b:  ldloca.s   sout
  IL_002d:  callvirt   instance void cominteropLib.Imyclass::SayHello(string&)
  IL_0032:  nop
  IL_0033:  ldnull
  IL_0034:  stloc.0

  IL_0035:  nop
  IL_0036:  ldc.i4.1
  IL_0037:  stloc.2
  IL_0038:  br.s       IL_0029
} // end of method Program::Main

The main difference, as you can see, is the assignment of null to the sout variable: the ldnull instruction pushes a null reference, and stloc.0 stores it into the sout local.

Conclusions

This was a very interesting walkthrough and I would like to summarize the steps:

  1. Use Performance Monitor to distinguish between a managed and unmanaged origin leak.
  2. For unmanaged leaks, you need to track those using DebugDiag’s leak tracking rule.
  3. Normally, for a pure unmanaged origin leak, the output from DebugDiag will give you the offending line of code.
  4. However in this case because the offending line of code is actually in an auto-generated CLR-COM stub, we cannot resolve it statically.
  5. Live debugging by setting the breakpoint on the native code address reported by DebugDiag gave us the opportunity to view the managed call stack leading to this unmanaged leak.

 

Recommendation

If you are reusing BSTRs across COM method calls, always set the strings returned to you by COM components explicitly to null in C# code before calling your COM method. That will allow the CLR-COM interop stub to correctly skip making the ConvertToNative call for the input BSTR, and therefore avoid the leak.

Well, I hope you enjoyed this post. It was hard work, but I enjoyed tracking this one down. Please drop a comment and see you next time!

Shared Source CLI 2.0 book

Yesterday I accidentally stumbled upon a goldmine: the Shared Source CLI 2.0 Internals book! This book covers the Shared Source CLI (if you don’t know what that is, check out the references at the end of this post) from an internals perspective. I find it extremely useful because a lot of what is written there is also relevant for the commercial version of the framework.

While the first edition was published by O’Reilly, it appears that the second edition [link here] never got published. So the authors, Joel Pobar and Ted Neward (both former Microsoft employees) have made the 2.0 book available for free on their websites.

Lastly, I’m not sure if you know, but we currently publish the ‘reference source’ for various versions of the framework. These cover most of the base class libraries, so it is a wealth of information. Check out the .NET Framework Libraries – Microsoft Reference Source Server!

References

Debugging managed code using VS.NET remote debugger

Visual Studio can be used to debug processes remotely, using the MSVSMON agent. This can be used for both native code and managed code. However, for successfully debugging managed code applications, symbol files have to be correctly loaded – and for that to happen, they must be in the correct place.

Prerequisites

For remote debugging of managed code to work correctly, the prerequisites are:

  1. The Default transport needs to be used in the Debug > Attach to Process dialog
  2. Firewall exceptions both ways
  3. Symbols need to be accessible from MSVSMON (target) – see note below
  4. Source will still be loaded on the client machine

 

Symbol locations

It is very important to keep in mind that VS.NET never loads mismatched symbols. So it is our responsibility to ensure matched PDBs are made available at one of the following locations on the remote machine:

  • At the location where the EXE is
  • On the path we specify in the VS.NET symbols dialog
  • _NT_SYMBOL_PATH
  • %SystemRoot%
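To make the lookup order above concrete, here is a minimal Python sketch of a first-match PDB search. The helper name and the simple first-match policy are my own illustration, not the actual VS.NET implementation:

```python
import os

def find_matching_pdb(exe_path, extra_symbol_dirs):
    """Illustrative first-match search: alongside the EXE, then any
    configured symbol directories, then _NT_SYMBOL_PATH, then %SystemRoot%."""
    pdb_name = os.path.splitext(os.path.basename(exe_path))[0] + ".pdb"
    candidates = [os.path.dirname(exe_path)]
    candidates += extra_symbol_dirs
    candidates += os.environ.get("_NT_SYMBOL_PATH", "").split(os.pathsep)
    candidates += [os.environ.get("SystemRoot", "")]
    for d in candidates:
        if d and os.path.isfile(os.path.join(d, pdb_name)):
            return os.path.join(d, pdb_name)
    return None  # no matching PDB; the debugger would report symbols not loaded
```

Note that this sketch only models the search order; the debugger additionally checks that the PDB's GUID and age match the EXE before loading it.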

Setup

Here is a schematic of the typical setup. Here, VS.NET acts as a controller, talking to MSVSMON (the remote debugger agent) which actually attaches to the process being debugged.

 

image

An additional challenge in our case is that the client and server are actually in different domains. To work around this issue, we create identically named local users (call it VSUSER) in the client and on the remote server. The password for this user is also kept the same on both machines. This technique is sometimes referred to as passthrough authentication or mirrored accounts.

Ready to roll

On the server and on the client, we now launch sessions logged in as VSUSER. You could also use the RUNAS command to obtain CMD prompts running as VSUSER. These sessions are then used to launch the following:

  1. On the client side, we will launch DEVENV (VS.NET IDE) running as VSUSER
  2. On the server side, we will launch MSVSMON running as VSUSER
  3. On the server side, we will launch the Windows application being debugged (running as any user).

Next, we deploy the remote debugger, MSVSMON, which is found at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\Remote Debugger\x64. When you install and run this on the server side, you will most probably be prompted to unblock firewall ports:

image

The next step is to use the Debug > Attach to Process menu on the client to discover the processes on the remote machine.

image

Permissions

If you try to remote debug a process which is not launched under the VSUSER identity, you may get this error message:

image 

To fix this issue, add VSUSER to the local Administrators group on the server, and run MSVSMON as administrator (due to UAC). Once this is done, reattach to the process; you should get the following warning, and then you should be able to continue.

image

 

Summary

Remote debugging of managed processes (even those launched under another identity) is very much possible, as long as you use the Default transport, ensure authentication, keep the firewall ports open and, most importantly, ensure that symbols are on the target at one of the locations listed above. You can then resolve source line numbers and debug interactively!

 

Further Reading

How to: Run the Remote Debugging Monitor – Microsoft Corporation

Visual Studio 2008 – Remote Debugging with MSVSMON.EXE

That’s it for now! Please rate this post and leave your comments – much appreciated!

AlwaysOn Availability Group Forced Failover: Under the Hood

In this walkthrough I will take you through one aspect of the FCI + AG configuration: forced failover. We will simulate a failure and force a failover, and in the process we will have some … data loss. Does that not sound horrible? Relax! You should not worry too much, because what we are going to discuss here is actually a 'manufactured' data loss situation. The true intent of this post is to show you how the Availability Group architecture deals with the 'multiple master' scenario and how it avoids the 'split-brain' condition.

Lab setup

For our testing the lab setup looks like this:

image

INST1 is a clustered instance of SQL with 2 nodes as possible owners, running SQL 2012 CU3. W2K8R2N3 is a server hosting a standalone instance of SQL 2012 CU3. All 3 servers are nodes in the same Windows Server Failover Cluster (WSFC), SQLAGCLUST.

Design decision #1: Asynchronous versus Synchronous secondary

We are allowed to configure either, but note the latency effects that a synchronous secondary can have. If W2K8R2N3 were on a WAN link, for example, I would prefer asynchronous.

Design decision #2: Why not automatic failover for the secondary replica?

While configuring this setup, note that although we are allowed to have a synchronous replica, we cannot make it an automatic failover partner. Automatic failover requires setting FAILOVER_MODE = AUTOMATIC on both partners, including the FCI replica INST1. If you try this:

USE [master]
GO
ALTER AVAILABILITY GROUP [FCI_AG]
MODIFY REPLICA ON N'INST1' WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N'W2K8R2N3')))
GO
ALTER AVAILABILITY GROUP [FCI_AG]
MODIFY REPLICA ON N'INST1' WITH (FAILOVER_MODE = AUTOMATIC)
GO


When you do this, you will receive the following error message:

Msg 35215, Level 16, State 17, Line 1
The Alter operation is not allowed on availability replica ‘INST1’, because automatic failover mode is an invalid configuration on a SQL Server Failover Cluster Instance.  Retry the operation by specifying manual failover mode.

This is expected and by design; the logic is that the FCI already has automatic failover configured between its own nodes. The current implementation of AGs does not allow an automatic failover outside of the FCI nodes.

 

Test Case

We will simulate a ‘perfect storm’ case – failure of the link to the DR, followed by failure of the primary site, followed by forced failover to the secondary site, and finally a failback to the primary site. In this rarest of rare scenarios we will show how the scope for data loss can arise.

Database and AG details

Here is the 'full disclosure report' for the test:

  • In our sample AG, there is just 1 database: ‘mydb’.
  • The primary replica, INST1.mydb, runs in synchronous-commit mode with manual failover.
  • The secondary replica, W2K8R2N3.mydb, is a synchronous, readable secondary with manual failover.
  • For our testing we will use a simple table 'newtab' which has 1 integer column, 'i'.
  • Prior to these tests, the table ‘newtab’ was truncated and the ‘mydb’ transaction log has been backed up.
  • We will test for missing rows to determine the data loss.

Cluster node weight assignment

As a good practice, since W2K8R2N3 cannot participate in automatic failover, it is recommended to set its node weight to 0. By default it is 1:

PS C:\Windows\system32> (get-clusternode "w2k8r2n3").NodeWeight
1

We will set it to 0:

PS C:\Windows\system32> (get-clusternode "w2k8r2n3").NodeWeight = 0

Test 0: Ensure synchronization works normally

We insert our first row, and check the secondary replica to see if the log records made it across.

insert newtab values (1)

Almost immediately we can see the record is synchronized on the secondary replica. A casual inspection of the transaction log on both sides using fn_dblog() shows that the log records are in sync. Here are the last 3 entries displayed by fn_dblog:

0000001f:0000007b:0018    LOP_INSERT_ROWS    LCX_HEAP    0000:000002e9
0000001f:0000007b:0019    LOP_SET_FREE_SPACE    LCX_PFS    0000:00000000
0000001f:0000007b:001a    LOP_COMMIT_XACT    LCX_NULL    0000:000002e9
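The 'Current LSN' values shown by fn_dblog can be parsed and compared programmatically. Here is a quick Python sketch (assuming, as is the case, that the three colon-separated fields are hexadecimal):

```python
def parse_lsn(lsn):
    """Split an fn_dblog 'Current LSN' such as '0000001f:0000007b:0018'
    into a (VLF sequence, log block, slot) tuple of integers."""
    return tuple(int(part, 16) for part in lsn.split(":"))

# Last LSN seen on the primary and on the secondary replica
primary_last = parse_lsn("0000001f:0000007b:001a")
secondary_last = parse_lsn("0000001f:0000007b:001a")
print(primary_last == secondary_last)  # -> True: the replicas are in sync
```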

fn_dblog is undocumented, unsupported, but can be useful for learning and testing purposes. You can read more about it at the following community resource pages:

 

Test 1: Link failure / secondary down

This is simulated by stopping the service on W2K8R2N3. In this case, the primary replica (which is the FCI) runs ‘exposed’. ‘Exposed’ means that potentially the RPO of 0 data loss (for the AG) is not met because there is no link to the secondary replica.

At this time (when the secondary replica is stopped or disconnected) we see the following entry in the errorlog of primary replica INST1:

2012-09-20 11:52:42.880 spid44s      AlwaysOn Availability Groups connection with secondary database terminated for primary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.

Now, let’s continue ‘modifying’ data on INST1.mydb:

insert newtab values (2)

Let’s review the last few LSNs:

0000001f:00000081:0002    LOP_INSERT_ROWS    LCX_HEAP    0000:000002ec
0000001f:00000081:0003    LOP_COMMIT_XACT    LCX_NULL    0000:000002ec

So to summarize, at this time, we have 1 row (with a value of 2) in mydb.newtab which has not been transferred to W2K8R2N3. This is why we say we are ‘exposed’, because we have the potential of losing this change were something bad to happen to INST1 now.

Test 2: INST1 also fails; W2K8R2N3 comes up later

Imagine that due to a steady degradation (maybe a power issue or such) the INST1 SQL FCI instance also fails. We will simulate this by powering the N1 and N2 nodes off, so that INST1 is unavailable. With those nodes offline, the cluster itself loses quorum and fails.

Also imagine after a while, W2K8R2N3 (the erstwhile secondary) comes up. In such a case, let’s see what happens. Firstly, the failover cluster manager reports that it is unable to connect to our WSFC. So let’s see what we get when we generate the cluster log file:

C:\Windows\system32>cluster log /g
Generating the cluster log(s) …
The cluster log has been successfully generated on node ‘w2k8r2n3’…
The cluster log could not be created on node ‘W2K8R2N1’…
System error 1722 has occurred (0x000006ba).
The cluster log could not be created on node ‘W2K8R2N2’…
System error 1722 has occurred (0x000006ba).     

The multiple 'errors' appear because the cluster log is generated on all nodes; with nodes 1 and 2 down, these errors are expected. Let's see what is in the cluster log on W2K8R2N3:

00000578.00000764::2012/09/20-19:10:39.769 INFO  [QUORUM] Node 3: setting next best epoch to 2012/09/13-06`03`57.603_11
00000578.00000764::2012/09/20-19:10:39.769 DBG   [QC] Calculating quorum: Configured nodes count = 3, Node weights count = 2, Quorum witness configured = false, Intersection count = 0
00000578.00000764::2012/09/20-19:10:39.769 INFO  [QUORUM] Node 3: Coordinator: no quorum yet. do nothing

Aha! So due to lack of quorum, our N3 cluster service is unable to form a cluster. At this stage, I will go ahead and stop the cluster service on N3 (if it has not stopped already.) We will start it and ‘force quorum’ later.
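The quorum arithmetic in play here can be sketched with a simplified node-majority model (this is my own simplification; the real WSFC algorithm also involves epochs and the paxos tag mentioned below):

```python
def has_quorum(votes_online, total_votes):
    """Simplified node-majority model: a cluster forms only when more
    than half of the configured votes are online."""
    return votes_online > total_votes // 2

# N1 and N2 (1 vote each) are powered off, and N3's weight was set
# to 0, so it brings no vote of its own.
print(has_quorum(votes_online=0, total_votes=2))  # -> False: no quorum
```

This matches the cluster log above: with a node weights count of 2 and both voting nodes down, N3 cannot form a cluster on its own.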

Test 3: Forcing quorum on W2K8R2N3

This part is quite simple in operation but underneath there are many complex details going on. I will not attempt to reproduce those details here but you can search for ‘Paxos Tag’ and also look at the 2 articles listed below to understand the mechanism:

So to force the cluster service to start and force a quorum, here are the commands. Firstly, we will use PowerShell to set the node weight of this W2K8R2N3 to 1:

PS C:\Windows\system32> (get-clusternode "w2k8r2n3").NodeWeight = 1

Next, we use the ForceQuorum switch to start the cluster service.

C:\Windows\system32>net start clussvc /fq

At this stage, if you look at SQL Server Management Studio, you will see this database in the 'Not Synchronizing' state. The AG resource will also show 'Resolving':

image

Next, we will force the failover of the AG to this node. To do this, we right click on the AG and select the Failover option, which in turn brings up this set of wizard screens. The first one warns us of the scope of data loss:

image

The second one ensures that we understand the implications of forcing failover:

image

Once the necessary steps are done, you will see the database status as Synchronized:

image

Forced failover: Under the hood

And most importantly, from an internals perspective, looking at the transaction log using fn_dblog() shows the same record as the original modification (the row with value 1). The LSN is 1f:7b:18:

0000001f:0000007b:0018    LOP_INSERT_ROWS    LCX_HEAP    0000:000002e9

At this time, we can also check the rows in the table on N3:

select * from mydb.dbo.newtab

The above shows just 1 row, with value 1 as expected. Let’s take a look at the error log on W2K8R2N3:

2012-09-20 12:32:38.110 spid53       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_NORMAL’ to ‘RESOLVING_PENDING_FAILOVER’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

Below you will see the state transition which is in response to our request to force failover:

2012-09-20 12:32:48.050 spid55       AlwaysOn: The local replica of availability group ‘FCI_AG’ is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required
2012-09-20 12:32:48.060 spid55       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_PENDING_FAILOVER’ to ‘RESOLVING_NORMAL’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.
2012-09-20 12:32:48.080 spid55       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_NORMAL’ to ‘PRIMARY_PENDING’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

2012-09-20 12:32:48.200 Server       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘PRIMARY_PENDING’ to ‘PRIMARY_NORMAL’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

At this stage, the database can be brought online on W2K8R2N3:

2012-09-20 12:32:48.210 spid21s      Starting up database ‘mydb’.
2012-09-20 12:32:48.300 spid21s      3 transactions rolled forward in database ‘mydb’ (5:0). This is an informational message only. No user action is required.
2012-09-20 12:32:48.340 spid10s      0 transactions rolled back in database ‘mydb’ (5:0). This is an informational message only. No user action is required.  

Test 4: Bring INST1 back online

By doing this, we potentially risk what is known as a 'split-brain' situation. Normally the implementation of the AG resource DLL and the lease timeout avoids this issue, provided all nodes in the cluster can talk to each other. In a real DR failback scenario, one would need to be very careful when bringing INST1 back online; the recommended way to prepare for this is to start the cluster services with the PreventQuorum switch first.

When we do bring the nodes for INST1 back online, mydb on INST1 is put into the Not Synchronizing state.

image

If we try to use the mydb database on INST1 it gives us the following error:

The target database, ‘mydb’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online. (Microsoft SQL Server, Error: 976)

Let’s see if we can create a database snapshot to peer into what is inside INST1.mydb.

-- Create the database snapshot
CREATE DATABASE mydb_dr_snap ON
( NAME = mydb, FILENAME =
'f:\sql2012\data\mydb_dr_snap.ss' )
AS SNAPSHOT OF mydb;
GO

At this time, querying the snapshot using the following command:

select * from mydb_dr_snap.dbo.newtab

This gives us 2 rows back. That means the transactions on the erstwhile primary instance INST1 are still 'there', but because of the forced quorum, the common LSN cannot be negotiated. This is why INST1 is in the Not Synchronized state; you can see the proof of this in the error log. To compare the state of the old secondary replica with INST1, you can use the TableDiff utility.

First, you will see the attempt made by the new secondary (INST1) to connect to the new primary (W2K8R2N3):

2012-09-20 12:40:34.990 spid23s      A connection for availability group ‘FCI_AG’ from availability replica ‘INST1’ with id  [280B1AED-49EA-4EF7-9BAF-C1EC13F50E51] to ‘W2K8R2N3’ with id [AC95B2E1-D5FA-48C2-8995-2F7BE4140ED3] has been successfully established.  This is an informational message
2012-09-20 12:40:37.630 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:40:37.630 spid29s      Error: 35285, Severity: 16, State: 1.

This next entry is SO important for us to note. It is telling us that the 'recovery LSN', which is the 'common handshake', is 31:129:1; in hexadecimal notation, this is 1f:81:1. From the first few steps in this walkthrough, you might remember that when we inserted row 2, the LSN was 1f:81:2.

2012-09-20 12:40:37.630 spid29s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:40:37.630 spid29s      AlwaysOn Availability Groups data movement for database ‘mydb’ has been suspended for the following reason: “failover from partner” (Source ID 1; Source string: ‘SUSPEND_FROM_PARTNER’). To resume data movement on the database, you will need to resume the d  
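The error log reports LSNs in decimal, while fn_dblog shows them in hex. The mapping described above (31:129:1 being 1f:81:1) is easy to verify with a small Python helper:

```python
def errorlog_lsn_to_hex(lsn):
    """Convert a decimal error-log LSN such as '31:129:1' into the
    hexadecimal form used by fn_dblog output."""
    return ":".join(format(int(part), "x") for part in lsn.split(":"))

print(errorlog_lsn_to_hex("31:129:1"))  # -> 1f:81:1
print(errorlog_lsn_to_hex("31:129:2"))  # -> 1f:81:2 (the LSN of row 2)
```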

Here is the notification that the snapshot was created:

2012-09-20 12:44:36.460 spid57       Starting up database ‘mydb_dr_snap’.
2012-09-20 12:44:36.470 spid57       4 transactions rolled forward in database ‘mydb_dr_snap’ (6:0). This is an informational message only. No user action is required.
2012-09-20 12:44:36.520 spid57       0 transactions rolled back in database ‘mydb_dr_snap’ (6:0). This is an informational message only. No user action is required.
2012-09-20 12:44:36.520 spid57       Recovery is writing a checkpoint in database ‘mydb_dr_snap’ (6). This is an informational message only. No user action is required.

Resume synchronization for INST1.mydb

Later, we resume the synchronization for this database on INST1. That effectively 'sets the clock back' on INST1.mydb by rolling back the transactions subsequent to the recovery LSN. We do this by right-clicking the database under the Availability Group and selecting Resume Data Synchronization.

image

At this time select * from mydb.dbo.newtab gives 1 row back! That confirms the second of our 'manufactured data loss' scenarios.

As always let us look at what happened under the hood, by noting the error log entries on INST1. First, you will see the effect of the RESUME:

2012-09-20 12:48:26.410 spid55       ALTER DB param option: RESUME
2012-09-20 12:48:26.410 spid55       AlwaysOn Availability Groups data movement for database ‘mydb’ has been resumed. This is an informational message only. No user action is required.
2012-09-20 12:48:26.530 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:48:26.530 spid29s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:26.530 spid29s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:26.640 spid29s      Error: 35278, Severity: 17, State: 1.
2012-09-20 12:48:26.640 spid29s      Availability database ‘mydb’, which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.  

What follows is even more critical to note. It says we identified a higher LSN (31:130:1) but are 'going back' to the stored recovery LSN (31:129:1):

2012-09-20 12:48:26.790 spid40s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:26.790 spid40s      The recovery LSN (31:130:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:26.790 spid40s      Error: 35286, Severity: 16, State: 1.
2012-09-20 12:48:26.790 spid40s      Using the recovery LSN (31:129:1) stored in the metadata for the database with ID 5.

Finally, the database can be recovered and resume synchronization:

2012-09-20 12:48:26.960 spid40s      Starting up database ‘mydb’.
2012-09-20 12:48:27.000 spid40s      Starting up database ‘mydb_dr_snap’.
2012-09-20 12:48:27.420 spid40s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:48:27.430 spid40s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:27.430 spid40s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:27.430 spid40s      3 transactions rolled forward in database ‘mydb’ (5:0). This is an informational message only. No user action is required.
2012-09-20 12:48:27.460 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.  

Here is the final state of the transaction log on INST1. You will notice the absence of the LSNs associated with the row with value 2 (1f:81:2):

0000001f:0000007b:0019    LOP_SET_FREE_SPACE    LCX_PFS    0000:00000000
0000001f:0000007b:001a    LOP_COMMIT_XACT    LCX_NULL    0000:000002e9
0000001f:00000080:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000081:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000082:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000083:0001    LOP_BEGIN_XACT    LCX_NULL    0000:000002ec
0000001f:00000083:0002    LOP_HOBT_DELTA    LCX_NULL    0000:000002ec

So this proves that we did lose the changes done in INST1 because of the forced failover to W2K8R2N3. However, with the database snapshot, we can attempt to reconcile those ‘lost’ changes with the base table.
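Conceptually, the reconciliation the snapshot enables is a set difference between the snapshot's view and the current table. A real tool like TableDiff compares per key column; this minimal sketch just uses the literal row values from our test:

```python
# Rows captured in the database snapshot (pre-failover state of INST1.mydb)
snapshot_rows = {1, 2}
# Rows in newtab after INST1 resumed as a secondary replica
current_rows = {1}

# The difference is exactly what was lost in the forced failover
lost_rows = sorted(snapshot_rows - current_rows)
print(lost_rows)  # -> [2]
```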

Wrapping up

We covered quite a lot of deep material here. Here is a quick summary:

  1. Automatic failover is not possible when the primary replica is a FCI
  2. Normally because of this fact, you will assign 0 votes to the secondary replica nodes
  3. If you lose the primary site, you must bring up cluster service with force quorum on the secondary replica
  4. Next, you will force failover the AG to the secondary replica. Data loss might occur in this scenario.
  5. If you do get back the primary site in the mean time, use the Prevent Quorum switch on the erstwhile primary nodes to prevent ‘split-brain’ condition
  6. If you take the right steps, the earlier primary will be in the Not Synchronizing state
  7. The earlier primary replica will still contain the changes which were made to it but not transmitted to the secondary site; you can use a database snapshot on the earlier primary replica to identify these changes, typically using a utility like TableDiff
  8. On resuming the earlier primary replica, it will don the role of a secondary. Again, in this case, data loss is expected if changes were made on the new primary which ‘override’ the earlier primary.

Wow! That was a long post but I hope it brings to light a lot of the detail around this very specific scenario of forced failover. I hope you enjoyed reading this. Please leave your comments, questions and feedback below. I appreciate those very much!

‘Proof of the pudding’: Global variables and PAGE_EXECUTE_WRITECOPY

Today I was teaching a debugging class to our customers. As a foundational element we normally review the virtual-to-physical address translation mechanism, at least to a basic level. In this class we decided to go a bit deeper and show the evidence.

Background reading

First, if you are not familiar with PTEs, PDEs and pages, you must read the following to understand those fundamental concepts:

 

Test Setup

My laptop hosts a Windows 2008 VM, using Hyper-V. The VM has been configured for kernel debugging using the following BCDEdit command:

bcdedit -debug on

We connect the kernel debugger on my laptop to this VM by using the 'named pipe' COM port emulation. The virtual COM port for the VM can be set up easily, and the steps for the same are documented here. Finally, the VM in turn has WinDbg installed. Since the VM is offline (no Internet connection), the matching OS symbols were cached on the VM using this approach for offline debugging.

Initial observations

We initially launch two notepad.exe instances on the VM. In one instance, we will turn the 'Word wrap' setting on; the other we will leave at the default. It so happens that this setting is mapped to a global variable, notepad!fWrap. How did I figure that out? Using the following command in the VM WinDbg session:

0:001> x notepad!*wrap*
001ca034          notepad!fWrap = <no type information>

So here are the actual values for each of these processes:

notepad.exe #1:

0:001> x notepad!*wrap*
001ca034          notepad!fWrap = <no type information>
0:001> |
.  0    id: 910    create    name: notepad.exe   <-- PID is 0x910
0:001> dd 001ca034 L1
001ca034  00000001        <-- as you can see, the value is 1, which indicates word wrap is ON

notepad.exe #2:

0:001> x notepad!fWrap
001ca034          notepad!fWrap = <no type information>
0:001> |
.  0    id: f6c    create    name: notepad.exe         <-- PID is 0xf6c
0:001> dd 001ca034 L1
001ca034  00000000       <-- in this case word wrap is OFF

Notice that the virtual address space for fWrap is the same in both processes. So the question in the mind of the participant was, while theoretically we know that the address space is translated to different pages, can we prove it?

Verification using live Kernel Debugging

To answer the above question, the only way to be absolutely sure was to hook up a live kernel debugger session and check. First, we identify the processes in kernel mode (we need their process objects in order to do this correctly):

kd> !process 0 0 notepad.exe
PROCESS 83b73d90  SessionId: 1  Cid: 0910    Peb: 7ffde000  ParentCid: 0c24
    DirBase: 3f77b3c0  ObjectTable: 945cd440  HandleCount:  46.
    Image: notepad.exe

PROCESS 8366eb38  SessionId: 1  Cid: 0f6c    Peb: 7ffda000  ParentCid: 0d9c
    DirBase: 3f77b400  ObjectTable: 944e46d0  HandleCount:  46.
    Image: notepad.exe

Next, we use the extension !vtop to map these to physical addresses. In the command below, the first argument to !vtop is the ‘page directory base’, the second is the virtual address. Firstly, for the process with PID 0x910 (which has word wrap ON) and whose Directory Base is 3f77b3c0 (obtained from the above !process output.)

kd> !vtop 3f77b3c0 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 – 000000000e0f5801
X86VtoP: PAE PDE e0f5000 – 000000000e0fa867
X86VtoP: PAE PTE e0fae50 – 800000000c57a867
X86VtoP: PAE Mapped phys c57a034
Virtual address 1ca034 translates to physical address c57a034.

So !vtop says the physical address is c57a034. Notice that the last 3 digits (the offset within the 4 KB page) are 034, the same as the offset in the virtual address. This is no coincidence; it follows from the way we break up the 32-bit virtual address into its constituents (page directory entry, page table entry and page offset). So let us quickly verify the contents of physical memory location c57a034. To do that we have two options:

kd> !dd c57a034 L1
# c57a034 00000001

kd> dd /p c57a034 L1
0c57a034  00000001

Either way, we can be sure that this is THE content of our fWrap variable. Let us cross-check the other process as well in a similar way.

kd> !vtop 3f77b400 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 – 000000000bd08801
X86VtoP: PAE PDE bd08000 – 000000000beeb867
X86VtoP: PAE PTE beebe50 – 800000000bc2c867
X86VtoP: PAE Mapped phys bc2c034
Virtual address 1ca034 translates to physical address bc2c034.

Next, let us dump the content of physical address bc2c034. Again we can use either !dd or dd /p to do this.

kd> dd /p bc2c034 L1
0bc2c034  00000000


kd> !dd bc2c034 L1
# bc2c034 00000000

So we can confirm that this is accurate as well, as it rightly shows the status value of 0.
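The address breakup that !vtop performs under PAE (a 2-bit page directory pointer index, a 9-bit page directory index, a 9-bit page table index and a 12-bit page offset) can be sketched as follows:

```python
def decompose_pae_va(va):
    """Split a 32-bit virtual address into its PAE paging constituents."""
    return {
        "pdpt_index": (va >> 30) & 0x3,    # selects one of 4 page directories
        "pde_index":  (va >> 21) & 0x1ff,  # page directory entry
        "pte_index":  (va >> 12) & 0x1ff,  # page table entry
        "offset":     va & 0xfff,          # byte offset within the 4 KB page
    }

parts = decompose_pae_va(0x001ca034)
print(hex(parts["offset"]))  # -> 0x34
```

Both physical addresses we obtained (c57a034 and bc2c034) end in the same 034 offset: only the page frame differs between the two processes, the offset within the page is carried over unchanged.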

Digging deeper

Now, let’s look at some more details. From the user mode WinDbg instance (running inside the VM) let’s check what kind of memory allocation is associated with the address notepad!fWrap. To do this, we will use the !address extension.

0:001> !address 001ca034

Usage:                  Image
Base Address:           001ca000
End Address:            001cb000
Region Size:            00001000
State:                  00001000    MEM_COMMIT
Protect:                00000004    PAGE_READWRITE
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:\Windows\System32\notepad.exe
Mapped Image Name:     
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1ca034
More info:              !dh 0x1c0000

Very interesting! So this is actually Image memory (as evidenced by the MEM_IMAGE type), and the allocation protection is ‘copy-on-write’ (PAGE_EXECUTE_WRITECOPY). This is the standard technique used for global variables within an image: the page is shared until a process writes to it, at which point that process receives its own private copy, since globals of course need to be maintained uniquely per-process.
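For reference, the numeric values that !address printed decode as follows. This is a sketch of my own using the documented Win32 memory constants from winnt.h, just to map the hex values back to their names:

```python
# Documented Win32 memory constants (winnt.h).
PROTECT = {
    0x04: "PAGE_READWRITE",
    0x20: "PAGE_EXECUTE_READ",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}
STATE = {0x1000: "MEM_COMMIT"}
TYPE  = {0x1000000: "MEM_IMAGE"}

# The region holding notepad!fWrap, per the !address output above:
print(STATE[0x00001000], PROTECT[0x00000004], TYPE[0x01000000])
# Its allocation protection -- the copy-on-write marker:
print(PROTECT[0x00000080])
```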

Let’s turn our attention to the executable code within notepad, to see how that is mapped to physical addresses. We will arbitrarily pick the function notepad!FastReplaceAll, which I found in the output of the x notepad!* command. In my debugging session it is at address 001c743f:

0:001> u notepad!FastReplaceAll
notepad!FastReplaceAll:
001c743f 8bff            mov     edi,edi
001c7441 55              push    ebp
001c7442 8bec            mov     ebp,esp
001c7444 83ec24          sub     esp,24h
001c7447 56              push    esi
001c7448 33f6            xor     esi,esi
001c744a 56              push    esi
001c744b 56              push    esi

FWIW, the address is the same in the other notepad.exe instance as well. Looking at the !address output, this region belongs to the same copy-on-write image allocation (its current protection is PAGE_EXECUTE_READ, since code pages are normally never written to):

0:001> !address 001c743f
Usage:                  Image
Base Address:           001c1000
End Address:            001ca000
Region Size:            00009000
State:                  00001000    MEM_COMMIT
Protect:                00000020    PAGE_EXECUTE_READ
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:\Windows\System32\notepad.exe
Mapped Image Name:     
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1c743f
More info:              !dh 0x1c0000

However, let us see whether this page maps to different physical addresses in the two processes. First, for the process with PID 0x910 (which has word wrap ON):

kd> !vtop 3f77b3c0 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 - 000000000e0f5801
X86VtoP: PAE PDE e0f5000 - 000000000e0fa867
X86VtoP: PAE PTE e0fae38 - 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

For the other process, it is as follows:

kd> !vtop 3f77b400 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 - 000000000bd08801
X86VtoP: PAE PDE bd08000 - 000000000beeb867
X86VtoP: PAE PTE beebe38 - 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

Voila! They are the same physical page. This is expected (in retrospect), because this code has never been modified, so both processes map it to the same physical page (the one containing 28d0743f).
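We can make the same observation directly from the PTE values in the !vtop outputs: masking off the flag bits of a PAE PTE leaves the physical frame, and the two data PTEs resolve to different frames while the two code PTEs are identical. A quick sketch of my own (bit positions per the PAE PTE format: bits 0-11 are flags, bit 63 is the NX bit, and bits 12-51 hold the frame):

```python
# Physical frame = PAE PTE with the control bits masked off.
FRAME_MASK = 0x000FFFFFFFFFF000

def frame(pte):
    return pte & FRAME_MASK

# PTEs for notepad!fWrap (data) in the two processes -- different frames:
print(hex(frame(0x800000000C57A867)))  # process with word wrap ON
print(hex(frame(0x800000000BC2C867)))  # process with word wrap OFF

# PTE for notepad!FastReplaceAll (code) -- identical in both processes,
# so both map to the same shared frame:
print(hex(frame(0x0000000028D07025)))
```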

Summary

  • Virtual to physical mapping is viewable in the kernel mode debug session using the !vtop extension
  • Global variables in different processes may share the same virtual address, but once a process writes to such a page the copy-on-write behavior gives it a private physical page, so the physical addresses differ
  • For executable code, the physical pages are shared between processes as long as the module was loaded at the same address in both.

Well that’s it for now! If you liked this and want to see more ‘Proof of the pudding’, please take a minute to drop a comment, and I will be glad to oblige!