June 8, 2015 (updated March 20, 2019) by Arvind Shyamsundar

The strange case of the large LiveKernelReports folder

Some time back, I ran into a bit of a space crunch on the C: drive of my laptop which runs Windows 8.1. On digging a bit, I found a 2GB+ file at C:\Windows\LiveKernelReports\WinsockAFD-20150114-1722.dmp. Now, this was the first time I had seen a folder called LiveKernelReports and definitely the first time that I had seen a dump file with the name WinsockAFD*.dmp.

Important note: if you are not a developer and came here trying to figure out what to do with the files in this folder, please proceed directly to the ‘So What?’ section below.

Inside the Dump File

The first thing I did (of course) was to open up the dump file under the WinDbg debugger. In kernel mode dumps, the !analyze -v command generally gives good analysis results, so I decided to start from there (full output of !analyze is at the end of this post as an Appendix).

First, the bugcheck code was 0x156. If you are a developer and have the Windows 8.1 SDK installed, the file C:\Program Files (x86)\Windows Kits\8.1\Include\shared\bugcodes.h lists the bugcheck names. 0x156 is WINSOCK_DETECTED_HUNG_CLOSESOCKET_LIVEDUMP.

Second, this bugcheck, unlike most of the ones we know, did not ‘crash’ or ‘blue screen’ the system.

Live Kernel Dumps

All of this is great, but what’s really happening here? How come I got kernel dumps without the system ‘crashing’? Well, the answer is that in Windows 8.1 the Windows development team added some great reliability diagnostics in the form of ‘Live Kernel Dump Reporting’. With this feature, certain Windows components can request a ‘live dump’ to be gathered. In my above case, both a minidump (~ 278KB) and a ‘larger’ dump (~ 2GB) were gathered when the AFD (Ancillary Function Driver for WinSock) runtime detected that a socket did not close ‘in time’ (see bold sections in the Appendix for more information.)

The Windows Error Reporting feature will then use the minidump to help the Windows development team figure out if this is a ‘trending’ issue, prioritize it and then hopefully fix it if it is due to an issue with Windows. The ‘larger’ dump which I mentioned above is not normally uploaded unless the development team ‘asks’ for it again via the Windows Error Reporting and Action Center mechanisms (to ultimately give the end user control on what gets submitted.)

So What?

That is the million dollar question. As an end user, you may be wondering what to do with these types of dump files. The advice I can give you is: if the dump files are causing you to go very low on disk space, you can probably move the dump file off to cheaper storage, like an external HDD. BUT if you are repeatedly getting these dump files, it may be advisable to check for any third party drivers, especially anti-virus products or any other network related software. Sometimes older versions of such software may not ‘play well’ with Windows 8.1 and may be causing a stalled network operation, in turn leading to these dump files.

If you are an IT Pro and seeing these dump files on server class machines and / or on multiple PCs, you would do well to contact our CSS (Customer Service and Support) staff who can guide you further on why these dump files are occurring and what should be the course of action.

In Closing

I hope this helps understand this system folder and why it plays an important role in improving the reliability of Windows. If you are interested in this topic, I highly recommend this talk from Andrew Richards and Graham McIntyre, who are both on the Windows Reliability team. They explain how the OCA / WER mechanism works. Amazing stuff, check it out!

Appendix: !analyze -v output

0: kd> !analyze -v
…

WINSOCK_DETECTED_HUNG_CLOSESOCKET_LIVEDUMP (156)
Winsock detected a hung transport endpoint close request.
Arguments:
…

DEFAULT_BUCKET_ID:  WINBLUE_LIVE_KERNEL_DUMP

BUGCHECK_STR:  0x156

…

STACK_TEXT:
…

ffffd001`28e46660 fffff803`bdddd64d : ffffffff`800026bc 00000000`00000000 ffffc001`1f52ec00 00000000`00000000 : nt!DbgkpWerCaptureLiveFullDump+0x11f
ffffd001`28e466c0 fffff801`21b7e3b4 : 00000000`00000001 ffffd001`28e46889 00000000`00000048 ffffe000`3e9afda0 : nt!DbgkWerCaptureLiveKernelDump+0x1cd
ffffd001`28e46710 fffff801`21b7b4ff : ffffe000`3e9afda0 00000000`0000afd2 ffffe000`3e9afd00 00000000`00000002 : afd!AfdCaptureLiveKernelDumpForHungCloseRequest+0xa8
ffffd001`28e46770 fffff801`21b89cad : ffffe000`3e9afda0 ffffd001`28e46889 00000000`0000afd2 ffffd001`28e46808 : afd!AfdCloseTransportEndpoint+0x64ef
ffffd001`28e467d0 fffff801`21b89674 : 00000000`00000001 ffffe000`42d71010 00000000`00000000 ffffe000`3e9afda0 : afd!AfdCleanupCore+0x14d
ffffd001`28e468f0 fffff803`bdc47349 : ffffe000`42d71010 ffffe000`3d3fd080 00000000`00000000

…

April 19, 2015 by Arvind Shyamsundar

The strange case of the bloated C:\Windows\CSC folder

A few days ago, when I was working at the office on the corporate network, I had to access a network share containing some installation files. At the time of starting this process, I remember distinctly that my disk free space was around 25GB or thereabouts. A few hours later, I found that my disk free space was down to almost 1GB! This, despite the fact that I never installed anything actually – I had just visited the network share to inspect file details.

A scan of the system with TreeSize Free (there are many others like SequoiaView which I have used as well) revealed around 25GB in the C:\Windows\CSC folder itself. At first I wondered whether this folder was related to the C# compiler, but that was quickly ruled out.

The answer is that this folder is used by the Offline Files feature in Windows. The way to check on the status and configuration for Offline Files in Windows 8.1 is quite simple. Open the Control Panel, and do a search for the keyword ‘Offline’. You will see the entry under ‘Sync Center’:

Click on ‘Manage offline files’ and therein you can check if Offline Files is enabled:

Then you can click on ‘Disk Usage’ to actually check how much space is used by Offline files. There are some files which are temporary in nature, and as you can see below, that was the case in my scenario. My colleague, Sujay Desai gave me this link to help understand what the Temporary Files really meant.

A good way to mitigate and control the disk space for Offline files is to adjust the limit of disk space for this feature. But I first chose to delete the ‘Temporary files’. That takes a few minutes typically and it will report that it has deleted the same:

Now, if you travel a lot, you should also be aware of a proactive feature wherein, on a slow connection, Windows will automatically start caching files using Offline Files. This is not configurable from the UI and can only be controlled by using a Group Policy.

There are many valid and useful cases where Offline Files can be helpful, but in my case I have no use for it. So, I decided to explicitly turn this Offline files feature off. You will be prompted for a reboot of the system after this is done:

Disclaimer: please only disable Offline Files if you know what you are doing and if it is causing space issues on your boot volume. If you have made offline changes to your files, make sure you sync them back before making any changes, because otherwise you may lose your work. So please be VERY CAREFUL when you do this.

The end state in my system is as follows; Offline Files is disabled and the Cache has been emptied:

And of course, my free disk space is back to normal!

Hope you liked this tip, and if you liked it, please leave a comment and / or rate the blog post. I would appreciate that very much!

December 11, 2014 by Arvind Shyamsundar

WOW64 madness: debugging through the confusion

Last week I was teaching unmanaged code debugging to one of my customers. They were using 32-bit applications on 64-bit servers, which does create some unique problems. For example, capturing a ‘hang’ dump by right-clicking the process in the Task Manager -> Processes list will create a ‘64-bit dump’ (if that makes sense), and the stacks visible by default will be those of the WOW64 emulation layer. That in turn causes problems for regular commands, for debugging extensions like PSSCOR / SOS, and even for some of the inbuilt extensions.

So here is a comprehensive listing of what happens when you use a specific style of capturing a dump / ETW log / using other tools. A similar themed post on all the tools we have is here.

	32-bit OS + 32-bit application	64-bit OS + 32-bit application (i.e. WOW64)	64-bit OS + 64-bit application
Production Live Debugging	Attach 32-bit WinDbg / CDB	Attach 32-bit WinDbg / CDB	Attach 64-bit WinDbg / CDB
Gathering a ‘hang’ dump *	1. Use ProcDump, preferably with the -r switch 2. Use the Processes tab in DebugDiag 3. Use the 32-bit WinDbg / CDB in non-invasive mode and issue the .dump command	1. Use ProcDump, preferably with the -r switch 2. Use the Processes tab in DebugDiag 3. Use the 32-bit WinDbg / CDB in non-invasive mode and issue the .dump command Avoid using Task Manager.	1. Use ProcDump, preferably with the -r switch 2. Use the Processes tab in DebugDiag 3. Use the 64-bit WinDbg / CDB in non-invasive mode and issue the .dump command
Gathering a ‘crash’ dump *	1. Create a crash rule using DebugDiag x86 – ‘just works’ 2. ProcDump -e ‘just works’ 3. Use the appropriate debugger as per above guidance for hang dumps	1. Create a crash rule using DebugDiag x64 – ‘just works’. # 2. ProcDump -e ‘just works’ 3. Use the appropriate debugger as per above guidance for hang dumps	1. Create a crash rule using DebugDiag x64 – ‘just works’ 2. ProcDump -e ‘just works’ 3. Use the appropriate debugger as per above guidance for hang dumps
Troubleshooting High CPU usage using XPerf	Use 32-bit Windows Performance Toolkit	Use 64-bit Windows Performance Toolkit	Use 64-bit Windows Performance Toolkit
Troubleshooting a Performance issue using PerfView	PerfView works correctly regardless of the ‘bitness mix’	– same –	– same –
Dump analysis of a dump containing only native code ^	Any debugger should work but the same ‘bitness’ is recommended as far as possible.	– same –	– same –
Dump analysis of a dump containing any managed code ^	Debug using a 32-bit debugger and 32-bit PSSCOR / SOS.	Debug using a 32-bit debugger and 32-bit PSSCOR / SOS.	Debug using a 64-bit debugger and 64-bit PSSCOR / SOS.
‘Live’ kernel debugging using LiveKD	LiveKD ‘just works’ provided you have installed x86 Debugging Tools for Windows.	LiveKD ‘just works’ provided you have installed x64 Debugging Tools for Windows.	LiveKD ‘just works’ provided you have installed x64 Debugging Tools for Windows.
GFLAGS utility	Use the x86 version of GFlags	Use the x86 version of GFlags	Use the x64 version of GFlags
Application Verifier	Install x86 version of Application Verifier	Install x64 version of Application Verifier (automatically includes x86 binaries) and launch the ‘Application Verifier (WOW)’ program. $	Install x64 version of Application Verifier.

# Note that you will not be able to / should not install the 32-bit version of DebugDiag on 64-bit OS.

* Note: for ProcDump, you must also use the -ma switch for detailed debugging, especially if you are debugging memory issues. For managed code dumps it is a must as well.

^ Note: Dump must be gathered ‘correctly’ with one of the above supported mechanisms.

$ Empirically, running either the native or WOW version of the GUI seems to put entries under both sets of keys. Please check this post from Chris Jackson for some related details.
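The pattern behind the table can be sketched in a few lines of Python. This is only an illustration: the function name pick_tools and the dictionary keys are made up for this example, and the mapping simply restates the table rows (tools that attach to or analyze the process follow the application's bitness, while tools installed system-wide follow the OS).

```python
def pick_tools(os_bits, app_bits):
    """Return the tool 'bitness' suggested by the cheat sheet above.

    pick_tools and its keys are hypothetical names used for illustration.
    """
    if os_bits not in (32, 64) or app_bits not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    if app_bits > os_bits:
        raise ValueError("a 64-bit application cannot run on a 32-bit OS")
    app = "x86" if app_bits == 32 else "x64"
    system = "x86" if os_bits == 32 else "x64"
    return {
        # These follow the application being debugged.
        "WinDbg / CDB": app,
        "PSSCOR / SOS": app,
        "GFlags": app,
        # These follow the operating system.
        "DebugDiag": system,
        "Windows Performance Toolkit": system,
        "Debugging Tools for LiveKD": system,
        "Application Verifier installer": system,
    }


# The WOW64 column: 32-bit application on a 64-bit OS.
for tool, bitness in pick_tools(64, 32).items():
    print(f"{tool}: {bitness}")
```

PerfView is left out on purpose, since the table says it works regardless of the ‘bitness mix’.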

The other thing you need to understand before proceeding further is the role of the WOW64 layer. The following blog posts would help you in that case:

Hope this cheat sheet is useful for you. And if it is, I would really appreciate you taking a minute to rate this post and leave a comment!

March 20, 2014 (updated September 24, 2020) by Arvind Shyamsundar

Too many files causing RAM pressure? (a.k.a. DynCache to the rescue!)

Readers of this blog might recall a previous post which described one impact of having too many similarly named files in one folder in NTFS. It turns out that the 8.3 naming convention is not the only thing you need to worry about when you have a very large number of (smaller) files on the same volume.

Today I was called in to assist with a performance issue on a server. The only visible symptom of the problem was excessive RAM utilization on the server. The interesting aspect was that no specific user-mode process was consuming that RAM, so we were wondering where it came from. If this server was running SQL Server, for example, the ‘ghost’ utilization could be due to locked pages (which do not show up in Task Manager) but that was not the case here.

Analysis

So we ran the RAMMap utility and found that MetaFile accounted for a substantial percentage of the total RAM usage. From this AskPerf blog post you can see what MetaFile is all about:

“Metafile is part of the system cache and consists of NTFS metadata. NTFS metadata includes the MFT as well as the other various NTFS metadata files (see How NTFS Works for more details, and of course Windows Internals is a great reference). In the MFT each file attribute record takes 1k and each file has at least one attribute record. Add to this the other NTFS metadata files and you can see why the Metafile category can grow quite large on servers with lots of files.”

The next step therefore was to cross-check how large the MFT was in reality. The easy way to do this is to utilize the command given below:

fsutil fsinfo ntfsinfo <drive letter>

A sample output is given below (from my own laptop)

C:\>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number :       0x8a40c9ee40c9e0d5
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000003a2007ff
Total Clusters :                  0x00000000074400ff
Free Clusters :                  0x0000000000ab1f84
Total Reserved :                  0x000000000002ef10
Bytes Per Sector :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000030240000
Mft Start Lcn :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x0000000004d58da0
Mft Zone End   :                  0x0000000004d655c0
Resource Manager Identifier :     96CC88FE-5621-11E3-AF31-3C970EA47926

In this output, the “Mft Valid Data Length” gives us an indicator of how many bytes are used by the MFT. In the above case for example it equates to around 770MB:

0x0000000030240000 / (1024*1024) = 770.25 MB
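The same conversion can be scripted. A minimal Python sketch, assuming you paste in the ‘Mft Valid Data Length’ value by hand (the function name mft_summary is made up for this example); it also divides by the ‘Bytes Per FileRecord Segment’ value of 1024 from the output above to show roughly how many file record segments that length corresponds to:

```python
def mft_summary(valid_data_length_hex, bytes_per_record=1024):
    """Convert the hex 'Mft Valid Data Length' into MB and a record count."""
    size_bytes = int(valid_data_length_hex, 16)
    size_mb = size_bytes / (1024 * 1024)
    # Each file record segment is bytes_per_record bytes long.
    record_count = size_bytes // bytes_per_record
    return size_mb, record_count


size_mb, records = mft_summary("0x0000000030240000")
print(f"{size_mb:.2f} MB, about {records:,} file record segments")
```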

Just as a curiosity, when I run RAMMap on my laptop (Windows 8.1), the Metafile figure it reports roughly aligns with the number computed above.

FYI, on the server that I was looking at in the real world, the size of the MFT was actually 1.5 times the amount of RAM on the box.

Mitigation

Since the server in question was running Windows Server 2008 R2 SP1, we recommended the usage of the DynCache service sample, which would automatically control the size of the system cache based on system memory notifications.

SQL Server FileStream

Practically, this issue would also apply when you use SQL FileStream to store a very large number of blobs in the NTFS file system. In such cases, here are my recommendations:

Consider setting the SQL Server Database Engine to use ‘locked pages’ with ‘max server memory’ set appropriately.
If the OS is Windows 2008 R2 or below, you may additionally consider using the DynCache service to mitigate the effect that the large MFT will have.

Other notes

If you anticipate huge numbers of files to be stored on the file system, keep in mind that each file record will take up around 1KB. That means 100 million files will take up close to 100GB worth of MFT storage!
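For capacity planning, the estimate is simple arithmetic. A rough Python sketch, assuming exactly one 1 KB file record per file (files with many attributes can need more, so treat this as a lower bound; the function name is made up for this example):

```python
def estimated_mft_gib(file_count, bytes_per_record=1024):
    """Lower-bound MFT size in GiB, assuming one file record per file."""
    return file_count * bytes_per_record / 2**30


print(f"{estimated_mft_gib(100_000_000):.1f} GiB")
```

For 100 million files this comes to a little over 95 GiB, which is the ‘close to 100GB’ figure mentioned above.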

On Windows 2008 R2 and below be aware of issues like the one described in KB article 967351 and install the updated version of NTFS.sys accordingly.

In extreme cases, if you want to achieve optimal performance without compromising on system cache memory utilization, be aware that the system RAM sizing must be done accordingly keeping in mind the very large MFT which might result.


June 19, 2013 by Arvind Shyamsundar

Debugging story: Slowness due to NTFS short file (8.3) name generation

When I teach production debugging to my customers, I always tell them that to be successful you need to know not only the right tool and command syntax, but also the right methodology. And perhaps even more importantly, when debugging certain types of issues, knowledge of Windows Internals and the ability to ‘connect the dots’ is a savior.

Recently, I had one such experience wherein an issue with an application turned out to be related to an NTFS behavior (which I had previously read about in my SQL Server work), and had I not ‘connected the dots’ I would not have been able to remediate the issue quickly. Wondering what it is all about? Read on!

Scenario

So this was a strange problem: a server had hundreds of processes which were all seemingly ‘hung’ and not doing anything. The server in question was hosting a vendor application, which had spawned those processes, but normally those processes should execute quickly and not accumulate the way they had.

It appeared to me that the processes were hung in some way. Since the application was deployed as a service, these child processes were on a non-visible desktop, so my first theory was that they had popped up a UI element. But as with any user mode process hanging situation, I decided to collect hang mode dumps and look at those to be sure about the root cause. I took sample dumps from 2 processes using DebugDiag 1.2.

Dump analysis

Once the dumps were available, I opened them in WinDbg (x86, as these were 32-bit application processes.) Here are the call stacks of the dump for the first process:

0:000> kL
ChildEBP RetAddr
0018d368 74cd9a04 ntdll!NtSetInformationFile+0x12
0018d43c 74cecd44 kernel32!MoveFileWithProgressTransactedW+0x334
0018d474 74ceccec kernel32!MoveFileWithProgressTransactedA+0x5b
0018d494 74d3d93f kernel32!MoveFileWithProgressA+0x1b
0018d4b0 00401302 kernel32!MoveFileA+0x16
WARNING: Stack unwind information not available. Following frames may be wrong.
0018d4cc 00401aa2 fooapp+0x7777
0018ff94 76fb9ef2 kernel32!BaseThreadInitThunk+0xe
0018ffd4 76fb9ec5 ntdll!__RtlUserThreadStart+0x70
0018ffec 00000000 ntdll!_RtlUserThreadStart+0x1b

The above call stack is due to a file being moved. In this case by dumping the parameters to MoveFileA, we found that the file was being renamed in the same folder (the file extension was being changed from .FOO to .BAR; file extensions changed to protect the identity). The call stack for the thread in the second process is given below:

0:000> kL
ChildEBP RetAddr
0018d38c 74c7c5dc ntdll!NtCreateFile+0x12
0018d430 74cc3f86 KERNELBASE!CreateFileW+0x35e
0018d45c 74cc53e4 kernel32!CreateFileWImplementation+0x69
0018d48c 004012b3 kernel32!CreateFileA+0x37
WARNING: Stack unwind information not available. Following frames may be wrong.
0018d4c0 004019df fooapp+0x8888
0018ff94 76fb9ef2 kernel32!BaseThreadInitThunk+0xe
0018ffd4 76fb9ec5 ntdll!__RtlUserThreadStart+0x70
0018ffec 00000000 ntdll!_RtlUserThreadStart+0x1b

The above is a call stack of the application creating a new file. By looking at both of these, it appeared to me that the hard disk was performing slowly. So we extracted the file name (the first parameter to CreateFileA routine) and found that it was a new file within a folder on the D: drive. When we ran performance counter logs for the D: drive, it appeared to be just normal.

So it was quite puzzling to see the calls to creating and renaming files taking so long despite the physical drive performing quite well.

Root cause

We then looked a bit further by browsing to the folder on the D: drive which was being accessed by the above application. We could not even list the contents (Explorer would appear unresponsive.) It then occurred to us that when a folder holds a large number of similarly named files (which was exactly the case with this application), NTFS has to work extra hard to generate unique 8.3 names (so-called ‘short file names’.) This TechNet article mentions a threshold of 300,000 files above which this short name generation can become a bottleneck. The algorithm for the short file name generation is also documented here and there is another very interesting but really old KB article which shows how to achieve the name conversion in FoxPro.
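To get a feel for why similarly named files are expensive, here is a deliberately simplified Python model of short name assignment. It assumes the commonly described scheme (first six characters plus ~1 to ~4, then two characters plus four hash-derived hex digits plus a ~N suffix). The function names are made up, zlib.crc32 is only a stand-in for the real hash, and the character filtering is far cruder than what NTFS does, so treat it as an illustration rather than the actual algorithm.

```python
import zlib


def short_name_candidates(long_name):
    """Yield 8.3 candidates in the order this simplified model tries them."""
    stem, sep, ext = long_name.rpartition(".")
    if not sep:  # no dot in the name at all
        stem, ext = long_name, ""
    stem = stem.replace(" ", "").replace(".", "").upper()
    ext = ext.replace(" ", "").upper()[:3]
    suffix = "." + ext if ext else ""
    # Phase 1: first six characters plus ~1 .. ~4.
    for n in range(1, 5):
        yield f"{stem[:6]}~{n}{suffix}"
    # Phase 2: first two characters plus four hex digits from a hash.
    # zlib.crc32 is a placeholder; it is NOT the hash NTFS uses.
    digest = f"{zlib.crc32(long_name.encode('utf-8')) & 0xFFFF:04X}"
    for n in range(1, 10):
        yield f"{stem[:2]}{digest}~{n}{suffix}"


def assign_short_name(long_name, taken):
    """Pick the first candidate not already in `taken`; return (name, probes)."""
    for probes, candidate in enumerate(short_name_candidates(long_name), start=1):
        if candidate not in taken:
            taken.add(candidate)
            return candidate, probes
    raise RuntimeError("simplified model ran out of candidates")


existing = set()
for i in range(6):
    name = f"report_{i:06d}.FOO"
    short, probes = assign_short_name(name, existing)
    print(f"{name} -> {short} ({probes} lookups)")
```

Each lookup in this model is a cheap set membership test. On a real volume every probe is a directory lookup, which is where the cost comes from once a folder holds hundreds of thousands of files sharing the same prefix.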

Now, these names are more of a legacy remnant of DOS and the 16-bit world, and in most servers there should not be any reason to use them. So thankfully we have a way to disable the generation of these short file names. I then tried to evaluate the effect that disabling this has. My test results are given below.

Testing

I executed a series of tests with a simple PowerShell script which creates and renames files in a tight loop. In each case, over a million files had already been created in the same folder. My tests were done on a Windows 8 laptop with a single spindle SATA hard disk.

Test Description	File Count per process	Avg. time (8dot3 ON)	Avg. time (8dot3 OFF)	% Improvement
Single process creating files	25000	126028	102617	19%
5 processes creating + renaming files	25000	567611	375946	34%
20 processes creating files	25000	2420367	1422348	41%
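The improvement percentages follow directly from the two timing columns. A short Python sketch that recomputes them from the figures above (the variable and function names are made up for this example):

```python
results = [
    ("Single process creating files", 126028, 102617),
    ("5 processes creating + renaming files", 567611, 375946),
    ("20 processes creating files", 2420367, 1422348),
]


def improvement_percent(time_on, time_off):
    """Relative reduction in average time when 8dot3 generation is off."""
    return (time_on - time_off) / time_on * 100


for name, time_on, time_off in results:
    print(f"{name}: {improvement_percent(time_on, time_off):.0f}%")
```

Note that the relative gain grows with the number of concurrent processes.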

Conclusion

The benefits of disabling 8dot3 name generation are obvious from the test results above, and validate the guidance given in the KB articles mentioned previously. And keep in mind these results were obtained on Windows 8 and Windows 2008 R2, wherein the algorithm for 8dot3 name generation has been further tuned. The gains might be even more dramatic on Windows 2003.

So in summary, we would like to confirm the benefits of disabling 8dot3 name generation on servers where there are large numbers of similarly named files in the same folder. And for SQL Servers using FILESTREAM storage, this setting is recommended as well.

A curious case: CLR/COM Interop leak


Background

A customer asked me a question last week: in the CLR-COM interop case, who is responsible for freeing a string returned from COM? My understanding was that the interop layer set up by .NET will automatically take care of this. However, I thought a mock test would not do any harm. To my surprise this test actually opened up the proverbial ‘can of worms’ and I thought I would share my experience with you!

Prerequisite reading

There is a lot of detailed debugging material to cover here, and it assumes some basics. For those who are not aware of the basics, I recommend you first go through the following.

Free training video series- .NET Debugging for production environments
Production Debugging for .NET Framework Applications (there was a PDF version of this guide, but the link seems broken – Bing for ‘DBG_download.pdf’ if you want to look for it.)
For disassembly (e.g. how does ebp+8 map to parameters) please read Under the Hood, MSJ February 1998 – Microsoft Corporation and Annotated x86 Disassembly – Microsoft Corporation- Software

Test setup: COM Server

We have an in-process ATL COM server which exposes an interface and a single method, SayHello. Here is the IDL interface for this method:

interface Imyclass : IDispatch{
    [id(1)] HRESULT SayHello([out] BSTR* hellostr);
};

And here is the implementation:

STDMETHODIMP Cmyclass::SayHello(BSTR* hellostr)
{
    *hellostr = SysAllocString(L"Hello World");

    return S_OK;
}

As you can see, this is not rocket science: we allocate a BSTR inside the component and return it to the caller. Therefore, as per COM conventions, the caller should free it.

Test setup: C# client

We have a managed client written as a C# console application. A reference is added to the COM library, cominteropLib and that allows us to create instances of myclass:

static void Main(string[] args)
{
    Console.ReadKey();

    string sout = "";
    myclass cls = new myclass();

    int i = 0;
    while (true)
    {
        // Call the COM method in a tight loop.
        cls.SayHello(ref sout);

        // Every 100000 iterations, pause for a second and print the result.
        if (i % 100000 == 0)
        {
            System.Threading.Thread.Sleep(1000);
            Console.Write(sout);
        }

        i++;
    }
}

Observations

So when we run the C# application, and plot Virtual, Private Bytes and also # Managed bytes for this process, here is the observed trend:

As you can see, the Private Bytes increases steadily, but the amount of managed bytes recorded is not even registering (almost 0 at this scale of the graph.) Generally this indicates the leak is in unmanaged allocations, as you can see in Using Perfmon to Diagnose Application Issues – 02.

Tracking the unmanaged leak

The best way to track down unmanaged leaks is to use DebugDiag 1.2. We did that and ran the Memory Analysis (native code only) to find out the call stacks with the most outstanding unmanaged allocations. Here are the relevant sections from the DebugDiag report:

mscorlib.ni.dll is responsible for 73.24 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions: mscorlib_ni+2c70aa: 73.24 MBytes worth of outstanding allocations.

Memory manager statistics by allocation size
OLE automation BSTR memory manager : 73.24 MBytes
Heap memory manager : 622.98 KBytes

So from this, it is clear that we are somehow leaking BSTRs. Let us save the call stack of the leaked allocation, and come back to it later (only the top few frames are reproduced for brevity.)

Call stack sample 1

Address   0x00650048
Allocation Time   00:00:10 since tracking started
Allocation Size   24 Bytes

Function   Source   Destination
mscorlib_ni+2c70aa
clr+5d80
mscorlib_ni+2c70aa
clr+5d80
mscorlib_ni+24bc9d
clr+2ec3      0xAC07C8
clr+21db
clr!LogHelp_TerminateOnAssert+301ec
clr!LogHelp_TerminateOnAssert+74a40
0x1DC030
ConsoleApplication1+281e
ConsoleApplication1+281e
ConsoleApplication1+281d
ConsoleApplication1+281b

We will come back to this later.

Manually tracking the leak

Normally, we will just use the above call stack and drill down into the root cause. But I wanted to show you some other debugging methods which can be very useful. First, we will check if SysFreeString() is being called at all. Next, we will also dump the address of the BSTR allocated by SysAllocString() inside the COM component. We will compare these and check if we do release the BSTR which the COM component is allocating.

Checking SysAllocString / SysFreeString pairs

We will use conditional breakpoints in WinDbg to get this debug output. Here are the breakpoints:

bp `cominterop!myclass.cpp:21` ".echo 'SysAllocString'; dd poi(hellostr) L1; g"

bp OLEAUT32!SysFreeString+0x7 ".echo 'SysFreeString'; dd ebp+8 L1; g"

A quick explanation of these breakpoints is in order. The first one is a normal location breakpoint set on line 21 of myclass.cpp. We can easily do this because we have the source code of the COM component. For the second (SysFreeString) breakpoint, we rely on the disassembly: once the function prologue has run, [ebp+8] holds the input parameter. Here is the disassembly of oleaut32!SysFreeString for your reference:

0:004> u oleaut32!SysFreeString
OLEAUT32!SysFreeString:
76333e40 8bff            mov     edi,edi
76333e42 55              push    ebp
76333e43 8bec            mov     ebp,esp
76333e45 51              push    ecx
76333e46 56              push    esi
76333e47 8b7508          mov     esi,dword ptr [ebp+8]    <-- This is the second breakpoint. We will dump [ebp+8] here.
76333e4a 85f6            test    esi,esi
76333e4c 743d            je      OLEAUT32!SysFreeString+0x6c (76333e8b)

Both breakpoints contain commands to be executed when the breakpoint is hit. For the breakpoint in myclass, we will dump the address of the BSTR being returned by SysAllocString. For the SysFreeString we will dump the contents of location ebp+8 (so that we get the BSTR address being passed in.)

Here is the sample output from these breakpoints (snipped and reformatted to suit this blog post)

'SysAllocString' 0029f090 0045705c
'SysFreeString' 0029efdc 0045705c
'SysAllocString' 0029f090 00457094
'SysFreeString' 0029efdc 00457094
'SysAllocString' 0029f090 004570cc
'SysFreeString' 0029efdc 004570cc
'SysAllocString' 0029f090 00457104
'SysFreeString' 0029efdc 00457104
'SysAllocString' 0029f090 0045713c
'SysFreeString' 0029efdc 0045713c
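With a longer trace, checking the pairs by eye gets tedious. A small Python sketch that does the pairing mechanically, assuming the debugger output has been saved as text in the reformatted three-column layout shown above (the function name outstanding_bstrs is made up for this example):

```python
import re

# Label, the address that was dumped, and the BSTR pointer stored there.
TRACE_LINE = re.compile(
    r"(SysAllocString|SysFreeString)\W+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})"
)


def outstanding_bstrs(trace_text):
    """Return BSTR addresses that were allocated but never freed in the trace."""
    outstanding = set()
    for match in TRACE_LINE.finditer(trace_text):
        call, _dumped_address, bstr = match.groups()
        if call == "SysAllocString":
            outstanding.add(bstr)
        else:
            outstanding.discard(bstr)
    return outstanding


sample = """
'SysAllocString' 0029f090 0045705c
'SysFreeString' 0029efdc 0045705c
'SysAllocString' 0029f090 00457094
'SysFreeString' 0029efdc 00457094
"""
print(outstanding_bstrs(sample) or "every allocation in the trace was freed")
```

Keep in mind that this only covers allocations made at the traced breakpoint. BSTRs allocated elsewhere never show up in the trace at all.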

As you can see from the above, we are very clearly freeing up the BSTR which is being returned by the COM component (the last value on each line is the BSTR address, and every allocation is followed by a free of the same address). Next, let us see who is actually freeing those BSTRs (note: the ‘!clrstack’ and kb commands are executed manually when we hit the breakpoint on SysFreeString+0x7)

0:000> !clrstack;kb
OS Thread Id: 0xad8 (0)
Child SP IP Call Site
0029efe4 76333e47 [InlinedCallFrame: 0029efe4] Microsoft.Win32.Win32Native.SysFreeString(IntPtr)
0029efe0 6ac0c7d5 System.StubHelpers.BSTRMarshaler.ClearNative(IntPtr)
0029f020 002a033e DomainBoundILStubClass.IL_STUB_CLRtoCOM(System.String ByRef)
0029f028 002a0121 [InlinedCallFrame: 0029f028] cominteropLib.Imyclass.SayHello(System.String ByRef)
0029f0c8 002a0121 ConsoleApplication1.Program.Main(System.String[]) [c:\Temp\cominterop\ConsoleApplication1\Program.cs @ 22]
0029f330 6c5421db [GCFrame: 0029f330]

The above is the managed call stack, which seems to suggest that the CLR-COM interop stub is automatically freeing up the BSTR returned by our COM component. FYI, you can look at the implementation of the StubHelpers.BSTRMarshaler class from the .NET Reference Source. FWIW, given below is the native call stack, just for your reference (there is not too much value add from it.)

ChildEBP RetAddr Args to Child
0029efd4 6ac0c7d5 0045713c 7058594b 6c545d80 OLEAUT32!SysFreeString+0x7
WARNING: Stack unwind information not available. Following frames may be wrong.
0029f040 6abf7774 0029f0c0 00000000 000d387c mscorlib_ni+0x26c7d5
0029f104 6c5421db 0029f148 0007281d 0029f190 mscorlib_ni+0x257774
0029f114 6c564a2a 0029f1e0 00000000 0029f1b0 clr+0x21db
0029f190 6c564bcc 0029f1e0 00000000 0029f1b0 clr!CoUninitializeEE+0x6862
0029f2c8 6c564c01 000dc030 0029f394 0029f354 clr!CoUninitializeEE+0x6a04
0029f2e4 6c564c21 000dc030 0029f394 0029f354 clr!CoUninitializeEE+0x6a39
0029f2fc 6c62ce82 0029f354 70d6a23c 00000000 clr!CoUninitializeEE+0x6a59
0029f460 6c62cf90 000d3810 00000001 0029f49c clr!GetCLRFunction+0xc08
0029f6c8 6c62cda4 00000000 70d6adf0 00000000 clr!GetCLRFunction+0xd16
0029fbac 6c62d199 00070000 00000000 70d6aa5c clr!GetCLRFunction+0xb2a
0029fc00 6c62d09a 00070000 70d6aa10 00000000 clr!GetCLRFunction+0xf1f
0029fc4c 6c6aaf00 70d6aad8 00000000 7737903b clr!GetCLRFunction+0xe20
0029fc84 6e2555ab 6c6aaee4 0029fca0 6fb47f16 clr!CorExeMain+0x1c
0029fc90 6fb47f16 00000000 6e250000 0029fcb4 mscoreei!CorExeMain+0x38
0029fca0 6fb44de3 00000000 7737d0e9 7ffda000 mscoree!CreateConfigStream+0x13f
0029fcb4 778919bb 7ffda000 771e5a5e 00000000 mscoree!CorExeMain+0x8
0029fcf4 7789198e 6fb44ddb 7ffda000 ffffffff ntdll!__RtlUserThreadStart+0x23
0029fd0c 00000000 6fb44ddb 7ffda000 00000000 ntdll!_RtlUserThreadStart+0x1b
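The ChildEBP of the top frame also explains the address printed by the SysFreeString breakpoint. A minimal Python sketch of the arithmetic, assuming the usual 32-bit frame layout set up by push ebp / mov ebp,esp (the helper name frame_slots is made up for this example):

```python
def frame_slots(ebp):
    """Addresses of the standard slots in an EBP-based x86 stack frame."""
    return {
        "saved_ebp": ebp,            # pushed by 'push ebp'
        "return_address": ebp + 4,   # pushed by the 'call' instruction
        "first_argument": ebp + 8,   # pushed by the caller before the call
    }


slots = frame_slots(0x0029EFD4)  # ChildEBP of the SysFreeString frame
print(f"first argument is read from {slots['first_argument']:08x}")
```

0029efd4 + 8 is 0029efdc, the address shown in the SysFreeString breakpoint output, and the value stored there (0045713c) is the first entry under ‘Args to Child’ for that frame.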

Clueless?

So, what do we have so far? We know from DebugDiag that somehow we are leaking BSTR memory. But on the other hand from the live debug trace we have also found that the BSTR being returned from the COM component is indeed being freed by the CLR-COM interop layer. So what gives? Who is the other source of the leaked BSTRs?

For answering that, we will go back to our DebugDiag output. In that, it says the function mscorlib_ni+2c70aa was responsible for the native leak. Let us set a breakpoint there and check what that really means in managed code world:

0:000> bp mscorlib_ni+2c70aa
0:000> g
Breakpoint 2 hit
eax=0045713c ebx=00000000 ecx=0041c970 edx=00000006 esi=003e8680 edi=00000000
eip=6ac670aa esp=0029efbc ebp=0029effc iopl=0         nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000             efl=00000246
mscorlib_ni+0x2c70aa:
6ac670aa c6460801        mov     byte ptr [esi+8],1         ds:0023:003e8688=00
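
As a quick sanity check on the address arithmetic (a sketch that is not part of the original debugging session): the debugger resolves mscorlib_ni+2c70aa to the module's load base plus that offset. The base below is derived from the eip value in the breakpoint output, and the loop confirms that it agrees with the mscorlib_ni return addresses in the native call stack shown earlier. The variable and function names are only illustrative.

```python
# Values copied from the debugger output in this post.
eip_at_breakpoint = 0x6AC670AA   # eip when "bp mscorlib_ni+2c70aa" was hit
breakpoint_offset = 0x2C70AA

# Load base of mscorlib_ni implied by the breakpoint address.
module_base = eip_at_breakpoint - breakpoint_offset

# (offset, absolute address) pairs for the mscorlib_ni frames in the native stack.
native_frames = [(0x26C7D5, 0x6AC0C7D5), (0x257774, 0x6ABF7774)]


def resolve(base, offset):
    """Turn a module-relative offset into an absolute virtual address."""
    return base + offset


for offset, expected in native_frames:
    # Each frame must resolve to the address the debugger printed.
    assert resolve(module_base, offset) == expected

print(hex(module_base))  # 0x6a9a0000
```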

Once the above breakpoint is hit, we can view the managed call stack. This time we will view it with local and parameter information (-l and -p).

0:000> !clrstack -l -p
OS Thread Id: 0xad8 (0)
Child SP IP       Call Site
0029efbc 6ac670aa DomainNeutralILStubClass.IL_STUB_PInvoke(Byte[], UInt32)
    PARAMETERS:
        <no data>
        <no data>

0029efc0 6abeb1e7 [InlinedCallFrame: 0029efc0] Microsoft.Win32.Win32Native.SysAllocStringByteLen(Byte[], UInt32)
0029f004 6abeb1e7 System.StubHelpers.BSTRMarshaler.ConvertToNative(System.String, IntPtr)
    PARAMETERS:
        strManaged (<CLR reg>) = 0x01e620a0
        pNativeBuffer = <no data>
    LOCALS:
        <no data>
        <CLR reg> = 0x00000000
        0x0029f008 = 0x00000016
        <no data>
        <no data>
        <no data>
        <no data>

0029f024 002a020e DomainBoundILStubClass.IL_STUB_CLRtoCOM(System.String ByRef)
    PARAMETERS:
        this = <no data>
        <no data>

0029f028 002a0121 [InlinedCallFrame: 0029f028] cominteropLib.Imyclass.SayHello(System.String ByRef)
0029f0c8 002a0121 ConsoleApplication1.Program.Main(System.String[])*** WARNING: Unable to verify checksum for ConsoleApplication1.exe
[c:\Temp\cominterop\ConsoleApplication1\Program.cs @ 22]
    PARAMETERS:
        args (0x0029f0fc) = 0x01e3af40
    LOCALS:
        0x0029f0f8 = 0x01e620a0
        0x0029f0d0 = 0x01e3bf30
        0x0029f0f4 = 0x000010ee
        0x0029f0f0 = 0x00000001

0029f330 6c5421db [GCFrame: 0029f330]

What I want to highlight here is that the local variable at stack address 0x0029f0f8 (with an actual object address of 0x01e620a0) is actually our local string sout:

0:000> !do 0x01e620a0
Name:        System.String
MethodTable: 6acbf9ac
EEClass:     6a9f8bb0
Size:        36(0x24) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      Hello World
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
6acc2978 40000ed        4         System.Int32 1 instance       11 m_stringLength
6acc1dc8 40000ee        8          System.Char 1 instance       48 m_firstChar
6acbf9ac 40000ef        8        System.String 0   shared   static Empty
    >> Domain:Value 003b0c10:01e31228 <<

This local variable is what is finally passed into BSTRMarshaler.ConvertToNative (the strManaged parameter is 0x01e620a0, which is the local sout string itself.)

Eureka!

So this means that while calling the COM component, the CLR-COM interop layer is ‘helping’ us by wrapping our .NET string (the variable called sout) as a BSTR and then invoking the component. Evidently that BSTR is not getting freed up. On source code inspection, it becomes apparent that we are missing a call to set sout = null after the call to the COM component. On fixing the code, the ‘leak’ goes away. Here is a comparison of the IL code before and after the fix.

Before the fix

.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size       56 (0x38)
.maxstack 2
.locals init ([0] string sout,
           [1] class cominteropLib.myclass cls,
           [2] bool CS$4$0000)
IL_0000: nop
IL_0001: call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0006: pop
IL_0007: ldstr      ""
IL_000c: stloc.0
IL_000d: ldstr      "8F69F29F-97C3-4228-9D03-6499E88C6F38"
IL_0012: newobj     instance void [mscorlib]System.Guid::.ctor(string)
IL_0017: call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromCLSID(valuetype [mscorlib]System.Guid)
IL_001c: call       object [mscorlib]System.Activator::CreateInstance(class [mscorlib]System.Type)
IL_0021: castclass cominteropLib.myclass
IL_0026: stloc.1
IL_0027: br.s       IL_0034
IL_0029: nop
IL_002a: ldloc.1
IL_002b: ldloca.s   sout
IL_002d: callvirt   instance void cominteropLib.Imyclass::SayHello(string&)
IL_0032: nop
IL_0033: nop
IL_0034: ldc.i4.1
IL_0035: stloc.2
IL_0036: br.s       IL_0029
} // end of method Program::Main

After the fix

.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size       58 (0x3a)
.maxstack 2
.locals init ([0] string sout,
           [1] class cominteropLib.myclass cls,
           [2] bool CS$4$0000)
IL_0000: nop
IL_0001: call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0006: pop
IL_0007: ldstr      ""
IL_000c: stloc.0
IL_000d: ldstr      "8F69F29F-97C3-4228-9D03-6499E88C6F38"
IL_0012: newobj     instance void [mscorlib]System.Guid::.ctor(string)
IL_0017: call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromCLSID(valuetype [mscorlib]System.Guid)
IL_001c: call       object [mscorlib]System.Activator::CreateInstance(class [mscorlib]System.Type)
IL_0021: castclass cominteropLib.myclass
IL_0026: stloc.1
IL_0027: br.s       IL_0036
IL_0029: nop
IL_002a: ldloc.1
IL_002b: ldloca.s   sout
IL_002d: callvirt   instance void cominteropLib.Imyclass::SayHello(string&)
IL_0032: nop
IL_0033: ldnull
IL_0034: stloc.0
IL_0035: nop
IL_0036: ldc.i4.1
IL_0037: stloc.2
IL_0038: br.s       IL_0029
} // end of method Program::Main

The main difference, as you can see, is the assignment of null to the sout variable (that is what the IL instructions ldnull and stloc.0 at IL_0033 and IL_0034 do).

Conclusions

This was a very interesting walkthrough and I would like to summarize the steps:

Use Performance Monitor to distinguish between a managed and unmanaged origin leak.
For unmanaged leaks, you need to track those using DebugDiag’s leak tracking rule.
Normally, for a pure unmanaged origin leak, the output from DebugDiag will give you the offending line of code.
However in this case because the offending line of code is actually in an auto-generated CLR-COM stub, we cannot resolve it statically.
Live debugging by setting the breakpoint on the native code address reported by DebugDiag gave us the opportunity to view the managed call stack leading to this unmanaged leak.

Recommendation

If you are reusing string variables as BSTRs across COM method calls, always set the strings returned to you by COM components explicitly to null in your C# code before calling the COM method again. That allows the CLR-COM interop stub to skip the ConvertToNative call for the input BSTR, and therefore avoids the leak.
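
To make the bookkeeping concrete, here is a toy model of the pattern in Python. It is only a sketch under stated assumptions: ToyBstrHeap stands in for the BSTR allocator, call_through_stub mimics what we saw the interop stub do (allocate a BSTR for a non-null input string, free the BSTR that comes back), and say_hello is a hypothetical COM method that overwrites its in/out parameter without freeing the incoming value. None of these names come from the CLR or from the actual component.

```python
class ToyBstrHeap:
    """Stand-in for the BSTR allocator: tracks which allocations are still live."""

    def __init__(self):
        self._next_handle = 1
        self.live = {}

    def alloc(self, text):
        handle = self._next_handle
        self._next_handle += 1
        self.live[handle] = text
        return handle

    def free(self, handle):
        if handle is not None:
            del self.live[handle]


def say_hello(heap, incoming):
    """Hypothetical COM method taking an in/out BSTR.

    Assumption: it returns a new BSTR and never frees the incoming one.
    """
    return heap.alloc("Hello World")


def call_through_stub(heap, managed_string):
    """Toy version of the interop stub for a 'ref string' parameter."""
    # ConvertToNative step: a non-null managed string becomes a new BSTR.
    incoming = heap.alloc(managed_string) if managed_string is not None else None
    outgoing = say_hello(heap, incoming)
    result = heap.live[outgoing]
    heap.free(outgoing)  # the stub frees the BSTR that came back
    return result


def leaked_after(iterations, reset_to_null):
    """Run the call loop and return how many toy BSTRs are still allocated."""
    heap = ToyBstrHeap()
    sout = ""
    for _ in range(iterations):
        sout = call_through_stub(heap, sout)
        if reset_to_null:
            sout = None  # the fix: nothing to marshal in on the next call
    return len(heap.live)


print(leaked_after(1000, reset_to_null=False))  # 1000
print(leaked_after(1000, reset_to_null=True))   # 1
```

In this model the fixed loop still leaves one allocation behind, because sout starts out as an empty (non-null) string on the very first call. That is a one-time cost rather than a leak that grows with every call.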

Well, I hope you enjoyed this post. It was hard work, but I enjoyed tracking this one down. Please drop a comment and see you next time!

October 1, 2012 by Arvind Shyamsundar

Free training video series: .NET Debugging for production environments

This is a quick one: my PFE colleagues Brad Linscott and Mario Hewardt have contributed an excellent video series on Channel 9. Check it out NOW at .NET Debugging for the Production Environment! Awesome stuff!

September 26, 2012 by Arvind Shyamsundar

Debugging managed code using VS.NET remote debugger

Visual Studio can be used to debug processes remotely, using the MSVSMON agent. This can be used for both native code and managed code. However, for successfully debugging managed code applications, symbol files have to be correctly loaded – and for that to happen, they must be in the correct place.

Prerequisites

For remote debugging of managed code to work correctly, the prerequisites are:

The Default transport needs to be used in Debug-Attach to process
Firewall exceptions both ways
Symbols need to be accessible from MSVSMON (target) – see note below
Source will still be loaded on the client machine

Symbol locations

It is very important to keep in mind that VS.NET never loads mismatched symbols. So it is our responsibility to ensure matched PDBs are made available at one of the following locations on the remote machine:

At the location where the EXE is
On the path we specify in the VS.NET symbols dialog
_NT_SYMBOL_PATH
%SystemRoot%
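
If you want to check quickly whether a PDB is present at any of these locations, something like the following Python sketch could be run on the remote machine. It is only an illustration: the function names and the extra_dirs parameter (standing in for the paths typed into the symbols dialog) are my own, symbol-server style entries in _NT_SYMBOL_PATH (those containing '*') are simply skipped, and it only looks for a file with the expected name. It cannot tell you whether the PDB actually matches the binary.

```python
import os


def candidate_symbol_dirs(exe_path, extra_dirs=(), env=None):
    """List the directories named above, in the order they are listed."""
    env = os.environ if env is None else env
    dirs = [os.path.dirname(os.path.abspath(exe_path))]  # where the EXE is
    dirs.extend(extra_dirs)                              # symbols dialog paths
    for entry in env.get("_NT_SYMBOL_PATH", "").split(";"):
        entry = entry.strip()
        # Skip empty entries and symbol-server entries such as srv*cache*url.
        if entry and "*" not in entry:
            dirs.append(entry)
    system_root = env.get("SystemRoot")
    if system_root:
        dirs.append(system_root)
    return dirs


def find_pdb(exe_path, extra_dirs=(), env=None):
    """Return every candidate location where <exe name>.pdb exists."""
    pdb_name = os.path.splitext(os.path.basename(exe_path))[0] + ".pdb"
    hits = []
    for directory in candidate_symbol_dirs(exe_path, extra_dirs, env):
        path = os.path.join(directory, pdb_name)
        if os.path.isfile(path):
            hits.append(path)
    return hits
```

An empty result from find_pdb is a strong hint that the debugger will not be able to load symbols for that module.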

Setup

Here is a schematic of the typical setup. Here, VS.NET acts as a controller, talking to MSVSMON (the remote debugger agent) which actually attaches to the process being debugged.

An additional challenge in our case is that the client and server are actually in different domains. To work around this issue, we create identically named local users (call it VSUSER) in the client and on the remote server. The password for this user is also kept the same on both machines. This technique is sometimes referred to as passthrough authentication or mirrored accounts.

Ready to roll

On the server and on the client, we now launch sessions logged in as VSUSER. You could also use the RUNAS command to obtain CMD prompts which are running as VSUSER. That will be used to launch the following:

On the client side, we will launch DEVENV (VS.NET IDE) running as VSUSER
On the server side, we will launch MSVSMON running as VSUSER
On the server side we will launch the Windows application (running as any user.)

Next, we deploy the remote debugger, MSVSMON, which is found at the location C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\Remote Debugger\x64. When you install and run this on the server side, you will most probably be prompted to unblock firewall ports:

The next step is to use the Debug-Attach to process menu on the client to discover the processes on the remote machine.

Permissions

If you try to remote debug a process which is not launched under the VSUSER identity, you may get this error message:

To fix this issue, add VSUSER to the local administrators group on the server, and run MSVSMON as administrator (due to UAC). Once this is done, reattach to the process; you should get the following warning, after which you should be able to continue.

Summary

Remote debugging of managed processes (even those which are launched under another identity) is very much possible, as long as you use the Default transport, ensure authentication, keep communications open and most importantly – ensure that symbols are on the target at one of the locations specified in my above post. You can then resolve the source line numbers, and debug interactively!

‘Proof of the pudding’: Global variables and PAGE_EXECUTE_WRITECOPY

Today I was teaching a debugging class to our customers. As a foundational element we normally review the virtual-to-physical address translation mechanism, at least to a basic level. In this class we decided to go a bit deeper and show the evidence.

Background reading

First, if you are not familiar with PTEs, PDEs and pages, you must read the following to understand those fundamental concepts:

Understanding !PTE , Part 1- Let’s get physical – Ntdebugging
Understanding !PTE, Part2- Flags and Large Pages – Ntdebugging
Memory Management — CRK Unit OS5 (highly recommended reading!)

Test Setup

My laptop hosts a Windows 2008 VM, using Hyper-V. The VM has been configured for kernel debugging using the following BCDEdit command:

bcdedit -debug on

We connect the kernel debugger on my laptop to this VM by using the ‘named pipe’ COM port emulation. The virtual COM port for the VM can be set up easily, and the steps for the same are documented here. Finally, the VM in turn has WinDbg installed. Since the VM is offline (no Internet connection), the matching OS symbols were cached on the VM using this approach for offline debugging.

Initial observations

We initially launch two notepad.exe instances on the VM. In one instance, we will switch the ‘Word wrap’ setting to checked state; the other one we will leave as default. It so happens that this setting is mapped to a global variable notepad!fWrap. How did I figure that out? Using the following command in the VM WinDbg session:

0:001> x notepad!*wrap*
001ca034 notepad!fWrap = <no type information>

So here are the actual values for each of these processes:

notepad.exe #1:

0:001> x notepad!*wrap*
001ca034          notepad!fWrap = <no type information>
0:001> |
. 0    id: 910    create    name: notepad.exe   <-- PID is 0x910
0:001> dd 001ca034 L1
001ca034 00000001        <-- as you can see, the value is set to 1, which indicates word wrap is ON.

notepad.exe #2:

0:001> x notepad!fWrap
001ca034          notepad!fWrap = <no type information>
0:001> |
. 0    id: f6c    create    name: notepad.exe         <-- PID is 0xf6c
0:001> dd 001ca034 L1
001ca034 00000000       <-- In this case word wrap is OFF.

Notice that the virtual address of fWrap is the same in both processes. So the question in the mind of the participant was: while theoretically we know that the same virtual address is translated to different physical pages in each process, can we prove it?

Verification using live Kernel Debugging

To answer the above question, the only way to be absolutely sure was to hook up a live kernel debugger session and check. Firstly, we identify the processes in kernel mode (we need the process objects in order to do this correctly.)

kd> !process 0 0 notepad.exe
PROCESS 83b73d90 SessionId: 1 Cid: 0910    Peb: 7ffde000 ParentCid: 0c24
    DirBase: 3f77b3c0 ObjectTable: 945cd440 HandleCount: 46.
    Image: notepad.exe

PROCESS 8366eb38 SessionId: 1 Cid: 0f6c    Peb: 7ffda000 ParentCid: 0d9c
    DirBase: 3f77b400 ObjectTable: 944e46d0 HandleCount: 46.
    Image: notepad.exe

Next, we use the extension !vtop to map these to physical addresses. In the command below, the first argument to !vtop is the ‘page directory base’, the second is the virtual address. Firstly, for the process with PID 0x910 (which has word wrap ON) and whose Directory Base is 3f77b3c0 (obtained from the above !process output.)

kd> !vtop 3f77b3c0 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 - 000000000e0f5801
X86VtoP: PAE PDE e0f5000 - 000000000e0fa867
X86VtoP: PAE PTE e0fae50 - 800000000c57a867
X86VtoP: PAE Mapped phys c57a034
Virtual address 1ca034 translates to physical address c57a034.

So !vtop says the physical address is c57a034. Notice that the last 3 hex digits (the offset within the 4 KB page) are 034, the same as the offset in the virtual address. This is no coincidence; it is the way we break up the 32-bit virtual address into its constituents (with PAE, as here: a page directory pointer index, a page directory index, a page table index and the page offset.) So let us quickly verify the contents of physical memory location c57a034. To do that we have two options:

kd> !dd c57a034 L1
# c57a034 00000001

kd> dd /p c57a034 L1
0c57a034 00000001

Either way, we can be sure that this is THE content of our fWrap variable. Let us cross-check the other process as well in a similar way.

kd> !vtop 3f77b400 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 - 000000000bd08801
X86VtoP: PAE PDE bd08000 - 000000000beeb867
X86VtoP: PAE PTE beebe50 - 800000000bc2c867
X86VtoP: PAE Mapped phys bc2c034
Virtual address 1ca034 translates to physical address bc2c034.

Next, let us dump the content of physical address bc2c034. Again we can use either !dd or dd /p to do this.

kd> dd /p bc2c034 L1
0bc2c034 00000000

kd> !dd bc2c034 L1
# bc2c034 00000000

So we can confirm that this is accurate as well, as it rightly shows the status value of 0.
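
For readers who want to see the arithmetic behind !vtop, here is a small Python sketch that replays the two walks above. It does not read any memory: it takes the entry values that !vtop printed and recomputes where each entry lives and what the final physical address is, assuming the standard x86 PAE layout (2 + 9 + 9 + 12 bits, 8-byte entries). The function and constant names are my own.

```python
ENTRY_SIZE = 8                    # PAE paging structures use 64-bit entries
FRAME_MASK = 0x000FFFFFFFFFF000   # bits 12..51 hold the physical frame address


def split_pae_address(va):
    """Split a 32-bit virtual address into PAE indexes and the page offset."""
    return {
        "pdpt_index": (va >> 30) & 0x3,   # top 2 bits
        "pd_index": (va >> 21) & 0x1FF,   # next 9 bits
        "pt_index": (va >> 12) & 0x1FF,   # next 9 bits
        "offset": va & 0xFFF,             # low 12 bits
    }


def replay_walk(dir_base, va, pdpe, pde, pte):
    """Recompute the entry addresses and physical address of a !vtop walk.

    pdpe, pde and pte are the 64-bit entry values that !vtop printed.
    """
    parts = split_pae_address(va)
    pdpe_addr = dir_base + parts["pdpt_index"] * ENTRY_SIZE
    pde_addr = (pdpe & FRAME_MASK) + parts["pd_index"] * ENTRY_SIZE
    pte_addr = (pde & FRAME_MASK) + parts["pt_index"] * ENTRY_SIZE
    physical = (pte & FRAME_MASK) | parts["offset"]
    return pdpe_addr, pde_addr, pte_addr, physical


# Entry values copied from the two !vtop outputs for notepad!fWrap.
wrap_on = replay_walk(0x3F77B3C0, 0x001CA034,
                      0x000000000E0F5801, 0x000000000E0FA867, 0x800000000C57A867)
wrap_off = replay_walk(0x3F77B400, 0x001CA034,
                       0x000000000BD08801, 0x000000000BEEB867, 0x800000000BC2C867)

print([hex(x) for x in wrap_on])   # ['0x3f77b3c0', '0xe0f5000', '0xe0fae50', '0xc57a034']
print([hex(x) for x in wrap_off])  # ['0x3f77b400', '0xbd08000', '0xbeebe50', '0xbc2c034']
```

The page table index for 001ca034 is 0x1ca, and 0x1ca * 8 = 0xe50, which is exactly the offset of the PTE within its page table in both outputs.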

Digging deeper

Now, let’s look at some more details. From the user mode WinDbg instance (running inside the VM) let’s check what kind of memory allocation is associated with the address notepad!fWrap. To do this, we will use the !address extension.

0:001> !address 001ca034

Usage:                  Image
Base Address:           001ca000
End Address:            001cb000
Region Size:            00001000
State:                  00001000    MEM_COMMIT
Protect:                00000004    PAGE_READWRITE
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:\Windows\System32\notepad.exe
Mapped Image Name:
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1ca034
More info:              !dh 0x1c0000

Very interesting! So this is actually image memory (as evidenced by the MEM_IMAGE type), and the region was mapped as ‘copy-on-write’ (the Allocation Protect value is PAGE_EXECUTE_WRITECOPY). The current protection of PAGE_READWRITE indicates that this page has since been written to, so the process now has its own private copy. This is actually the standard technique used for global variables within an image, which of course need to be uniquely maintained per-process.

Let’s turn our attention to the executable code within notepad, to see how that is mapped to physical addresses. We will arbitrarily focus our attention on the function notepad!FastReplaceAll, which I picked from the x notepad!* command output. notepad!FastReplaceAll is a function at address 001c743f in my debugging session:

0:001> u notepad!FastReplaceAll
notepad!FastReplaceAll:
001c743f 8bff            mov     edi,edi
001c7441 55              push    ebp
001c7442 8bec            mov     ebp,esp
001c7444 83ec24          sub     esp,24h
001c7447 56              push    esi
001c7448 33f6            xor     esi,esi
001c744a 56              push    esi
001c744b 56              push    esi

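FWIW, the address is the same on the other notepad.exe instance as well. Looking at the !address output, this region also has an Allocation Protect of PAGE_EXECUTE_WRITECOPY, although its current protection is PAGE_EXECUTE_READ (read-only code):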

0:001> !address 001c743f
Usage:                  Image
Base Address:           001c1000
End Address:            001ca000
Region Size:            00009000
State:                  00001000    MEM_COMMIT
Protect:                00000020    PAGE_EXECUTE_READ
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:\Windows\System32\notepad.exe
Mapped Image Name:
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1c743f
More info:              !dh 0x1c0000

However, let us see whether this page maps to different physical addresses or not. Firstly, for the process with PID 0x910 (which has word wrap ON):

kd> !vtop 3f77b3c0 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 - 000000000e0f5801
X86VtoP: PAE PDE e0f5000 - 000000000e0fa867
X86VtoP: PAE PTE e0fae38 - 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

For the other process, it is as follows:

kd> !vtop 3f77b400 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 - 000000000bd08801
X86VtoP: PAE PDE bd08000 - 000000000beeb867
X86VtoP: PAE PTE beebe38 - 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

Voila! They are the same physical page. This is expected (in retrospect) because this code has not been changed and is therefore mapped to the same physical page (frame 28d07) in both processes, giving the same physical address 28d0743f.
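
The PTE values printed by !vtop tell the same story if we decode their bits. The Python sketch below uses the hardware-defined bit positions of an x86 PAE page-table entry (Present, Writable, User, Accessed, Dirty in the low bits, NoExecute in bit 63) and deliberately ignores bits 9 to 11, which are left to the operating system. The names are my own; the four PTE values are copied from the outputs above.

```python
FRAME_MASK = 0x000FFFFFFFFFF000   # bits 12..51 hold the physical frame address

# Hardware-defined flag bits of an x86 PAE page-table entry.
PTE_BITS = [
    (0, "Present"),
    (1, "Writable"),
    (2, "User"),
    (5, "Accessed"),
    (6, "Dirty"),
    (63, "NoExecute"),
]


def decode_pte(pte):
    """Return the physical frame number and the names of the flags that are set."""
    flags = [name for bit, name in PTE_BITS if (pte >> bit) & 1]
    return {"frame": (pte & FRAME_MASK) >> 12, "flags": flags}


# PTE values from the !vtop outputs: fWrap (data) and FastReplaceAll (code).
data_pte_wrap_on = 0x800000000C57A867
data_pte_wrap_off = 0x800000000BC2C867
code_pte_wrap_on = 0x0000000028D07025
code_pte_wrap_off = 0x0000000028D07025

for label, pte in [("data, wrap ON", data_pte_wrap_on),
                   ("data, wrap OFF", data_pte_wrap_off),
                   ("code, wrap ON", code_pte_wrap_on),
                   ("code, wrap OFF", code_pte_wrap_off)]:
    info = decode_pte(pte)
    print(label, hex(info["frame"]), info["flags"])
```

The two data PTEs point to different frames (c57a and bc2c) and are writable, dirty and non-executable, while both code PTEs point to the same frame (28d07) and are neither writable nor dirty.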

Summary

Virtual to physical mapping is viewable in the kernel mode debug session using the !vtop extension
For global variables within the process, even though they share the same virtual address, their physical pages will differ once each process has written to them, due to the copy-on-write behavior
For executable code, the physical pages will be shared as long as the load address of the module was the same.

Well that’s it for now! If you liked this and want to see more ‘Proof of the pudding’, please take a minute to drop a comment, and I will be glad to oblige!

September 7, 2012 by Arvind Shyamsundar

Windows Performance Toolkit: Finding CPU consumer

A colleague of mine recently asked the question ‘I see CPU N is constantly pegged at 100%; is there any good way to determine which specific process is using that specific CPU?’

Well, he already figured that Task Manager or Process Explorer does not provide that information. And by definition purely in user mode one cannot find this information, because the OS dispatcher schedules threads, not processes, and the dispatcher runs at a lower level than user mode threads do. So classically, a live kernel debug would be required to be sure. (The extension command !pcr or !prcb will give you the current thread for that processor, which you can use with !thread to get the process object.)

BUT no customer will let us do a live kernel debug in production unless it is really necessary. There must be an easier way out!

Today, with XPerf (part of Windows Performance Toolkit) you can get this information fairly easily and accurately. Here’s how you can get started:

Download and install the Platform SDK (http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx) and choose to install the Windows Performance Toolkit
Redistributable versions should get installed as well at C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\Redistributables (notice the ARM target is now available as well!)
Use the Windows Performance Recorder utility to record a trace with just CPU in it. Save it to a (.ETL) file.
Collect data for not more than 30-45 seconds. A LOT of information is collected, so limit the duration.
Transfer the .ETL file to your laptop and double-click it; it should open up in the Windows Performance Analyzer (WPA) utility.
Then from the graph explorer on the left side you should be able to select the CPU timeline report
Select the CPU of interest, and ZOOM IN to a particular section of the timeline. The zooming is important as without that you are not told which process is utilizing that CPU.

Here is an example screenshot of the type of information that WPA displays. Click on the thumbnail to get a larger view.

As you can see, it is really detailed and extremely easy to use. If you have Windows 7, Windows Server 2008 R2, Windows 8 or Windows Server 2012, you should be able to leverage the WPT to the full extent.

For more information on WPT you can click the help links I have mentioned above, and for a more interactive discussion I recommend you view the BUILD 2011 videos around WPT / WPA:

There are some additional videos on Channel9 which cover specific scenarios where WPT / WPA can be very useful:

Please leave a comment and rate the post if you found it useful. I appreciate your time very much!

Happy performance investigations!
