WARNING: Stack unwind information not available. Following frames may be wrong.

If you have ever used the WinDbg family of debuggers, you have almost certainly seen the message above. What does it really mean, and how does it affect you?

Quick Background on stack operation

On x86, the code the compiler generates at the entry point of a function (a.k.a. the prolog) contains some key instructions. One of them saves the previous value of the EBP register for that thread on the stack, and the next sets EBP to the current stack pointer (ESP). Each frame therefore holds a link to its caller’s frame, and a debugger can follow that chain of saved EBP values to reconstruct the call stack.

What am I talking about? If you don’t know what EBP and ESP are, I recommend you take a quick look at the links in the reference section at the end.

Frame Pointer Omission

In some cases the compiler may choose to omit setting up the EBP register and instead use ESP directly to reference locals and parameters. The EBP register (a.k.a. the ‘frame pointer’) is then said to be ‘omitted’, and the generated code is called FPO (Frame Pointer Omission) code.

For such code the debugger typically cannot unwind the stack based on EBP, unless it has symbol files which match the module whose function has FPO enabled. When it has no such symbols, it emits the warning which is the title of this post.
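
To make the EBP-based unwind concrete, here is a minimal sketch in Python. It models the stack as a dictionary of addresses and follows the chain of saved EBP values, roughly the way a frame-pointer-based stack walk proceeds. The addresses, the memory layout and the walk_ebp_chain helper are all invented for illustration; a real debugger does considerably more than this.

```python
# Toy model of an x86 stack: address -> 32-bit value.
# After "push ebp; mov ebp, esp", [ebp] holds the caller's saved EBP
# and [ebp+4] holds the return address pushed by the CALL instruction.

def walk_ebp_chain(memory, ebp, max_frames=16):
    """Return the return addresses found by following saved EBP values."""
    return_addresses = []
    while ebp != 0 and len(return_addresses) < max_frames:
        saved_ebp = memory[ebp]                    # caller's frame pointer
        return_addresses.append(memory[ebp + 4])   # where this frame returns to
        ebp = saved_ebp
    return return_addresses

# Hypothetical stack for main -> foo -> bar (all addresses are made up).
stack = {
    0x0012FF00: 0x0012FF20,  # bar's frame: saved EBP of foo
    0x0012FF04: 0x00401050,  # return address into foo
    0x0012FF20: 0x0012FF40,  # foo's frame: saved EBP of main
    0x0012FF24: 0x00401020,  # return address into main
    0x0012FF40: 0x00000000,  # main's frame: end of the chain
    0x0012FF44: 0x00401000,  # return address into startup code
}

frames = walk_ebp_chain(stack, 0x0012FF00)
print([hex(addr) for addr in frames])
```

If one of the functions in the chain were FPO code, EBP would not hold a frame link at that point and a walk like this would read arbitrary values. That is the situation the warning describes.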

Recommendation

  • For builds which have FPO enabled (the /Oy compiler option), it is necessary to have symbol files to successfully reconstruct the stack.
  • In the real world the most common reason to encounter the above message is a faulty symbol path. Check your symbol path.
  • In cases where you can tolerate the overhead of setting up the frame pointer, leave FPO off (/Oy-). See the last reference in my post for something which the Windows team has reportedly done regarding this.

References

YADCU – Yet another dump capture utility

The plethora of dump capture tools is amazing and sometimes confusing. But here is one from Mark Russinovich which looks interesting: ProcDump. Some unique capabilities I can see in this tool are CPU threshold based triggers, the ability to clone a process so that it is suspended for a minimum time while the dump is captured, and the ability to launch another image on the event trigger. Take a look at it, it should add to your debugging toolbox!

The meaning of CID in output of kernel debugger commands

Recently someone asked me about the real meaning of the ‘Cid’ field which appears in the output of commands such as !process and !thread in the kernel debugger (kd). Though from a practical perspective I was aware that it represents the Process ID and Thread ID, I was unsure of what Cid stands for. In the course of a search, I found a public source which answers the question. Cid is short for CLIENT_ID, which in turn is an undocumented structure.

The public source is a free PDF version of the excellent “Undocumented Windows 2000 Secrets: A Programmer’s Cookbook” book, which you can now find at http://undocumented.rawol.com/. I think this resource is a very useful one for all those interested in Windows Internals and debugging as well. Go take a look at it!
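
For reference, CLIENT_ID is commonly declared as a pair of handle-sized fields, UniqueProcess and UniqueThread. Below is a rough Python ctypes sketch of that layout. The format_cid helper is hypothetical, and the four-digit hex ‘pid.tid’ rendering is only my approximation of how the debugger prints the Cid field.

```python
import ctypes

class CLIENT_ID(ctypes.Structure):
    # Two handle-sized fields: the process ID followed by the thread ID.
    _fields_ = [
        ("UniqueProcess", ctypes.c_void_p),
        ("UniqueThread", ctypes.c_void_p),
    ]

def format_cid(cid):
    """Hypothetical helper: render a CLIENT_ID as hex 'pid.tid'."""
    pid = cid.UniqueProcess or 0   # c_void_p reads back as None when zero
    tid = cid.UniqueThread or 0
    return f"{pid:04x}.{tid:04x}"

cid = CLIENT_ID(0x0A4C, 0x0B10)    # made-up process and thread IDs
print(format_cid(cid))
```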

Quick Tip: vfbasics!_AVRF_EXCEPTION_LOG_ENTRY symbol not resolved?

I was debugging some issues with the help of Application Verifier and WinDbg. Since I was onsite I did not have any access to Microsoft’s private symbol servers, so I was using the public symbol server (http://msdl.microsoft.com/download/symbols). On executing the !avrf extension command, WinDbg presented the following error message:

***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: vfbasics!_AVRF_EXCEPTION_LOG_ENTRY                ***

It turns out that my WinDbg symbol path was as follows, and due to it pointing just to the public symbol server it was loading public symbols for vfbasics.dll:

0:001> .sympath
Symbol search path is: SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: srv*c:\localsymbols*http://msdl.microsoft.com/download/symbols
0:001> lml
start             end                 module name
00000000`76dc0000 00000000`76f68000   ntdll      (pdb symbols)          c:\localsymbols\ntdll.pdb\FDAD9EE7D6E44F4F9672ECB401A802192\ntdll.pdb
000007fe`f0e50000 000007fe`f0ebe000   verifier   (pdb symbols)          c:\localsymbols\verifier.pdb\43FCE2D63C4544F9B1C67110EB3406951\verifier.pdb
000007fe`f1660000 000007fe`f1693000   vrfcore    (pdb symbols)          c:\localsymbols\vrfcore.pdb\751D23CCD6504794AF2F18C1E547FE371\vrfcore.pdb
000007fe`f28e0000 000007fe`f292a000   vfbasics   (pdb symbols)          c:\localsymbols\vfbasics.pdb\1ABCDFEFF9F4602A7F055801457A7D61\vfbasics.pdb

To resolve the issue, I explicitly prepended the folder containing the private symbols for vfbasics (c:\windows\system32, or in general %WINDIR%\System32) to the symbol path:

.sympath c:\windows\system32;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols

.reload

ld vfbasics

0:001> lml
start             end                 module name
00000000`76dc0000 00000000`76f68000   ntdll      (export symbols)       C:\Windows\SYSTEM32\ntdll.dll
000007fe`f28e0000 000007fe`f292a000   vfbasics   (private pdb symbols)  C:\Windows\SYSTEM32\vfbasics.pdb

Then !avrf works just fine!

If you liked this post, please do rate it and try to leave some comments if you can!

T-SQL Anti-Pattern: Index Key Order and Query Expression Order

This is really not a T-SQL anti-pattern as much as it is a database design issue, but we see it so often that it’s worthwhile bringing it up and clarifying things.

To illustrate the scenario, let’s examine the table Person.Contact in the AdventureWorks database. Among its columns are FirstName and LastName. Let’s say an application frequently queries this table with these columns in the WHERE clause. The query looks like this:

SELECT ContactID FROM Person.Contact
WHERE FirstName = 'Carla' AND LastName = 'Adams'

 

In order to support the query for seeking to the data, we create this index:

create nonclustered index idx_contact_names on Person.Contact(FirstName, LastName)

 

Now, let’s say there’s another application which fires another query on this table, and that query looks like this:

SELECT ContactID FROM Person.Contact
WHERE LastName = 'Alberts' AND FirstName = 'Amy'

 

Notice the difference between the 2 queries: the order of the predicates in the expression is different. Now, for the problem: some developers will create another index, with the column order as (LastName, FirstName). That is not required. If you view the execution plans for both queries, you will notice that the same index, idx_contact_names, is used for each!

 

image

If you end up creating 2 indexes for the above scenario, SQL Server will effectively use only one of them for queries such as the above. The other index will only add to the overhead of index maintenance required during DML operations. So it is a redundant index and as such should be dropped.

 

Conclusion

The order of the predicates in the WHERE clause has no bearing on which index should be created or on which index gets used. SQL Server will use the index which yields a good plan.
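
If you do not have AdventureWorks handy, the same idea can be tried out with a small Python sketch. Note the loud assumption: it uses the sqlite3 module from the standard library, so it shows SQLite’s optimizer and not SQL Server’s, and the cut-down Contact table and its three rows are made up. It only illustrates that a query optimizer matches predicates to index columns regardless of the order in which they are written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for Person.Contact (hypothetical schema and rows).
conn.execute(
    "CREATE TABLE Contact ("
    "ContactID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO Contact (FirstName, LastName) VALUES (?, ?)",
    [("Carla", "Adams"), ("Amy", "Alberts"), ("Gustavo", "Achong")],
)
conn.execute("CREATE INDEX idx_contact_names ON Contact (FirstName, LastName)")

def plan_for(where_clause):
    """Return the plan text SQLite reports for the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT ContactID FROM Contact WHERE " + where_clause
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)

plan_a = plan_for("FirstName = 'Carla' AND LastName = 'Adams'")
plan_b = plan_for("LastName = 'Alberts' AND FirstName = 'Amy'")
print(plan_a)
print(plan_b)
```

Both plans should mention idx_contact_names, even though the second query lists LastName first.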

Tip of the day: “An attempt was made to load a program with an incorrect format” .NET P/INVOKE issue

The other day I was using a 3rd party utility which was built on the .NET platform. My primary work computer happens to be an x64 installation. When I fired the utility up on this computer and tried to perform some tasks, it would fail with a .NET exception which had the following characteristics:

– Message: “An attempt was made to load a program with an incorrect format”

– Exception: System.BadImageFormatException

After some troubleshooting it turned out that this utility was trying to load a plain-old DLL (which exported some functions), presumably using P/Invoke. The DLL was built for 32-bit platforms, whereas the utility ran as a 64-bit process owing to the 64-bit .NET runtime. By design, a 64-bit process is prevented from loading a non-COM 32-bit DLL, and the attempt fails with the above exception. (32-bit COM DLLs are loaded in a DLLHOST.EXE surrogate when invoked from a 64-bit client process, BTW.)

To configure the utility to run as a 32-bit .NET process, you can use the CORFLAGS utility. Run it as follows to switch the 32-bit execution mode ON:

corflags utility.exe /32Bit+

To turn it off, just use /32Bit- in the above command line.
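
If you want to confirm that a native DLL really is a 32-bit image before reaching for CORFLAGS, the Machine field in its PE header tells you. Here is a minimal Python sketch of that check. The pe_machine and describe_dll helpers are hypothetical names of my own, and the sketch assumes the PE header sits within the first 4 KB of the file, which is typical but not guaranteed.

```python
import struct

# Machine values from the PE/COFF header (only the two relevant here).
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def pe_machine(data):
    """Return the Machine field from the COFF header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The 4-byte value at offset 0x3C (e_lfanew) locates the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine

def describe_dll(path):
    """Hypothetical helper: describe the target platform of a DLL on disk."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read(4096))  # assumes the header is in the first 4 KB
    return MACHINE_NAMES.get(machine, hex(machine))
```

Calling describe_dll on the DLL in question would be expected to report the 32-bit value for a DLL like the one described above.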

Visual Studio Remote Debugger History

Let’s say you use the Visual Studio Remote Debugger extensively, and with a wide variety of remote targets. The list of qualifiers (see the image below to understand what I refer to) can quickly fill up with ‘noise’ items.

image

If you were curious about where these are stored, they happen to be under one of the sub-keys of the following registry key:

HKCU\Software\Microsoft\VisualStudio\9.0\Debugger\Port MRU

Before you get any further ideas, I must include the statutory warning:

WARNING: Using Registry Editor incorrectly can cause serious problems that may require you to reinstall Windows. Microsoft cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk. If you edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:

Description of the Microsoft Windows Registry

Rolling upgrade of 1 instance from 2-node, 2-instance SQL failover cluster to slipstreamed SQL 2008 SP1

Now that SQL Server 2008 SP1 is released, it is time to test a very interesting capability: Service Pack slipstreaming. That means that you can save some time by doing an ‘integrated’ installation of SQL 2008 which includes the SP1 binaries, and avoid having to apply SP1 later on. Windows has had this capability for some time, but only now is it officially supported for SQL Server 2008.

Scenario

This walk-through was conducted on a 2-node, 2-SQL instance Windows 2003 x86 cluster. The 2 instances were:

  • VIRSQL2K – default instance of SQL Server 2000 SP4 (2039 build)
  • VIRSQL2K5\SQL2K5 – named instance of SQL Server 2005 SP2 (3042 build)

Objective and Tools

We want to upgrade the SQL 2000 instance to SQL 2008, in-place. Currently we will not be touching the SQL 2005 instance (maybe I will do another blog post for that if there are any specific observations.) The objective was to minimize downtime of either instance while performing an in-place upgrade to SQL Server 2008 SP1. To achieve the objective, we use 2 new features which are available in SQL Server 2008:

Step-by-step

1. Pre-requisites

  • Install the Windows Installer 4.5 (found on SQL 2008 DVD) prior to your actual downtime. Do not reboot at this stage.
  • Install the hotfix for KB article 937444 (from http://support.microsoft.com/kb/937444). The SQL issue pertaining to this OS hotfix is documented in http://support.microsoft.com/kb/955828
  • Review all the cluster specific best practices documented in my previous post.
  • Install .NET Framework 3.5 SP1 (found on SQL 2008 DVD) prior to your actual downtime.
  • At a suitable window (you may couple this with your actual upgrade downtime) you need to reboot to complete the process.
  • Do these steps on both nodes. Take care to stagger the reboots mentioned above to avoid total non-availability of your SQL instances.

1. Prepare the merged (slipstreamed) media

In my case I was just dealing with x86 instances, so I optimized some steps from the blogs above.

  • From the SQL 2008 DVD, copy the x86 subfolder and the files from the root (setup.exe etc.). I did not copy the IA64 and x64 folders. I copied these to c:\sql2008 on each of my nodes.
  • Extract the SQL 2008 SP1 contents to c:\sql2008\pcu on each of the nodes. To do this I used
    • SQLServer2008SP1-KB968369-x86-ENU.exe /x:c:\sql2008\pcu /q
  • I then copied setup.exe and setup.rll from the c:\sql2008\pcu folder to c:\sql2008, overwriting the older ones.
  • I then copied the files (NOT the sub-folders, and excluding Microsoft.SQL.Chainer.PackageData.dll) from the c:\sql2008\pcu\x86 folder into the c:\sql2008\x86 folder.
    • If you accidentally copy Microsoft.SQL.Chainer.PackageData.dll (like I did once) you will receive an error message when you launch setup: ‘specified action LandingPage is not supported for the sql server patching operation’. (In fact the blog post from Peter also talks about this error.)
  • Peter’s original blog post and also the current one from Bob Ward mention copying the sqlsupport.msi files over. I did do that, though Peter mentioned it is no longer required. Copy sqlsupport.msi from c:\sql2008\pcu\x86\setup\1033 to c:\sql2008\x86\setup (note that there is no 1033 in the destination, that is not a typo). You can overwrite the older version, and you may note that the older version was actually larger than the new one!
  • At this time you are ready to roll.
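
The fiddly part of the list above is the ‘files only, minus one DLL’ copy from the PCU x86 folder into the merged x86 folder. I did it by hand, but if you prefer to script it, a minimal Python sketch could look like the following. The copy_top_level_files helper is a hypothetical name and is not part of any SQL Server tooling; you pass in the source and destination folders yourself.

```python
import shutil
from pathlib import Path

# The one file that must NOT be copied into the merged x86 folder.
EXCLUDED = {"microsoft.sql.chainer.packagedata.dll"}

def copy_top_level_files(src_dir, dst_dir, excluded=EXCLUDED):
    """Copy only the files (not sub-folders) from src_dir into dst_dir.

    Returns the names of the files that were copied.
    """
    copied = []
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for entry in sorted(Path(src_dir).iterdir()):
        if not entry.is_file():
            continue                           # skip sub-folders
        if entry.name.lower() in excluded:
            continue                           # skip the excluded DLL
        shutil.copy2(entry, dst / entry.name)  # overwrites an older copy
        copied.append(entry.name)
    return copied
```

The returned list makes it easy to eyeball what was copied and to confirm that the excluded DLL is not in it.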

2. Run setup on passive node

I ran setup from the command prompt, specifying the PCUSource by hand:

  • cd c:\sql2008
  • setup.exe /Action=UPGRADE /PCUSource=c:\sql2008\pcu

Initially I select the SQL 2000 instance, which is active on the other node:

image

In the cluster security policy screen, you need to enter the service account domain group names. Something related to this step caused a problem later on, which I explain in step 4:

image

At the Upgrade Rules screen in setup, you can verify that we are slipstreaming:

image

Also in Upgrade Rules you might be warned that any other SQL instances active on this node will be restarted due to cluster resource DLL update. This is very important if you have not planned on those other instances being restarted. So you should note it and factor it into your upgrade plans:

image

Later the cluster upgrade report clearly tells us it is going to upgrade just this (passive) node:

image

Once again we check we are slipstreaming:

image

The rest of setup was fairly uneventful and at the end I checked the sqlservr.exe version on this (passive) node:

image

If you check the old SQL 2000 installation folder, you will note that the binaries and other folders have been cleaned up.

3. Change group for failed Full-Text Search resource

At this stage you may notice that the full-text search service cluster resource is either stopped or plain does not even show up. In my case I received the following message in cluster admin. I just moved out the fulltext resource into another cluster group for the moment.

image

4. Failover and watch the upgrade of system databases happen

At this stage, you are ready to fail over the SQL 2000 instance from (let’s say) NodeA (which is still running SQL 2000 and has not been touched by the upgrade process) to NodeB (on which the SQL 2000 service has been upgraded, binaries-only, to SQL 2008). What is supposed to happen after the move group is that the SQL 2000 instance becomes temporarily unavailable, the SQL 2008 service starts up on NodeB, and it then upgrades the system databases and the user databases to the SQL 2008 ‘format’. This is technically a point of ‘no return’: rolling back from here means reinstalling SQL 2000 and restoring from backups.

In my case, things did not go well initially. On failover from NodeA to NodeB, the SQL 2008 instance failed to come online on NodeB, restarted a couple of times and then flipped back to NodeA. Checking the event log showed this message:

initerrlog: Could not open error log file ''. Operating system error = 3(The system cannot find the path specified.).

I verified the startup parameters using Configuration Manager on NodeB and also double-checked using Enterprise Manager on NodeA, and they were correct. The only other possibility was a permissions issue, so I used Process Monitor from SysInternals to track it down. It turned out that my SQL 2008 service account (which was the same as what the 2000 version used) did not have access to a registry key. Then it dawned on me that the service account had never been added to the domain group I referred to previously. The fix was easy: I used AD Users and Computers to add my service account to this group:

image

After this was fixed, I was able to move group again to NodeB at which stage the database upgrades happened and the instance was online. Here are some random snippets from the SQL errorlog at the time of upgrade, just to show you what happened under the hood:

  • 2009-04-11 04:08:06.43 Server Microsoft SQL Server 2008 (SP1) – 10.0.2531.0 (Intel X86)
  • 2009-04-11 04:08:17.14 spid7s Database ‘master’ running the upgrade step from version 654 to version 655.
  • 2009-04-11 04:08:18.84 spid8s Converting database ‘model’ from version 539 to the current version 655.
  • 2009-04-11 04:08:33.61 spid7s Database ‘master’ is upgrading script ‘sqlagent90_msdb_upgrade.sql’ from level 0 to level 2.
  • 2009-04-11 04:09:26.75 spid7s Recovery is complete.  This is an informational message only.  No user action is required.

Total time from start to finish: 1:01 minutes. That means that my instance was potentially unavailable just for this long. That is pretty impressive!

5. Run setup on previously active node

The process on NodeA was pretty much the same as described above. All of the steps described above are relevant, except for #4 above, because the databases are already upgraded once. The only thing you may come across if you have not handled the full-text search resource issue would be an error in setup: ‘The device is not ready’:

image

I am not 100% sure, but it seems related to the fact that full text was not moved out to another group. Moving it out fixed the above setup error on NodeA.

6. Verify setup

A casual check using Management Studio (yes, the management tools get upgraded from Query Analyzer / Enterprise Manager to SSMS) verifies that we are indeed on SQL 2008 SP1:

image

Key Learnings

  • Read the blog posts mentioned above very carefully while preparing the merged drop
  • Watch out for the exact DLL to be excluded (Microsoft.SQL.Chainer.PackageData.dll).
  • Pre-create the domain group for SQL Server 2008 service account in the AD (only required if you are on Windows 2003)
  • Add the SQL 2000 service account to the SQL 2008 domain group before you run the setup program
  • Change group for the failed Full-Text Search resource from the SQL group to any other group
  • Due to the resource DLL update, be prepared for a restart of any other SQL instance on the node you are upgrading (such as in an N-instance cluster)

I hope you enjoyed this blog post and if you do like it, please do take a second to rate this post and also leave a few comments if you can. See you later!

Debugging Toolbox

This one is a quickie: an easy reference to the most commonly used debugging tools and links. I hope you find it useful, and kindly indicate your feedback on this page by using the comments section or by rating the post!

Each tool is listed with its key usage scenarios and, where one was given, its download location.

WinDbg
  • Key usage scenarios: interactive production debugging; dump analysis; local kernel debugging
  • Download location: http://www.microsoft.com/whdc/devtools/debugging/default.mspx

CDB
  • Key usage scenarios: scriptable, low overhead capture of dumps; low overhead interactive debugging

ADPlus
  • Key usage scenarios: scripted automation for capturing dumps

DebugDiag
  • Key usage scenarios: can set up rules for capturing dumps in an unattended, logged-off scenario; automated analysis of basic crash, hang and leak scenarios
  • Download location: http://www.iis.net

SysInternals Process Monitor
  • Key usage scenarios: interactive process and thread level details; handle usage
  • Download location: http://www.sysinternals.net

TLIST
  • Key usage scenarios: listing services; listing process tree; listing which processes have loaded a module
  • Download location: http://www.microsoft.com/whdc/devtools/debugging/default.mspx

Performance Monitor
  • Key usage scenarios: interactive display of performance data on the system (System Monitor); logged-off unattended capturing of performance logs
  • Download location: shipped with the OS

Application Verifier
  • Key usage scenarios: enable various checks to trap deep rooted issues earlier, such as orphaned critical sections, heap corruption and unsafe API usage; simulate low memory conditions
  • Download location:
    • http://msdn.microsoft.com/en-us/library/aa480483.aspx
    • http://www.microsoft.com/downloads/details.aspx?FamilyID=C4A25AB9-649D-4A1B-B4A7-C9D8B095DF18&displaylang=en

PREFast
  • Key usage scenarios: static source code analysis to detect potential buffer / stack overruns etc.
  • Download location: http://www.microsoft.com/whdc/devtools/tools/prefast.mspx

XPerf / XPerfInfo
  • Key usage scenarios: profiler / tracing of user mode applications
  • Download location:
    • http://msdn.microsoft.com/en-us/library/cc305187.aspx
    • http://msdn.microsoft.com/en-us/performance/cc825801.aspx
    • http://blogs.msdn.com/seema/archive/2008/10/08/xperf-a-cpu-sampler-for-silverlight.aspx

UserDump
  • Key usage scenarios: automated capture of dumps
  • Download location: http://www.microsoft.com/downloads/details.aspx?FamilyID=e089ca41-6a87-40c8-bf69-28ac08570b7e&DisplayLang=en

LeakDiag
  • Key usage scenarios: leak diagnostics
  • Download location: ftp://ftp.microsoft.com/PSS/Tools/Developer%20Support%20Tools/LeakDiag

Native code debugging sites and blogs

http://www.advancedwindowsdebugging.com

http://blogs.msdn.com/ntdebugging

http://www.osronline.com

http://www.alex-ionscu.com

http://www.dumpanalysis.org

http://www.nynaeve.net/

http://blogs.thinktecture.com/ingo/archive/2006/08/05/414674.aspx

 

.NET articles:

http://msdn2.microsoft.com/en-us/library/ms954594.aspx

http://msdn.microsoft.com/msdnmag/issues/03/06/Bugslayer

http://msdn.microsoft.com/msdnmag/issues/05/03/Bugslayer

http://support.microsoft.com/kb/892277

http://msdn.microsoft.com/msdnmag/issues/05/07/Debugging/

http://msdn.microsoft.com/msdnmag/issues/06/11/CLRInsideOut/default.aspx

 

.NET debugging blogs:

http://blogs.msdn.com/yunjin/

http://blogs.msdn.com/tess/

http://blogs.msdn.com/jmstall/

http://blogs.msdn.com/mvstanton/ 

http://blogs.msdn.com/cbrumme/

http://blogs.msdn.com/maoni/

http://blogs.msdn.com/toddca/

http://blogs.msdn.com/suzcook/

http://www.debugtricks.com