Rolling upgrade of 1 instance on a 2-node, 2-instance SQL failover cluster to slipstreamed SQL 2008 SP1

Now that SQL Server 2008 SP1 is released (download link here), it is time to test a very interesting capability: Service Pack slipstreaming. That means you can save some time by doing an ‘integrated’ installation of SQL 2008 which includes the SP1 binaries, and avoid having to apply SP1 later on. Windows has had this capability for some time, but only now is it officially supported for SQL Server 2008.

Scenario

This walk-through was conducted on a 2-node, 2-SQL instance Windows 2003 x86 cluster. The 2 instances were:

  • VIRSQL2K – default instance of SQL Server 2000 SP4 (2039 build)
  • VIRSQL2K5\SQL2K5 – named instance of SQL Server 2005 SP2 (3042 build)

Objective and Tools

We want to upgrade the SQL 2000 instance to SQL 2008, in-place. For now we will not touch the SQL 2005 instance (maybe I will do another blog post for that if there are any specific observations). The objective is to minimize downtime of either instance while performing an in-place upgrade to SQL Server 2008 SP1. To achieve this, we use 2 features which are new in SQL Server 2008: rolling upgrade of a failover cluster (one node at a time) and service pack slipstreaming.

Step-by-step

1. Pre-requisites

  • Install the Windows Installer 4.5 (found on SQL 2008 DVD) prior to your actual downtime. Do not reboot at this stage.
  • Install the hotfix for KB article 937444 (from http://support.microsoft.com/kb/937444). The SQL issue pertaining to this OS hotfix is documented in http://support.microsoft.com/kb/955828
  • Review all the cluster specific best practices documented in my previous post.
  • Install .NET Framework 3.5 SP1 (found on SQL 2008 DVD) prior to your actual downtime.
  • At a suitable window (you may couple this with your actual upgrade downtime) you need to reboot to complete the process.
  • Do these steps on both nodes. Take care to stagger the reboots mentioned above to avoid total non-availability of your SQL instances.

1. Prepare the merged (slipstreamed) media

In my case I was just dealing with x86 instances, so I optimized some steps from the blogs above

  • From the SQL 2008 DVD, copy the x86 subfolder and the files from the root (setup.exe etc.). I did not copy the IA64 and x64 folders. I copied these to c:\sql2008 locally on each of the nodes.
  • Extract the SQL 2008 SP1 contents to c:\sql2008\pcu on each of the nodes. To do this I used
    • SQLServer2008SP1-KB968369-x86-ENU.exe /x:c:\sql2008\pcu /q
  • I then copied setup.exe and setup.rll from the c:\sql2008\pcu folder to c:\sql2008, overwriting the older ones.
  • I then copied the FILEs of the c:\sql2008\pcu\x86 folder (NOT the sub-folders, and also NOT Microsoft.SQL.Chainer.PackageData.dll) into the c:\sql2008\x86 folder.
    • If you accidentally copy Microsoft.SQL.Chainer.PackageData.dll (like I did once) you will receive an error message when you launch setup: ‘specified action LandingPage is not supported for the sql server patching operation’. (In fact the blog post from Peter also talks about this error.)
  • Peter’s original blog post and also the current one from Bob Ward mention copying the sqlsupport.msi files over. I did do that, though Peter mentioned it is no longer required. Copy sqlsupport.msi from c:\sql2008\pcu\x86\setup\1033 to c:\sql2008\x86\setup (note that there is no 1033 in the destination; that is not a typo). You can overwrite the older version, and you may note that the older version was actually larger than the new one!
  • At this time you are ready to roll.
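If you have to repeat the merge on more than one node, the copy steps above can be scripted. Here is a minimal Python sketch, not something I used during the walk-through. The function name `merge_sp1_into_media` and its parameters are my own, and it assumes the RTM media has already been copied to one folder and SP1 has already been extracted to another:

```python
import shutil
from pathlib import Path

# The one DLL that must NOT be copied from the SP1 folder (see the error above).
EXCLUDED = {"microsoft.sql.chainer.packagedata.dll"}


def merge_sp1_into_media(media_root: Path, pcu_root: Path, arch: str = "x86") -> list:
    """Overlay extracted SP1 files onto a copy of the RTM media.

    Returns the relative names of the files that were copied.
    """
    copied = []

    # 1. setup.exe and setup.rll from the SP1 root replace the RTM copies.
    for name in ("setup.exe", "setup.rll"):
        shutil.copy2(pcu_root / name, media_root / name)
        copied.append(name)

    # 2. Files only (no sub-folders) from <pcu>/<arch>, skipping the excluded DLL.
    for item in sorted((pcu_root / arch).iterdir()):
        if item.is_file() and item.name.lower() not in EXCLUDED:
            shutil.copy2(item, media_root / arch / item.name)
            copied.append(f"{arch}/{item.name}")

    # 3. sqlsupport.msi: the source sits under setup/1033, the destination is plain setup.
    shutil.copy2(
        pcu_root / arch / "setup" / "1033" / "sqlsupport.msi",
        media_root / arch / "setup" / "sqlsupport.msi",
    )
    copied.append(f"{arch}/setup/sqlsupport.msi")
    return copied
```

Keeping the excluded DLL in a named set makes the one easy-to-miss rule of the merge explicit, which is the mistake this step is most prone to.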

2. Run setup on passive node

I ran setup from the command prompt, specifying the PCUSource by hand:

  • cd c:\sql2008
  • setup.exe /Action=UPGRADE /PCUSource=c:\sql2008\pcu
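If you drive the upgrade from a script rather than typing the command on each node, the same command line can be assembled programmatically. This is only a sketch: `build_upgrade_command` is a hypothetical helper of mine, the two paths are assumptions about where the merged media and the extracted SP1 files live, and the only setup switches used are the two shown above.

```python
from pathlib import PureWindowsPath


def build_upgrade_command(media_root: str, pcu_source: str) -> list:
    """Return the setup.exe argument list for a slipstreamed in-place upgrade."""
    setup = PureWindowsPath(media_root) / "setup.exe"
    return [
        str(setup),
        "/Action=UPGRADE",
        f"/PCUSource={PureWindowsPath(pcu_source)}",
    ]


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the merged media and SP1 files live.
    cmd = build_upgrade_command(r"c:\sql2008", r"c:\sql2008\pcu")
    print(" ".join(cmd))
    # On the passive node this list could then be handed to
    # subprocess.run(cmd, check=True) to launch setup.
```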

First, I selected the SQL 2000 instance, which was active on the other node:

image

In the cluster security policy screen, you need to enter the service account domain group names. Something related to this step caused a problem later on, which I explain in step 4:

image

At the Upgrade Rules screen in setup, you can verify that we are slipstreaming:

image

Also in Upgrade Rules you might be warned that any other SQL instances active on this node will be restarted due to cluster resource DLL update. This is very important if you have not planned on those other instances being restarted. So you should note it and factor it into your upgrade plans:

image

Later the cluster upgrade report clearly tells us it is going to upgrade just this (passive) node:

image

Once again we check we are slipstreaming:

image

The rest of setup was fairly uneventful and at the end I checked the sqlservr.exe version on this (passive) node:

image

If you check the old SQL 2000 installation folder, you will note that the binaries and other folders have been cleaned up.

3. Change group for failed Full-Text Search resource

At this stage you may notice that the full-text search service cluster resource is either stopped or plain does not even show up. In my case I received the following message in cluster admin. I just moved out the fulltext resource into another cluster group for the moment.

image

4. Failover and watch the upgrade of system databases happen

At this stage, you are ready to fail over the SQL 2000 instance from (let’s say) NodeA (which is still running SQL 2000 and has not been touched by the upgrade process) to NodeB (on which the SQL 2000 service has been upgraded, binaries-only, to SQL 2008). What is supposed to happen after the move group is that the SQL 2000 instance becomes temporarily unavailable, the SQL 2008 service starts up on NodeB, and it then upgrades the system databases and the user databases to the SQL 2008 ‘format’. This is technically a point of ‘no-return’: rolling back from here means reinstalling SQL 2000 and restoring from backups.

In my case, things did not go well initially. On failover from NodeA to NodeB, the SQL 2008 instance failed to come online on NodeB, restarted a couple of times and then flipped back to NodeA. Checking the event log showed this message:

initerrlog: Could not open error log file ''. Operating system error = 3(The system cannot find the path specified.).

I verified the startup parameters using Configuration Manager on NodeB and also double-checked using Enterprise Manager on NodeA, and they were correct. The only other possibility was a permissions issue, so I used Process Monitor from SysInternals to track it down. It turned out that my SQL 2008 service account (the same account the 2000 version used) did not have access to a registry key. Then it dawned on me that the service account had never been added to the domain group I entered earlier in the cluster security policy screen. The fix was easy: I used AD Users and Computers to add my service account to this group:

image

After this was fixed, I was able to move group again to NodeB at which stage the database upgrades happened and the instance was online. Here are some random snippets from the SQL errorlog at the time of upgrade, just to show you what happened under the hood:

  • 2009-04-11 04:08:06.43 Server Microsoft SQL Server 2008 (SP1) – 10.0.2531.0 (Intel X86)
  • 2009-04-11 04:08:17.14 spid7s Database ‘master’ running the upgrade step from version 654 to version 655.
  • 2009-04-11 04:08:18.84 spid8s Converting database ‘model’ from version 539 to the current version 655.
  • 2009-04-11 04:08:33.61 spid7s Database ‘master’ is upgrading script ‘sqlagent90_msdb_upgrade.sql’ from level 0 to level 2.
  • 2009-04-11 04:09:26.75 spid7s Recovery is complete.  This is an informational message only.  No user action is required.
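If you want to pull just the database version changes out of a long errorlog, a small parser is enough. The sketch below is based only on the two message shapes quoted above (‘running the upgrade step from version X to version Y’ and ‘Converting database ... from version X to the current version Y’); `upgrade_steps` is a name I made up, and other message variants may well exist that it does not catch.

```python
import re

# Accepts straight or curly quotes around the database name.
UPGRADE_RE = re.compile(
    r"database ['‘](?P<db>[^'’]+)['’].*?"
    r"from version (?P<old>\d+) to (?:the current )?version (?P<new>\d+)",
    re.IGNORECASE,
)


def upgrade_steps(lines):
    """Return (database, from_version, to_version) for each version-upgrade line."""
    steps = []
    for line in lines:
        match = UPGRADE_RE.search(line)
        if match:
            steps.append((match["db"], int(match["old"]), int(match["new"])))
    return steps


if __name__ == "__main__":
    sample = [
        "2009-04-11 04:08:17.14 spid7s Database 'master' running the upgrade step from version 654 to version 655.",
        "2009-04-11 04:08:18.84 spid8s Converting database 'model' from version 539 to the current version 655.",
        "2009-04-11 04:09:26.75 spid7s Recovery is complete.",
    ]
    for db, old, new in upgrade_steps(sample):
        print(f"{db}: {old} -> {new}")
```

Lines that mention script levels rather than database versions (such as the sqlagent90_msdb_upgrade.sql line) are deliberately ignored.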

Total time from start to finish: 1:01 minutes. That means that my instance was potentially unavailable just for this long. That is pretty impressive!

5. Run setup on previously active node

The process on NodeA was pretty much the same as described above. All of the steps described above are relevant, except for #4, because the databases have already been upgraded. The only thing you may come across, if you have not handled the full-text search resource issue, is an error in setup: ‘The device is not ready’:

image

I am not 100% sure, but it seems related to the fact that full text was not moved out to another group. Moving it out fixed the above setup error on NodeA.

6. Verify setup

Just a casual check here: use Management Studio (yes, the management tools get upgraded from Query Analyzer / Enterprise Manager to SSMS) to verify that we are indeed on SQL 2008 SP1:

image
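If you would rather check the version in a script than by eye, the version string (for example the value returned by SELECT SERVERPROPERTY('ProductVersion')) can be compared against the SP1 build seen in the errorlog, 10.0.2531. A minimal sketch; the helper names are my own, and it only looks at the 10.0.x (SQL 2008) version line:

```python
# SP1 build as reported in the errorlog above: 10.0.2531.0
SQL2008_SP1 = (10, 0, 2531)


def parse_version(text):
    """Turn '10.0.2531.0' into (10, 0, 2531, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_sql2008_at_least_sp1(text):
    """True if the version is on the 10.0.x line and at or above the SP1 build."""
    version = parse_version(text)
    return version[:2] == (10, 0) and version[:3] >= SQL2008_SP1


if __name__ == "__main__":
    for candidate in ("10.0.2531.0", "10.0.1600.22", "8.00.2039"):
        print(candidate, is_sql2008_at_least_sp1(candidate))
```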

Key Learnings

  • Read the blog posts mentioned above very carefully while preparing the merged drop
  • Watch out for the exact DLL to be excluded (Microsoft.SQL.Chainer.PackageData.dll).
  • Pre-create the domain group for SQL Server 2008 service account in the AD (only required if you are on Windows 2003)
  • Add the SQL 2000 service account to the SQL 2008 domain group before you run the setup program
  • Change group for the failed Full-Text Search resource from the SQL group to any other group
  • Due to the resource DLL update, be prepared for a restart of any other SQL instance on the node you are upgrading (such as in an N-instance cluster)

I hope you enjoyed this blog post and if you do like it, please do take a second to rate this post and also leave a few comments if you can. See you later!