Debugging managed code using VS.NET remote debugger

Visual Studio can be used to debug processes remotely, using the MSVSMON agent. This can be used for both native code and managed code. However, for successfully debugging managed code applications, symbol files have to be correctly loaded – and for that to happen, they must be in the correct place.

Prerequisites

For remote debugging of managed code to work correctly, the pre-requisites are:

  1. The Default transport needs to be used in Debug-Attach to process
  2. Firewall exceptions both ways
  3. Symbols need to be accessible from MSVSMON (target) – see note below
  4. Source will still be loaded on the client machine

 

Symbol locations

It is very important to keep in mind that VS.NET never loads mismatched symbols. So it is our responsibility to ensure matched PDBs are made available at one of the following locations on the remote machine:

  • At the location where the EXE is
  • On the path we specify in the VS.NET symbols dialog
  • _NT_SYMBOL_PATH
  • %SystemRoot%

Setup

Here is a schematic of the typical setup. Here, VS.NET acts as a controller, talking to MSVSMON (the remote debugger agent) which actually attaches to the process being debugged.

 

image

An additional challenge in our case is that the client and server are actually in different domains. To work around this issue, we create identically named local users (call it VSUSER) in the client and on the remote server. The password for this user is also kept the same on both machines. This technique is sometimes referred to as passthrough authentication or mirrored accounts.

Ready to roll

On the server and on the client, we now launch sessions logged in as VSUSER. You could also use the RUNAS command to obtain CMD prompts which are running as VSUSER. That will be used to launch the following:

  1. On the client side, we will launch DEVENV (VS.NET IDE) running as VSUSER
  2. On the server side, we will launch MSVSMON running as VSUSER
  3. On the server side we will launch the Windows application (running as any user.)

Next, we deploy the remote debugger, MSVSMON, which is found at the location C:Program Files (x86)Microsoft Visual Studio 10.0Common7IDERemote Debuggerx64. When you install and run this on the server side, you will most probably be prompted to unblock firewall ports:

image

The next step is to use the Debug-Attach to process menu on the client to discover the processes on the remote machine.

image

Permissions

If you try to remote debug a process which is not launched under the VSUSER identity, you may get this error message:

image 

To fix this issue, add VSUSER to the local administrators group on the server, and run MSVSMON as administrator (due to UAC). Once this is done, reattach to the process, you should get the following warning, and then you should be able to continue.

image

 

Summary

Remote debugging of managed processes (even those which are launched under another identity) is very much possible, as long as you use the Default transport, ensure authentication, keep communications open and most importantly – ensure that symbols are on the target at one of the locations specified in my above post. You can then resolve the source line numbers, and debug interactively!

 

Further Reading

How to- Run the Remote Debugging Monitor – Microsoft Corporation

Visual Studio 2008 – Remote Debugging with MSVSMON.EXE

That’s it for now! Please rate this post and leave your comments – much appreciated!

Advertisements

AlwaysOn Availability Group Forced Failover: Under the Hood

In this walkthrough I will take you through one aspect of FCI + AG configuration: forced failover. We will simulate failure and force failover. And in the process we will have some … Data Loss. Does that not sound horrible! Relax! You should not worry too much Smile, because in this case what we are going to discuss is actually a ‘manufactured’ data loss situation. The true intent of this post is to show you how the Availability Group architecture deals with ‘multiple master’ scenario and how it avoids the ‘split-brain’ condition.

Lab setup

For our testing the lab setup looks like this:

image

INST1 is a clustered instance of SQL with 2 nodes as possible owners, running SQL 2012 CU3. W2K8R2N3 is a server hosting a standalone instance of SQL 2012 CU3. All 3 servers are nodes in the same Windows Server Failover Cluster (WSFC), SQLAGCLUST.

Design decision #1: Asynchronous versus Synchronous secondary

We are allowed to configure either; but note the latency effects that synchronous secondary can have, especially if W2K8R2N3 was on a WAN link, I would prefer to use asynchronous in that case.

Design decision #2: Why not automatic failover for the secondary replica?

While configuring this setup, please note that while we are allowed to have a synchronous replica, it is not allowed to make that as automatic failover partner. If you try to configure W2K8R2N3 as a synchronous replica with automatic failover:

USE [master]
GO
ALTER AVAILABILITY GROUP [FCI_AG]
MODIFY REPLICA ON N’INST1′ WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N’W2K8R2N3′)))
GO
ALTER AVAILABILITY GROUP [FCI_AG]
MODIFY REPLICA ON N’INST1′ WITH (FAILOVER_MODE = AUTOMATIC)
GO


When you do this, you will receive the following error message:

Msg 35215, Level 16, State 17, Line 1
The Alter operation is not allowed on availability replica ‘INST1’, because automatic failover mode is an invalid configuration on a SQL Server Failover Cluster Instance.  Retry the operation by specifying manual failover mode.

This is expected and by-design, the logic being that the FCI already has automatic failover configured between the nodes. The current implementation of AG doe not allow a failover outside of the FCI nodes.

 

Test Case

We will simulate a ‘perfect storm’ case – failure of the link to the DR, followed by failure of the primary site, followed by forced failover to the secondary site, and finally a failback to the primary site. In this rarest of rare scenarios we will show how the scope for data loss can arise.

Database and AG details

Here is the ‘full disclosure report’ for the test Smile

  • In our sample AG, there is just 1 database: ‘mydb’.
  • The primary replica INST1.mydb is a synchronous, readable secondary, with manual failover.
  • The database replica W2K8R2N3.mydb is synchronous, readable secondary with automatic failover.
  • For our testing we will use a simulated table ‘newtab’ which has 1 integer column, ‘i’.
  • Prior to these tests, the table ‘newtab’ was truncated and the ‘mydb’ transaction log has been backed up.
  • We will test for missing rows to determine the data loss.

Cluster node weight assignment

As part of the ‘good practices’ since W2K8R2N3 cannot participate in automatic failover, it is recommended to set it’s node weight to 0. By default it is 1:

PS C:Windowssystem32> (get-clusternode “w2k8r2n3”).Nodeweight
1

We will set it to 0:

PS C:Windowssystem32> (get-clusternode “w2k8r2n3”).Nodeweight = 0

Test 0: Ensure synchronization works normally

We insert our first row, and check the secondary replica to see if the log records made it across.

insert newtab values (1)

Almost immediately we can see the record is synchronized on the secondary replica. A casual inspection of the transaction log on both sides using fn_dblog() shows that the log records are in sync. Here are the last 3 entries displayed by fn_dblog:

0000001f:0000007b:0018    LOP_INSERT_ROWS    LCX_HEAP    0000:000002e9
0000001f:0000007b:0019    LOP_SET_FREE_SPACE    LCX_PFS    0000:00000000
0000001f:0000007b:001a    LOP_COMMIT_XACT    LCX_NULL    0000:000002e9

fn_dblog is undocumented, unsupported, but can be useful for learning and testing purposes. You can read more about it at the following community resource pages:

 

Test 1: Link failure / secondary down

This is simulated by stopping the service on W2K8R2N3. In this case, the primary replica (which is the FCI) runs ‘exposed’. ‘Exposed’ means that potentially the RPO of 0 data loss (for the AG) is not met because there is no link to the secondary replica.

At this time (when the secondary replica is stopped or disconnected) we see the following entry in the errorlog of primary replica INST1:

2012-09-20 11:52:42.880 spid44s      AlwaysOn Availability Groups connection with secondary database terminated for primary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.

Now, let’s continue ‘modifying’ data on INST1.mydb:

insert newtab values (2)

Let’s review the last few LSNs:

0000001f:00000081:0002    LOP_INSERT_ROWS    LCX_HEAP    0000:000002ec
0000001f:00000081:0003    LOP_COMMIT_XACT    LCX_NULL    0000:000002ec

So to summarize, at this time, we have 1 row (with a value of 2) in mydb.newtab which has not been transferred to W2K8R2N3. This is why we say we are ‘exposed’, because we have the potential of losing this change were something bad to happen to INST1 now.

Test 2: INST1 also fails; W2K8R2N3 comes up later

Imagine due to a steady degradation (maybe power issue or such) the INST1 SQL FCI instance also fails. We will demonstrate this by powering the N1 and N2 nodes off, so that way INST1 is unavailable. When these nodes are offline, the cluster itself fails.

Also imagine after a while, W2K8R2N3 (the erstwhile secondary) comes up. In such a case, let’s see what happens. Firstly, the failover cluster manager reports that it is unable to connect to our WSFC. So let’s see what we get when we generate the cluster log file:

C:Windowssystem32>cluster log /g
Generating the cluster log(s) …
The cluster log has been successfully generated on node ‘w2k8r2n3’…
The cluster log could not be created on node ‘W2K8R2N1’…
System error 1722 has occurred (0x000006ba).
The cluster log could not be created on node ‘W2K8R2N2’…
System error 1722 has occurred (0x000006ba).     

The multiple ‘errors’ are because the cluster log is attempted to be generated on all nodes. The Nodes 1 and 2 being down, these errors are expected. Let’s see what is in the cluster log on W2K8R2N3:

00000578.00000764::2012/09/20-19:10:39.769 INFO  [QUORUM] Node 3: setting next best epoch to 2012/09/13-06`03`57.603_11
00000578.00000764::2012/09/20-19:10:39.769 DBG   [QC] Calculating quorum: Configured nodes count = 3, Node weights count = 2, Quorum witness configured = false, Intersection count = 0
00000578.00000764::2012/09/20-19:10:39.769 INFO  [QUORUM] Node 3: Coordinator: no quorum yet. do nothing

Aha! So due to lack of quorum, our N3 cluster service is unable to form a cluster. At this stage, I will go ahead and stop the cluster service on N3 (if it has not stopped already.) We will start it and ‘force quorum’ later.

Test 3: Forcing quorum on W2K8R2N3

This part is quite simple in operation but underneath there are many complex details going on. I will not attempt to reproduce those details here but you can search for ‘Paxos Tag’ and also look at the 2 articles listed below to understand the mechanism:

So to force the cluster service to start and force a quorum, here are the commands. Firstly, we will use PowerShell to set the node weight of this W2K8R2N3 to 1:

PS C:Windowssystem32> (get-clusternode “w2k8r2n3”).NodeWeight  = 1

Next, we use the ForceQuorum switch to start the cluster service.

C:Windowssystem32>net start clussvc /fq

At this stage, if you look  at the SQL Management Studio, you will see this database in ‘Not Synchronizing’ state. Also the AG resource will state ‘Resolving’:

image

Next, we will force the failover of the AG to this node. To do this, we right click on the AG and select the Failover option, which in turn brings up this set of wizard screens. The first one warns us of the scope of data loss:

image

The second one ensures that we understand the implications of forcing failover:

image

Once the necessary steps are done, you will see the database status as Synchronized:

image

Forced failover: Under the hood

And most importantly from an internals perspective, looking at the transaction log using fn_dblog() shows the same record as the original modification (the row with value 1. The LSN is 1f:7b:18

0000001f:0000007b:0018    LOP_INSERT_ROWS    LCX_HEAP    0000:000002e9

At this time, we can also check the rows in the table on N3:

select * from mydb.dbo.newtab

The above shows just 1 row, with value 1 as expected. Let’s take a look at the error log on W2K8R2N3:

2012-09-20 12:32:38.110 spid53       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_NORMAL’ to ‘RESOLVING_PENDING_FAILOVER’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

Below you will see the state transition which is in response to our request to force failover:

2012-09-20 12:32:48.050 spid55       AlwaysOn: The local replica of availability group ‘FCI_AG’ is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required
2012-09-20 12:32:48.060 spid55       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_PENDING_FAILOVER’ to ‘RESOLVING_NORMAL’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.
2012-09-20 12:32:48.080 spid55       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘RESOLVING_NORMAL’ to ‘PRIMARY_PENDING’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

2012-09-20 12:32:48.200 Server       The state of the local availability replica in availability group ‘FCI_AG’ has changed from ‘PRIMARY_PENDING’ to ‘PRIMARY_NORMAL’. The replica state changed because of either a startup, a failover, a communication issue, or a cluster error.

At this stage, the database can be brought online on W2K8R2N3:

2012-09-20 12:32:48.210 spid21s      Starting up database ‘mydb’.
2012-09-20 12:32:48.300 spid21s      3 transactions rolled forward in database ‘mydb’ (5:0). This is an informational message only. No user action is required.
2012-09-20 12:32:48.340 spid10s      0 transactions rolled back in database ‘mydb’ (5:0). This is an informational message only. No user action is required.  

Test 4: Bring INST1 back online

By doing this, we potentially risk what is known as a ‘split-brain’ situation. Normally the implementation of the AG resource DLL and the Lease Timeout avoids this issue IF it all nodes in the cluster can talk to each other. In the real DR failback scenario normally one would need to be very careful when bringing INST1 back online. The recommended way to prepare for this is to start the cluster services with the PreventQuorum switch first.

When we do bring the nodes for INST1 back online, it does put the mydb on INST1 in a Not Synchronizing state.

image

If we try to use the mydb database on INST1 it gives us the following error:

The target database, ‘mydb’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.  For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online. (Microsoft SQL Server, Error: 976)

Let’s see if we can create a database snapshot to peer into what is inside INST1.mydb.

— Create the database snapshot
CREATE DATABASE mydb_dr_snap ON
( NAME = mydb, FILENAME =
‘f:sql2012datamydb_dr_snap.ss’ )
AS SNAPSHOT OF mydb;
GO
  

At this time, querying the snapshot using the following command:

select * from mydb_dr_snap.dbo.newtab

This gives us 2 rows back. That means the transactions on the erstwhile primary instance INST1 are still ‘there’ but because of the force quorum, the common LSN cannot be negotiated. This causes the issue of INST1 being in Not Synchronized state. You can see the proof for this in the error log. To compare the state of the old secondary replica with INST1, you can use TableDiff utility.

First, you will see the attempt made by the new secondary (INST1) to connect to the new primary (W2K8R2N3):

2012-09-20 12:40:34.990 spid23s      A connection for availability group ‘FCI_AG’ from availability replica ‘INST1’ with id  [280B1AED-49EA-4EF7-9BAF-C1EC13F50E51] to ‘W2K8R2N3’ with id [AC95B2E1-D5FA-48C2-8995-2F7BE4140ED3] has been successfully established.  This is an informational message
2012-09-20 12:40:37.630 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:40:37.630 spid29s      Error: 35285, Severity: 16, State: 1.

This next entry is SO important for us to note. It is actually telling us that the ‘recovery LSN’ which is the ‘common handshake’ is 31:129:1. In hexadecimal notation, this is 1F : 81 : 1. From the first few steps in this walkthrough, you might remember that when we inserted row 2, the LSN was 1f: 81: 2.

2012-09-20 12:40:37.630 spid29s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:40:37.630 spid29s      AlwaysOn Availability Groups data movement for database ‘mydb’ has been suspended for the following reason: “failover from partner” (Source ID 1; Source string: ‘SUSPEND_FROM_PARTNER’). To resume data movement on the database, you will need to resume the d  

Here is the notification that the snapshot was created:

2012-09-20 12:44:36.460 spid57       Starting up database ‘mydb_dr_snap’.
2012-09-20 12:44:36.470 spid57       4 transactions rolled forward in database ‘mydb_dr_snap’ (6:0). This is an informational message only. No user action is required.
2012-09-20 12:44:36.520 spid57       0 transactions rolled back in database ‘mydb_dr_snap’ (6:0). This is an informational message only. No user action is required.
2012-09-20 12:44:36.520 spid57       Recovery is writing a checkpoint in database ‘mydb_dr_snap’ (6). This is an informational message only. No user action is required.

Resume synchronization for INST1.mydb

Later, we resumed the synchronization for this database on INST1. That effectively ‘sets the clock back’ on INST1.mydb by rolling back some of the transaction which are subsequent to the recovery LSN. We do this by right clicking on the database under the Availability Group and selecting Resume Data Synchronization.

image

At this time select * from mydb.dbo.newtab gives 1 row back! That confirms the second of our ‘manufactured data loss’ scenarios Smile

As always let us look at what happened under the hood, by noting the error log entries on INST1. First, you will see the effect of the RESUME:

2012-09-20 12:48:26.410 spid55       ALTER DB param option: RESUME
2012-09-20 12:48:26.410 spid55       AlwaysOn Availability Groups data movement for database ‘mydb’ has been resumed. This is an informational message only. No user action is required.
2012-09-20 12:48:26.530 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:48:26.530 spid29s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:26.530 spid29s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:26.640 spid29s      Error: 35278, Severity: 17, State: 1.
2012-09-20 12:48:26.640 spid29s      Availability database ‘mydb’, which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.  

What follows is even more critical to note. It says we identified a higher LSN (31 : 130 : 1) but are ‘going back’ to the recovery LSN (31: 129: 1):

2012-09-20 12:48:26.790 spid40s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:26.790 spid40s      The recovery LSN (31:130:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:26.790 spid40s      Error: 35286, Severity: 16, State: 1.
2012-09-20 12:48:26.790 spid40s      Using the recovery LSN (31:129:1) stored in the metadata for the database with ID 5.

Finally, the database can be recovered and resume synchronization:

2012-09-20 12:48:26.960 spid40s      Starting up database ‘mydb’.
2012-09-20 12:48:27.000 spid40s      Starting up database ‘mydb_dr_snap’.
2012-09-20 12:48:27.420 spid40s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.
2012-09-20 12:48:27.430 spid40s      Error: 35285, Severity: 16, State: 1.
2012-09-20 12:48:27.430 spid40s      The recovery LSN (31:129:1) was identified for the database with ID 5. This is an informational message only. No user action is required.
2012-09-20 12:48:27.430 spid40s      3 transactions rolled forward in database ‘mydb’ (5:0). This is an informational message only. No user action is required.
2012-09-20 12:48:27.460 spid29s      AlwaysOn Availability Groups connection with primary database established for secondary database ‘mydb’ on the availability replica with Replica ID: {ac95b2e1-d5fa-48c2-8995-2f7be4140ed3}. This is an informational message only. No user action is required.  

Here is the final state of transaction log on INST1. You will notice the absence of the LSN numbers associated with the row with value 2 (1f: 81: 2)

0000001f:0000007b:0019    LOP_SET_FREE_SPACE    LCX_PFS    0000:00000000
0000001f:0000007b:001a    LOP_COMMIT_XACT    LCX_NULL    0000:000002e9
0000001f:00000080:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000081:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000082:0001    LOP_MODIFY_ROW    LCX_BOOT_PAGE    0000:00000000
0000001f:00000083:0001    LOP_BEGIN_XACT    LCX_NULL    0000:000002ec
0000001f:00000083:0002    LOP_HOBT_DELTA    LCX_NULL    0000:000002ec

So this proves that we did lose the changes done in INST1 because of the forced failover to W2K8R2N3. However, with the database snapshot, we can attempt to reconcile those ‘lost’ changes with the base table.

Wrapping up

We covered quite of lot of deep stuff here. Here is a quick summary:

  1. Automatic failover is not possible when the primary replica is a FCI
  2. Normally because of this fact, you will assign 0 votes to the secondary replica nodes
  3. If you lose the primary site, you must bring up cluster service with force quorum on the secondary replica
  4. Next, you will force failover the AG to the secondary replica. Data loss might occur in this scenario.
  5. If you do get back the primary site in the mean time, use the Prevent Quorum switch on the erstwhile primary nodes to prevent ‘split-brain’ condition
  6. If you take the right steps and the earlier primary will be in Not Synchronizing state
  7. The earlier primary replica will still contain the changes which were made to it but not transmitted to the secondary site; you can use a database snapshot on the earlier primary replica to identify these changes, typically using an utility like TableDiff
  8. On resuming the earlier primary replica, it will don the role of a secondary. Again, in this case, data loss is expected if changes were made on the new primary which ‘override’ the earlier primary.

Wow! That was a long post but I hope it brings to light a lot of the detail around this very specific scenario of forced failover. I hope you enjoyed reading this. Please leave your comments, questions and feedback below. I appreciate those very much!

‘Proof of the pudding’: Global variables and PAGE_EXECUTE_WRITECOPY

Today I was teaching a debugging class to our customers. As a foundational element we normally review the virtual-to-physical address translation mechanism, at least to a basic level. In this class we decided to go a bit deeper and show the evidence Smile

Background reading

First, if you are not familiar with PTEs, PDEs and pages, you must read the following to understand those fundamental concepts:

 

Test Setup

My laptop hosts a Windows 2008 VM, using Hyper-V. The VM has been configured for kernel debugging using the following BCDEdit command:

bcdedit -debug on

We connect the kernel debugger on my laptop to this VM by using the ‘named pipe’ COM port emulation. The virtual COM port for the VM can be setup easily, and the steps for the same are documented here. Finally, the VM in turn has WinDbg installed. Since the VM is offline (no Internet connection,) the matching OS symbols were cached on the VM using this approach for offline debugging.

Initial observations

We initially launch two notepad.exe instances on the VM. In one instance, we will switch the ‘Word wrap’ setting to checked state; the other one we will leave as default. It so happens that this setting is mapped to a global variable notepad!fWrap. How did I figure that out? Using the following command in the VM WinDbg session:

0:001> x notepad!*wrap*
001ca034          notepad!fWrap = <no type information>

So here are the actual values for each of these processes:

notepad.exe #1:

0:001> x notepad!*wrap*
001ca034          notepad!fWrap = <no type information>
0:001> |
.  0    id: 910    create    name: notepad.exe   <—PID is 0x910
0:001> dd 001ca034 L1
001ca034  00000001        <—as you can see, the value is set to 1, which indicates word wrap is ON.

notepad.exe #2:

0:001> x notepad!fWrap
001ca034          notepad!fWrap = <no type information>
0:001> |
.  0    id: f6c    create    name: notepad.exe         <— PID is 0xf6c
0:001> dd 001ca034 L1
001ca034  00000000       <— In this case word wrap is OFF.

Notice that the virtual address space for fWrap is the same in both processes. So the question in the mind of the participant was, while theoretically we know that the address space is translated to different pages, can we prove it?

Verification using live Kernel Debugging

To answer the above question, the only way to be absolutely sure was to hook up a live kernel debugger session and check. Firstly, we identify the processes in the kernel mode (we need to process objects in order to do this correctly.)

kd> !process 0 0 notepad.exe
PROCESS 83b73d90  SessionId: 1  Cid: 0910    Peb: 7ffde000  ParentCid: 0c24
    DirBase: 3f77b3c0  ObjectTable: 945cd440  HandleCount:  46.
    Image: notepad.exe

PROCESS 8366eb38  SessionId: 1  Cid: 0f6c    Peb: 7ffda000  ParentCid: 0d9c
    DirBase: 3f77b400  ObjectTable: 944e46d0  HandleCount:  46.
    Image: notepad.exe

Next, we use the extension !vtop to map these to physical addresses. In the command below, the first argument to !vtop is the ‘page directory base’, the second is the virtual address. Firstly, for the process with PID 0x910 (which has word wrap ON) and whose Directory Base is 3f77b3c0 (obtained from the above !process output.)

kd> !vtop 3f77b3c0 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 – 000000000e0f5801
X86VtoP: PAE PDE e0f5000 – 000000000e0fa867
X86VtoP: PAE PTE e0fae50 – 800000000c57a867
X86VtoP: PAE Mapped phys c57a034
Virtual address 1ca034 translates to physical address c57a034.

So !vtop says the physical address is c57a034. Notice the last 3 digits (which is the offset within the 4Kb page) is 034, which is the same as offset the virtual address. This is no coincidence, it is the way we break up the 32-bit virtual address into its constituents (page directory entry, page table entry and page offset.) So let us quickly verify the contents of physical memory location c57a034. To do that we have two options:

kd> !dd c57a034 L1
# c57a034 00000001

kd> dd /p c57a034 L1
0c57a034  00000001

Either way, we can be sure that this is THE content of our fWrap variable. Let us cross-check the other process as well in a similar way.

kd> !vtop 3f77b400 001ca034
X86VtoP: Virt 001ca034, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 – 000000000bd08801
X86VtoP: PAE PDE bd08000 – 000000000beeb867
X86VtoP: PAE PTE beebe50 – 800000000bc2c867
X86VtoP: PAE Mapped phys bc2c034
Virtual address 1ca034 translates to physical address bc2c034.

Next, let us dump the content of physical address bc2c034. Again we can use either !dd or dd /p to do this.

kd> dd /p bc2c034 L1
0bc2c034  00000000


kd> !dd bc2c034 L1
# bc2c034 00000000

So we can confirm that this is accurate as well, as it rightly shows the status value of 0.

Digging deeper

Now, let’s look at some more details. From the user mode WinDbg instance (running inside the VM) let’s check what kind of memory allocation is associated with the address notepad!fWrap. To do this, we will use the !address extension.

0:001> !address 001ca034

Usage:                  Image
Base Address:           001ca000
End Address:            001cb000
Region Size:            00001000
State:                  00001000    MEM_COMMIT
Protect:                00000004    PAGE_READWRITE
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:WindowsSystem32notepad.exe
Mapped Image Name:     
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1ca034
More info:              !dh 0x1c0000

Very interesting! So this is actually Image memory (as evidenced by the MEM_IMAGE type) and also the page is protected as a ‘copy-on-write’ page (PAGE_EXECUTE_WRITECOPY protection). This is actually the standard technique used for global variables within an image, which of course need to be uniquely maintained per-process.

Let’s turn our attention to the executable code within notepad, to see how that is mapped to physical addresses. We will randomly focus our attention on the function notepad!FastReplaceAll, which I gathered from the x notepad!* command output. notepad!FastReplaceAll is a function at address 001c743f in my debugging session:

0:001> u notepad!FastReplaceAll
notepad!FastReplaceAll:
001c743f 8bff            mov     edi,edi
001c7441 55              push    ebp
001c7442 8bec            mov     ebp,esp
001c7444 83ec24          sub     esp,24h
001c7447 56              push    esi
001c7448 33f6            xor     esi,esi
001c744a 56              push    esi
001c744b 56              push    esi

FWIW, the address is the same on the other notepad.exe instance as well. Looking at !address output, this page is also a copy-on-write page:

0:001> !address 001c743f
Usage:                  Image
Base Address:           001c1000
End Address:            001ca000
Region Size:            00009000
State:                  00001000    MEM_COMMIT
Protect:                00000020    PAGE_EXECUTE_READ
Type:                   01000000    MEM_IMAGE
Allocation Base:        001c0000
Allocation Protect:     00000080    PAGE_EXECUTE_WRITECOPY
Image Path:             notepad.exe
Module Name:            notepad
Loaded Image Name:      C:WindowsSystem32notepad.exe
Mapped Image Name:     
More info:              lmv m notepad
More info:              !lmi notepad
More info:              ln 0x1c743f
More info:              !dh 0x1c0000

However, let us see if this page has different physical addresses or not. Firstly, for the process with PID 0x910 (which has word wrap ON)

kd> !vtop 3f77b3c0 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b3c0
X86VtoP: PAE PDPE 3f77b3c0 – 000000000e0f5801
X86VtoP: PAE PDE e0f5000 – 000000000e0fa867
X86VtoP: PAE PTE e0fae38 – 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

For the other process, it is as follows:

kd> !vtop 3f77b400 001c743f
X86VtoP: Virt 001c743f, pagedir 3f77b400
X86VtoP: PAE PDPE 3f77b400 – 000000000bd08801
X86VtoP: PAE PDE bd08000 – 000000000beeb867
X86VtoP: PAE PTE beebe38 – 0000000028d07025
X86VtoP: PAE Mapped phys 28d0743f
Virtual address 1c743f translates to physical address 28d0743f.

Voila! They are the same physical page. This is expected (in retrospect) because this code has not been changed and is therefore mapped to the same physical page 28d0743f.

Summary

  • Virtual to physical mapping is viewable in the kernel mode debug session using the !vtop extension
  • For global variables within the process, it is inevitable that though they share the same virtual address, their physical pages will be different due to the copy-on-write behavior
  • For executable code, the physical pages will be shared as long as the load address of the module was the same.

Well that’s it for now! If you liked this and want to see more ‘Proof of the pudding’, please take a minute to drop a comment, and I will be glad to oblige!

SQL collation and performance

Recently a colleague asked me if SQL collations have any impact on performance. We also hit upon another question: do local variables (DECLARE @somevar <datatype>) have a collation associated with them and if so how is it controlled?

Let us take some simple examples to demonstrate these points and derive conclusions.

Create databases

First, we are setting up 2 databases (inspired by SQL Books Online, these are aptly named GreekDB and LatinDB Smile).

— Setup databases first
create database greekdb
collate greek_ci_as
go

create database latindb
collate latin1_general_cs_as
go

Create tables with dummy data

Next, we are creating some tables in them, inserting data and then building indexes. First, we do the GreekDB ones:

use greekdb
create table greektable
(
    greekstr varchar(50)
)

insert greektable values (‘hello world’)

create index nc_greek
on greektable (greekstr)

Then the LatinDB ones:

use latindb
create table latintable
(
    latinstr varchar(50)
)

insert latintable values (‘hello world’)

create index nc_latin
on latintable (latinstr)

Query the tables

We start by querying these tables and observing the execution plans (I am using text showplan for ease of copying into this blog post.)

Query 1: Best case scenario

use greekdb
select *
from greektable
where greekstr = ‘Hello world’

Here is the execution plan:

  |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr]=[@1]) ORDERED FORWARD)

Observations: Fairly obvious results, no collation mismatch, everything is fine, index gets used to seek.

Query 2: Cross database query with collation mismatch

use latindb
select *
from greekdb.dbo.greektable
where greekstr = ‘Hello world’

Here is the execution plan:

  |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr]=CONVERT_IMPLICIT(varchar(8000),[@1],0)) ORDERED FORWARD)

Observations: Before you read further, keep in mind that USE DATABASE statement, referencing LatinDB. The rest of the query operates on greekdb. Looking at the execution plan, the implicit conversion of the string literal is interesting. String literals for a batch inherit the database collation (click on the previous link and check the Remarks section) if a COLLATE clause is not specified. That is what we are seeing here, the string literal ‘hello world’ defaults to the collation of the database, which is in this case is specified as LatinDB. Due to this, the string literal is converted to the collation of the column. The index seek can still be used because the column is not being converted.

Query 3: Usage of COLLATE to specify string literal collation

use latindb
select *
from greekdb.dbo.greektable
  where greekstr = ‘Hello world’ COLLATE greek_ci_as

Here is the execution plan:

  |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr]=CONVERT(varchar(8000),[@1],0)) ORDERED FORWARD)

Observations: You see in this case that specifying COLLATE has no effect on the query plan, except that the CONVERT_IMPLICIT changes to a CONVERT (explicit.) The string literal still is coerced into the new collation (GREEK_CI_AS.) Please refer to the collation precedence rules for details (click on the previous link and check the ‘Collation rules’ section) on this.

Query 4: Explicit-explicit coercion is not allowed

select * from greekdb.dbo.greektable G
join latindb.dbo.latintable L
on G.greekstr = L.latinstr

Result: Msg 468, Level 16, State 9, Line 4
Cannot resolve the collation conflict between “Latin1_General_CS_AS” and “Greek_CI_AS” in the equal to operation

Observations: This should be obvious from the previous collation rules (if you clicked on the previous link).

Query 5: Usage of COLLATE to specify column collation

use latindb
select * from greekdb.dbo.greektable G
join latindb.dbo.latintable L
on G.greekstr collate latin1_general_cs_as = L.latinstr

Here is the execution plan:

|–Nested Loops(Inner Join, OUTER REFERENCES:([Expr1006]))
     |–Compute Scalar(DEFINE:([Expr1006]=CONVERT(varchar(50),[greekdb].[dbo].[greektable].[greekstr] as [G].[greekstr],0)))
     |    |–Index Scan(OBJECT:([greekdb].[dbo].[greektable].[nc_greek] AS [G]))
     |–Index Seek(OBJECT:([latindb].[dbo].[latintable].[nc_latin] AS [L]), SEEK:([L].[latinstr]=[Expr1006]) ORDERED FORWARD)

Observations: This query which is almost the same as previous one, works, while the previous errors out. The reason should be obvious: the explicit conversion of the Greek table columns to the Latin collation enables this to work. While this sounds good, there is potential for data loss. From this link, you can see the dreaded words: ‘Data loss during code page translations is not reported.’

Query 6: Explicit collation conversion on column

use greekdb
select *
from greekdb.dbo.greektable
where greekstr collate latin1_general_cs_ai = ‘Hello world’

Here is the execution plan:

  |–Index Scan(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]),  WHERE:(CONVERT(varchar(50),[greekdb].[dbo].[greektable].[greekstr],0)=CONVERT_IMPLICIT(varchar(8000),[@1],0)))

Observations: In this query, which is kind of the same to the previous query, we convert the Greek column to a 3rd collation (accent-insensitive version of the LatinDb database collation.) The result is that underlying index is not used, causing a scan of the table. The other interesting thing to observe is the collation conversion of the string literal as well, which allows this comparison to happen.

Query 7: COLLATE specified with same collation

use greekdb
select *
from greekdb.dbo.greektable
where greekstr collate greek_ci_as = ‘Hello world’

Here is the execution plan:

  |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr]=[@1]) ORDERED FORWARD)

Observations: This is the ‘best case’ all over again. The explicit COLLATE clause is effectively ignored by the engine. The index is used.

Query 8: Data type mismatch in case insensitive collation database

use greekdb
select *
from greekdb.dbo.greektable
where greekstr = N’Hello world’

Here is the execution plan:

  |–Nested Loops(Inner Join, OUTER REFERENCES:([Expr1006], [Expr1007], [Expr1005]))
       |–Compute Scalar(DEFINE:(([Expr1006],[Expr1007],[Expr1005])=GetRangeThroughConvert([@1],[@1],(62))))
       |    |–Constant Scan
       |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr] > [Expr1006] AND [greekdb].[dbo].[greektable].[greekstr] < [Expr1007]),  WHERE:(CONVERT_IMPLICIT(nvarchar(50),[greekdb].[dbo].[greektable].[greekstr],0)=[@1]) ORDERED FORWARD)

Observations: This case is more complex. It involves specifying a string literal of type NVARCHAR, which is higher in precedence compared to our column data type of VARCHAR. The most important thing here is that the column ‘greekstr’ has to be implicitly converted to NVARCHAR(50). Also of note is that one row is returned by the SELECT query. The other thing of great interest is the usage of the GetRangeThroughConvert() function. Some details about this internal function are here and here, essentially you want to think of this as an optimization applied when SQL has to deal with an implicit conversion in the predicate, and enables the index to be ‘seeked’ (though there is a lot of work done before that seek can happen.)

Query 9: Data type mismatch in case-sensitive collation database

use latindb
select *
from latindb.dbo.latintable
where latinstr = N’Hello world’

Here is the execution plan:

|–Nested Loops(Inner Join, OUTER REFERENCES:([Expr1006], [Expr1007], [Expr1005]))
     |–Compute Scalar(DEFINE:(([Expr1006],[Expr1007],[Expr1005])=GetRangeThroughConvert([@1],[@1],(62))))
     |    |–Constant Scan
     |–Index Seek(OBJECT:([latindb].[dbo].[latintable].[nc_latin]), SEEK:([latindb].[dbo].[latintable].[latinstr] > [Expr1006] AND [latindb].[dbo].[latintable].[latinstr] < [Expr1007]),  WHERE:(CONVERT_IMPLICIT(nvarchar(50),[latindb].[dbo].[latintable].[latinstr],0)=[@1]) ORDERED FORWARD)

Observations: On the face of it, this query is very similar to the previous query. The main difference is that we are operating on a case-sensitive database. No rows are returned. Similar to the previous case, you see the effect of the GetRangeThroughConvert() internal function to enable the seek despite the data type mismatch.

Query 10: Case-insensitive collation specified

use latindb
select *
from latindb.dbo.latintable
where latinstr collate latin1_general_ci_as = N’Hello world’

Here is the execution plan:

  |–Index Scan(OBJECT:([latindb].[dbo].[latintable].[nc_latin]),  WHERE:(CONVERT_IMPLICIT(nvarchar(50),CONVERT(varchar(50),[latindb].[dbo].[latintable].[latinstr],0),0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0)))

Observations: This query is very similar to the previous ones, except that it has an explict COLLATE applied to the column. The query does return one row. You can see that not only is the latinstr column being converted (explicit CONVERT) to the same collation, it is later being implicitly converted to NVARCHAR. The other interesting aspect is that the string literal is also being converted to the same collation, which is necessary for a correct result.

Query 11: Effect of database collation on variable declaration

use greekdb — now this becomes the database whose collation will be used for all variables
declare @mygreekvar varchar(50)

use latindb
declare @mylatinvar varchar(50)  — you would think this is latin1_general_cs_as. Let’s see later.

set @mygreekvar = ‘hello world’

set @mylatinvar = ‘hello world’

if (@mygreekvar = @mylatinvar)
    select ‘they are the same’
else
    select ‘different’

use greekdb
select *
from greektable
where greekstr  = @mygreekvar

use latindb
select *
from latindb.dbo.latintable
where latinstr = @mylatinvar

Execution plan for the first SELECT:

  |–Index Seek(OBJECT:([greekdb].[dbo].[greektable].[nc_greek]), SEEK:([greekdb].[dbo].[greektable].[greekstr]=[@mygreekvar]) ORDERED FORWARD)

Execution plan for the second SELECT:

  |–Table Scan(OBJECT:([latindb].[dbo].[latintable]), WHERE:([latindb].[dbo].[latintable].[latinstr]=CONVERT_IMPLICIT(varchar(50),[@mylatinvar],0)))

Observations: Hopefully by now, you should have figured this one out. The collation applied to variables is actually that of the last USE database prior to the first DECLARE. Read more about that behavior here (opens up to Microsoft Connect.) So the @mylatinvar variable has to be converted to suit the latintable collation.

The key point I wish to call out here is that the usage of the COLLATE clause is potentially (see point a) below for why) going to cause a conversion of the column. The effect will depend on two things:

a. Whether the target collation is the same as the column collation or not. If the target is the same then there is no scalar conversion required.

AND

b. Whether this COLLATE clause is present in the predicate (such as WHERE clause or JOIN predicate). If it is in the predicate then it can cause a scan (if condition a) above has also been satisfied.)

c. ALSO

d. If the COLLATE is present in the output list (not in the predicate) and if condition a) has been satisfied then it will cause a scalar conversion operator to be introduced into the plan. Depending on how many such conversions (rows * columns) it may cause additional CPU overhead.

To summarize

PHEW! This is a rather long post, but exposes some important things:

  1. Collation mismatches can be dangerous – worst case, they can cause data loss or errors to occur (if coercion is disallowed) or in many cases can (at the very least) cause performance problems.
  2. The usage of the COLLATE clause can introduce a potential performance penalty. If this clause is applied on top of a table column inside a predicate, it may mean that any indexes might not be used (or we may have to do a lot of work using the GetRangeThroughConvert() function.)
  3. Lastly, if you are using cross-database references in a batch, beware of the collation assigned to each local variable – they inherit the same database collation of the last USE database prior to the first DECLARE.

Thank you for reading. Questions, comments are most welcome!

Hosting ReportServer catalog DBs on older version of SQL database engine

A FAQ we get is whether it is supported to have the ReportServer and ReportServerTempDB (catalog databases used in SSRS) on an older version of SQL Server (as compared to the version used on SSRS.) Here is the summary answer along with the reference links:

For SSRS 2012, it is supported to install the catalog databases on versions all the way down to SQL 2005. Reference: http://msdn.microsoft.com/en-us/library/ms157285.aspx

For SSRS 2008 & R2, the same story – catalog DB can be all the way down to SQL 2005. Reference http://msdn.microsoft.com/en-us/library/ms157285(v=sql.105).aspx. There is a ‘other versions’ link on this page, use that to navigate to the reference for SQL 2008 and 2005 as well.

One topic related to this is when you have SSRS 2008 R2 in the SharePoint 2010 ‘Connected Mode’. Please refer to the blog post http://blogs.msdn.com/b/prash/archive/2010/04/26/integrating-ssrs-r2-rtm-with-sharepoint-2010-rtm.aspx for more details on this scenario.

Thank you for reading!

Arvind.

AlwaysOn Availability Groups Connectivity Cheat Sheet

Today we were discussing the AlwaysOn connectivity options, and the various combinations therein of Readable Secondary setting and the connection strings became very complex. So I have summarized all the possible outcomes in the pivot tables below.

For convenience, we have two tables. The first table covers the cases when the Primary Replica is set to allow all types of connections (i.e. those with and without the ApplicationIntent=ReadOnly attribute.)

Case 1: Primary Replica allows ALL connections

Readable Secondary Setting

 

Additional Connection Parameters

Initial Database

No

Yes

Read-intent

 

Through Listener

None

Non-AG

Primary

Primary

Primary

ReadOnly

Primary

Primary

Primary

None

AG

Primary

Primary

Primary

ReadOnly

Not allowed (5)

Secondary

Secondary

Direct to Primary

None

Not applicable / don’t care

Primary

Primary

Primary

ReadOnly

Primary

Primary

Primary

Direct to Secondary

None

Not allowed (4)

Secondary

Not allowed (1)

ReadOnly

Not allowed (4)

Secondary

Secondary

The second table covers the cases when the Primary Replica is set to allow Read/Write (i.e. those without the ReadOnly attribute.)

Case 2: Primary Replica allows only Read/Write connections

Readable Secondary Setting

 

Additional Connection Parameters

Initial Database

No

Yes

Read-intent

 

 

Through Listener

None

Non-AG

Primary

Primary

Primary

ReadOnly

Not allowed (2)

Not allowed (2)

Not allowed (2)

None

AG

Primary

Primary

Primary

ReadOnly

Not Allowed (3)

Secondary

Secondary

Direct to Primary

None

Not applicable / don’t care

Primary

Primary

Primary

ReadOnly

Not allowed (2)

Not allowed (2)

Not allowed (2)

Direct to Secondary

None

Not allowed (4)

Secondary

Not allowed (1)

ReadOnly

Not allowed (4)

Secondary

Secondary

For some cases, the access to the database is not allowed (either at connection time or at run time, when the database is tried to be accessed) with the following error messages:

1. Error message: The target database (‘salesdb’) is in an availability group and is currently accessible for connections when the application intent is set to read only. For more information about application intent, see SQL Server Books Online.

2. Error message: The target database (‘SalesDb’) is in an availability group and currently does not allow read only connections. For more information about application intent, see SQL Server Books Online.

3. Error message: Unable to access the ‘SalesDB’ database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.

4. Error message: The target database, ‘salesdb’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online

5. Error message: Unable to access the ‘SalesDB’ database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.

This table should help you predict how the connectivity will behave in each of these combinations, and therefore setup your Availability Group replica properties correctly.

More information:

I hope you enjoyed this post as well. If you would like to see more, please leave a comment!

AlwaysOn Availability Groups Listener: connectivity details

Today in some discussions with my colleagues we were looking at the AlwaysOn Listener which allows SQL Server client applications to be abstracted from the details of the AlwaysOn Availability Group implementation. On the surface, The Listener essentially is a cluster resource pair – a Virtual IP and a Virtual Network Name. But on deeper examination, there are some details which are worth knowing. This blog post will help you discover those!

Firstly, here is an example of how the Listener looks like in the Cluster Management. You can see the network name resource (mylisten) and the virtual IP as well (192.168.1.78.)

image

And in Management Studio, you can see the same details as well:

image

In many diagrams the Listener is shown as a separate entity, distinct from the SQL database engine. In some ways this is correct, as you might interpret from the above virtual IP and network name. However, we have to consider the endpoint as well. If you think deeply, the question which might come up in your mind is, fine, this is a network name and a virtual IP; the cluster service helps ‘online’ these on the active node (it actually ‘brings online’ the virtual IP by registering it against the network interface of the active node) – but where is the listener port being bound and listened on? The cluster service cannot do that for us.

So here’s how you can investigate the actual mechanism. Firstly, do a netstat –ano on the primary replica (which is where the listener is active) and find out which process is listening on the listener’s IP and port. In our case these dummy address and port values are 192.168.1.78 and 1433 respectively. On checking, we see:

C:>netstat -ano

Active Connections

  Proto  Local Address          Foreign Address        State           PID

  TCP    192.168.1.78:1433      0.0.0.0:0              LISTENING       1232

The PID 1232 is actually the PID of sqlservr.exe! That means the listener functions are actually also implemented in the database engine. FYI, here is how we can check which process is PID 1232:

C:>tasklist /fi “pid eq 1232”

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
sqlservr.exe                  1232 Services                   0    227,208 K

So this means that the ‘listener’ endpoint is actually being bound to inside of SQL database engine! We can confirm this easily by looking at the SQL Server errorlog:

2012-09-12 03:40:14.820 spid17s      The availability group database “salesdb” is changing roles ….
2012-09-12 03:40:14.830 Server       Server is listening on [ 192.168.1.78 <ipv4> 1433].

2012-09-12 03:40:14.830 Server       Started listening on virtual network name ‘mylisten’. No user action is required.

So this is good information – the listener actually ‘lives’ inside the sqlservr.exe of the primary replica. So any connection from the client will first reach this SQL instance and then potentially be redirected to the secondary read-only replica (if configured, which it is in my case.)

To test this, I ran a NetMon with a filter on the TCP ports for the listener (1433) and the replicas (1450). From this trace, it is obvious that there is an initial connection to the primary replica (which is actually the listener endpoint), and then the client is ‘redirected’ and reconnects to the actual secondary read-only replica:

image

In the above NetMon trace, 192.168.1.78 is the listener IP (which is actually bound to the primary replica’s node) and you will see initially a set of login and TDS packets flowing to and from the workstation to this listener. At the end of the TDS exchange, you will see that the client on SQLWORKSTN has picked up the 192.168.1.101 (which is the secondary replica instance in my test setup) and is talking to the secondary replica.

So to summarize:

  1. The listener IP is physically registered on the bublic network interface of the active node which hosts the AG listener cluster resource
  2. The cluster resource DLL for AlwaysOn (hadrres.dll) communicates with the primary replica SQL instance
  3. The primary replica in turns binds to the listener IP and port
  4. The read-write client which connects to the listener is actually directly connected, without any additional overhead
  5. Read-only clients may be redirected (depending on the configuration, which is indeed the case in my testing above)
  6. This redirection needs support from the client library, which is implemented in .NET 4.5 and SQL Client 2012 respectively (links below)

Some additional reading on this subject:

I hope this gives you a deeper insight how the AG Listener connectivity works. If you enjoyed reading this as much as I enjoyed creating this walkthrough, please take a minute and leave a comment and rate this post!

Thank you!

Arvind.