AlwaysOn Availability Groups Connectivity Cheat Sheet

Today we were discussing the AlwaysOn connectivity options, and the various combinations therein of Readable Secondary setting and the connection strings became very complex. So I have summarized all the possible outcomes in the pivot tables below.

For convenience, we have two tables. The first table covers the cases when the Primary Replica is set to allow all types of connections (i.e. those with and without the ApplicationIntent=ReadOnly attribute.)

Case 1: Primary Replica allows ALL connections

Readable Secondary Setting

 

Additional Connection Parameters

Initial Database

No

Yes

Read-intent

 

Through Listener

None

Non-AG

Primary

Primary

Primary

ReadOnly

Primary

Primary

Primary

None

AG

Primary

Primary

Primary

ReadOnly

Not allowed (5)

Secondary

Secondary

Direct to Primary

None

Not applicable / don’t care

Primary

Primary

Primary

ReadOnly

Primary

Primary

Primary

Direct to Secondary

None

Not allowed (4)

Secondary

Not allowed (1)

ReadOnly

Not allowed (4)

Secondary

Secondary

The second table covers the cases when the Primary Replica is set to allow Read/Write (i.e. those without the ReadOnly attribute.)

Case 2: Primary Replica allows only Read/Write connections

Readable Secondary Setting

 

Additional Connection Parameters

Initial Database

No

Yes

Read-intent

 

 

Through Listener

None

Non-AG

Primary

Primary

Primary

ReadOnly

Not allowed (2)

Not allowed (2)

Not allowed (2)

None

AG

Primary

Primary

Primary

ReadOnly

Not Allowed (3)

Secondary

Secondary

Direct to Primary

None

Not applicable / don’t care

Primary

Primary

Primary

ReadOnly

Not allowed (2)

Not allowed (2)

Not allowed (2)

Direct to Secondary

None

Not allowed (4)

Secondary

Not allowed (1)

ReadOnly

Not allowed (4)

Secondary

Secondary

For some cases, the access to the database is not allowed (either at connection time or at run time, when the database is tried to be accessed) with the following error messages:

1. Error message: The target database (‘salesdb’) is in an availability group and is currently accessible for connections when the application intent is set to read only. For more information about application intent, see SQL Server Books Online.

2. Error message: The target database (‘SalesDb’) is in an availability group and currently does not allow read only connections. For more information about application intent, see SQL Server Books Online.

3. Error message: Unable to access the ‘SalesDB’ database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.

4. Error message: The target database, ‘salesdb’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online

5. Error message: Unable to access the ‘SalesDB’ database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.

This table should help you predict how the connectivity will behave in each of these combinations, and therefore setup your Availability Group replica properties correctly.

More information:

I hope you enjoyed this post as well. If you would like to see more, please leave a comment!

AlwaysOn Availability Groups Listener: connectivity details

Today in some discussions with my colleagues we were looking at the AlwaysOn Listener which allows SQL Server client applications to be abstracted from the details of the AlwaysOn Availability Group implementation. On the surface, The Listener essentially is a cluster resource pair – a Virtual IP and a Virtual Network Name. But on deeper examination, there are some details which are worth knowing. This blog post will help you discover those!

Firstly, here is an example of how the Listener looks like in the Cluster Management. You can see the network name resource (mylisten) and the virtual IP as well (192.168.1.78.)

image

And in Management Studio, you can see the same details as well:

image

In many diagrams the Listener is shown as a separate entity, distinct from the SQL database engine. In some ways this is correct, as you might interpret from the above virtual IP and network name. However, we have to consider the endpoint as well. If you think deeply, the question which might come up in your mind is, fine, this is a network name and a virtual IP; the cluster service helps ‘online’ these on the active node (it actually ‘brings online’ the virtual IP by registering it against the network interface of the active node) – but where is the listener port being bound and listened on? The cluster service cannot do that for us.

So here’s how you can investigate the actual mechanism. Firstly, do a netstat –ano on the primary replica (which is where the listener is active) and find out which process is listening on the listener’s IP and port. In our case these dummy address and port values are 192.168.1.78 and 1433 respectively. On checking, we see:

C:>netstat -ano

Active Connections

  Proto  Local Address          Foreign Address        State           PID

  TCP    192.168.1.78:1433      0.0.0.0:0              LISTENING       1232

The PID 1232 is actually the PID of sqlservr.exe! That means the listener functions are actually also implemented in the database engine. FYI, here is how we can check which process is PID 1232:

C:>tasklist /fi “pid eq 1232”

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
sqlservr.exe                  1232 Services                   0    227,208 K

So this means that the ‘listener’ endpoint is actually being bound to inside of SQL database engine! We can confirm this easily by looking at the SQL Server errorlog:

2012-09-12 03:40:14.820 spid17s      The availability group database “salesdb” is changing roles ….
2012-09-12 03:40:14.830 Server       Server is listening on [ 192.168.1.78 <ipv4> 1433].

2012-09-12 03:40:14.830 Server       Started listening on virtual network name ‘mylisten’. No user action is required.

So this is good information – the listener actually ‘lives’ inside the sqlservr.exe of the primary replica. So any connection from the client will first reach this SQL instance and then potentially be redirected to the secondary read-only replica (if configured, which it is in my case.)

To test this, I ran a NetMon with a filter on the TCP ports for the listener (1433) and the replicas (1450). From this trace, it is obvious that there is an initial connection to the primary replica (which is actually the listener endpoint), and then the client is ‘redirected’ and reconnects to the actual secondary read-only replica:

image

In the above NetMon trace, 192.168.1.78 is the listener IP (which is actually bound to the primary replica’s node) and you will see initially a set of login and TDS packets flowing to and from the workstation to this listener. At the end of the TDS exchange, you will see that the client on SQLWORKSTN has picked up the 192.168.1.101 (which is the secondary replica instance in my test setup) and is talking to the secondary replica.

So to summarize:

  1. The listener IP is physically registered on the bublic network interface of the active node which hosts the AG listener cluster resource
  2. The cluster resource DLL for AlwaysOn (hadrres.dll) communicates with the primary replica SQL instance
  3. The primary replica in turns binds to the listener IP and port
  4. The read-write client which connects to the listener is actually directly connected, without any additional overhead
  5. Read-only clients may be redirected (depending on the configuration, which is indeed the case in my testing above)
  6. This redirection needs support from the client library, which is implemented in .NET 4.5 and SQL Client 2012 respectively (links below)

Some additional reading on this subject:

I hope this gives you a deeper insight how the AG Listener connectivity works. If you enjoyed reading this as much as I enjoyed creating this walkthrough, please take a minute and leave a comment and rate this post!

Thank you!

Arvind.

How to assign a static port to a SQL Server named instance – and avoid a common pitfall

While Books Online clearly mentions the steps to Configure a Server to Listen on a Specific TCP Port we still see people missing out on one small but important detail in these steps: they forget to delete the entry (0 or some random port) for dynamic port. This firstly results in confusion and occasionally can result in connectivity problems as well. Let me explain how using an example from one of our lab setups.

As a first step, let’s see what the ‘administrator’ (in this case, yours truly Smile) had done:

image

As you can see, they have added the static port for ‘IPAll’ with a value of 1450. That part is fine. The problem though is they forgot to remove the entries for the dynamic ports (0 or some random port). That means that when they restarted SQL, the dynamic port setting is still valid. In fact if we query sys.tcp_endpoints, you will still see the engine thinks it is listening on dynamic port:

SELECT        name, protocol_desc, type_desc, state_desc, is_admin_endpoint, port, is_dynamic_port, ip_address 
FROM            sys.tcp_endpoints

gives us:

name protocol_desc type_desc state_desc is_admin_endpoint port is_dynamic_port
Dedicated Admin Connection TCP TSQL STARTED 1 0 1
TSQL Default TCP TCP TSQL STARTED 0 0 1

The important observation is that the engine reports that it is still using a dynamic port. It does not report the static port number 1450 which we selected in Configuration Manager. Let’s double-check in the errorlog to see if indeed the static port is being picked up at all. And lo and behold:

Server is listening on [ 'any' <ipv6> 1450]. 
Server is listening on [ 'any' <ipv4> 1450]. 
Server is listening on [ 'any' <ipv6> 49626]. 
Server is listening on [ 'any' <ipv4> 49626].

In our case, sqlservr.exe has a PID of 1240. Using the command netstat –ano, we can see what it is listening on.

 Proto  Local Address          Foreign Address        State           PID 
  TCP    0.0.0.0:1450           0.0.0.0:0              LISTENING       1240 
  TCP    0.0.0.0:49626          0.0.0.0:0              LISTENING       1240 
  TCP    127.0.0.1:49627        0.0.0.0:0              LISTENING       1240 
  TCP    192.168.1.101:1450     192.168.1.200:49386    ESTABLISHED     1240 
  TCP    192.168.1.101:1450     192.168.1.200:49396    ESTABLISHED     1240 
  TCP    [::]:1450              [::]:0                 LISTENING       1240 
  TCP    [::]:49626             [::]:0                 LISTENING       1240 
  TCP    [::1]:49627            [::]:0                 LISTENING       1240

So it is not only listening on the static port, but also on the dynamic port 49626. The DAC is listening on TCP port 49627. The values with a local address of [::] are the IPv6 ‘All’ address.

So depending on what got cached earlier in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\SNI11.0\LastConnect on the client (you can find some detail about the LastConnect registry key and the caching mechanism in this KB article), the client might attempt to connect to the previous dynamic port (which is still valid based on our observation above.)

FYI, if we run NetMon, we can see that the SSRP response from the SQL Browser correctly gives back 1450 as the port for this named instance:

image

For clarity I’ve reproduced the response from SQL Browser (using the SSRP protocol) back to my client (SQLCMD.exe):

.^.ServerName;SOMESERVER;InstanceName;SOMEINSTANCE;IsClustered;No;Version;11.0.2100.60;tcp;1450;;

From the above it is clear that SQL Browser is correctly sending the static port assignment. But if you are like me, I feel uneasy till I fix the root cause, which is to delete the dynamic port assignment!

To summarize here is what we saw in this walkthrough:

  1. The official steps (captured in Books Online) to assign a static port for a named instance involve also deleting the value (0 or some random port) for the dynamic port.
  2. Failure to delete the dynamic port value in SQL Configuration Manager will cause SQL to listen on both the static as well as the dynamic ports.
  3. This means that clients will succeed to connect to the erstwhile dynamic port if they had that cached in the LastConnect client side registry key.
  4. For clients which do not have cached connection details, SQL Browser seems to pickup the static port and sends that back to the client.
  5. So follow the steps in the BOL article to the T and delete the dynamic port value right after you type in the static port value, and in any case before you restart the instance.

FYI the steps to fix a static port for the Dedicated Admin Connection (DAC) are in the KB article How to configure SQL Server to listen on a specific port under the section ‘Configuring an instance of SQL Server to use a static port’.

I hope you enjoyed reading this post as much as I enjoyed bringing it to you! Please take a minute to rate this post and leave some comments!

Windows Performance Toolkit: Finding CPU consumer

A colleague of mine recently asked the question ‘I see CPU N is constantly pegged at 100%; is there any good way to determine which specific process is using that specific CPU?’

Well, he already figured that Task Manager or Process Explorer does not provide that information. And by definition purely in user mode one cannot find this information, because the OS dispatcher schedules threads, not processes and the dispatcher runs at a lower level than user mode threads do. So classically, a live kernel debug would be required to be sure. The extension command !pcr or !prcb will give you the current thread for that processor, which you can use with !thread to get the process object.)

BUT no customer will let us do a live kernel debug in production unless it is really necessary. There must be an easier way out!

Today, with XPerf (part of Windows Performance Toolkit) you can get this information fairly easily and accurately. Here’s how you can get started:

  1. Download and install the Platform SDK (http://msdn.microsoft.com/en-us/windows/hardware/gg463009.aspx) and choose the install the Windows Performance Toolkit
  2. Redistributable versions should get installed as well at C:Program Files (x86)Windows Kits8.0Windows Performance ToolkitRedistributables (notice the ARM target is now available as well!)
  3. Use the Windows Performance Recorder utility to record a trace with just CPU in it. Save it to a (.ETL) file.
  4. Collect data for not more than 30-45 seconds. A LOT of information is collected, so limit the duration.
  5. Transfer and Double-click the .ETL file on your laptop, it should open up in the Windows Performance Analyzer (WPA) utility.
  6. Then from the graph explorer on the left side you should be able to select the CPU timeline report
  7. Select the CPU of interest, and ZOOM IN to a particular section of the timeline. The zooming is important as without that you are not told which process is utilizing that CPU.

Here is an example screenshot of the type of information that WPA displays. Click on the thumbnail to get a larger view.

image

As you can see, it is really detailed and extremely easy to use. If you have Windows 7, Windows Server 2008 R2, Windows 8 or Windows Server 2012, you should be able to leverage the WPT to the full extent.

For more information on WPT you can click the help links I have mentioned above, and for a more interactive discussion I recommend you view the BUILD 2011 videos around WPT / WPA:

There are some additional videos on Channel9 which cover specific scenarios where WPT / WPA can be very useful:

Please leave a comment and rate the post if you found it useful. I appreciate your time very much!

Happy performance investigations!

Error message: The target database (‘xyz’) is in an availability group and currently does not allow read only connections

Today while poking around the AlwaysOn Availability Groups feature I came across the above error when testing the read-only access routing configuration. That configuration allows connections with declared ReadOnly intent to be routed to one of the read-only secondary replicas.

The configuration looked like this:

  • 3-node Windows 2008 R2 Core cluster
  • 3 instances of SQL 2012 each on one node
  • 2 databases in the Availability Group
  • Each of the replicas was set to allow only ReadWrite connections when in primary role
  • Each of the replicas was set to ReadOnly intent connections when in secondary role
  • Read-only URL access for each replica was set correctly
  • Read-only routing list was also set correctly

I was testing connectivity from Management Studio to open a new Database Engine Query window (not the Object Explorer) and was using the following options:

  • Connect to database: <default>
  • Additional Connection Properties tab has ‘ApplicationIntent=ReadOnly’

However when I tried to USE any of the databases in the AG, it would disallow me. When I checked @@SERVERNAME it actually reported the instance name of the PRIMARY replica! So clearly the routing was not working.

Then it dawned on me that the default database for this login being master, it was causing an issue with the routing. On changing the ‘Connect to database’ to XYZ (the name of the AG database) it was routing me correctly to the read-only secondary.

Of course, this is only ONE reason for the above error. Other possibilities include the fact that your AG configuration is incorrect and does not allow read-only secondary in any case.

Hope this helps you!

“Operating system error 1398” while backing up database to network share

I was facing this on a VM setup, where the service account of my SQL Server service is a domain account. The DC is also a VM. Here were the symptoms:

While backing up the database to the UNC share, I was getting an error:

"Operating system error 1398" (There is a time and/or date difference between the client and server)

First things first, I did not have any timesync issue. That would have affected Kerberos and everything else around it.

Secondly, doing a xp_cmdshell ‘dir \uncserverpath’ would result in the same error, so this was not unique to the backup code (obvious, but better to check.)

Interestingly, if I opened up a new CMD prompt ‘running as’ the SQL Service account, and execute the ‘dir’ command, then the network share contents were listed (i.e. no error message.)

So then it dawned to me that at the time the SQL Service was starting up, there was a race condition wherein the DC was not yet ‘fully’ started. This caused the usage of stale Kerberos tickets and in turn caused the above error.

Only way to fix this was to restart my SQL instance, because then the expired Kerberos ticket would be replaced with a freshly authenticated and timestamped one. Sad, but true.

Hope this helps someone out there!

“Failed to join domain” trying to add Windows 2008 R2 Core to domain

Today I was using SCONFIG (if you don’t know what SCONFIG is, check out these posts to add a Windows 2008 R2 core server to an existing domain. I had set the IP address and ensured that the domain name and credentials are correct, but I kept getting ‘failed to join domain’.

I tried pre-creating the computer objects in the AD but to no avail. Finally after some troubleshooting it turned out to be a most basic issue: the primary DNS IP was not set correctly. After setting that (again with SCONFIG) I was able to add the Windows core box into the domain.

‘No Disks suitable for cluster disks’: one more possible cause

Today in the lab I was working on a test 3-node cluster. I was using Hyper-V for this. We normally use the iSCSI Software Target for testing, so I had set that up on a central VM and then published the target to all the VMs.

The disks were seen in the disk management of Node1, and I brought them online, partitioned and formatted them. However when I used Cluster Management to add them to the Storage group, it complained:

‘No Disks suitable for cluster disks were found, For diagnostic information about disks available run the Validate config wizard’

I did run the cluster validation (storage only) and got no further inputs. Shortly after some troubleshooting I realized I had forgotten to configure the volumes at the iSCSI initiator on each node. The easiest thing to do for this is to click the ‘Auto-configure’ button:

image

Due disclaimer: This is just ONE, but not the ONLY reason for the above error message! This is somewhat specific to the iSCSI initiator, so only check it if you are using iSCSI volumes in your cluster and if you get the above message.

Opinion Poll: Are PRINT statements considered harmful?

Today during a discussion a point came up around the role of PRINT statements in production code. While most data access today is routed through a data access layer (typically .NET or JDBC) and is focussed on consuming result sets (or executing U/I/D nonquery stuff) we were wondering on what you use PRINT statements for. In the long past, I would have said that the print statement is probably the best way for debugging, but in today’s world with easy access to the T-SQL debugger, a developer is probably much better off without PRINT.

With SqlClient, we consume the PRINT events using the InfoMessage event, but in practice I wonder how many of you actually rely on that. Please comment on this post to share your experiences with PRINT and if you think it is good / bad / evil in today’s world 🙂

Implementing MDX Drillthrough in SSRS

Let’s say for some reason your users want to implement MDX DRILLTHROUGH statement (equivalent to the cube action provided by most OLAP browsers) in SSRS. Here’s a step-by-step way to implement this.

For the purposes of this walkthrough, we are using the standard AdventureWorks Analysis Services database. You can obtain this and the related DW database from http://msftdbprodsamples.codeplex.com/. Do note that you have to manually deploy and process the database after opening the Adventure Works.sln file from its default location of C:Program FilesMicrosoft SQL Server100ToolsSamplesAdventureWorks Analysis Services Projectenterprise

Our objective is to implement two reports:

1. The parent report will list the top 10 products along with their sales amount totals

2. The child (drillthrough) report will display a raw DRILLTHROUGH for the first 500 rows and get back the product quantity of the individual order, along with the order number and date the order was placed

Step 1. Verify the AS database is deployed and processed

Step 2. Here is our (pretty basic) MDX query for the Top 10 listing for current products:

WITH MEMBER PKey AS
[Product].[Product].CurrentMember.Properties(“Key0”)
select {[Measures].[Reseller Sales Amount], PKey } on 0,
(TopCount([Product].[Product Categories].[Product], 10, [Measures].[Reseller Sales Amount])) on 1
from [Adventure Works]
where [Product].[Status].[Current]

Why the calculated member PKey? You will see later.

Step 3. Capture the DRILLTHROUGH query in Profiler

Since writing the DRILLTHROUGH statement by hand can be a bit of a challenge, here it is captured from SQL Profiler:

DRILLTHROUGH  Select  ([Measures].[Reseller Sales Amount],[Product].[Product].&[358])  on 0 From [Adventure Works] RETURN [Reseller Sales].[Reseller Order Quantity],[$Employee].[Employee],[$Delivery Date].[Date],[$Sales Territory].[Sales Territory Region],[$Reseller Sales Order Details].[Sales Order Number]

You can see the product key is highlighted. This is actually the key column syntax as can be seen from the ampersand prefix. So now you know why we selected the PKey calculated member in the previous query.

Step 4. Create the parent report

The first step is to create a shared data source to the SSAS database:

image

Next, we actually create the report. We select the New Report option and use the above shared data source. In the Query Designer, we switch to Query Mode and type in the MDX query we developed earlier:

image

Once this is done, select Tabular report, and move all the columns into the Details section of the report. Title the report as MainReport.

image

In the generated report, delete the PKey column.

Step 5. Create the drillthrough report

This is the tricky part. Normally, the MDX query designer does not support DRILLTHROUGH SELECT syntax. Hence we need to use a workaround. Please note that this workaround may not be officially supported by Microsoft.

We start the same way, by creating a new report and referencing the shared data source. But when you launch the query designer please switch the query type to DMX instead of MDX.

image

After this is switched, also switch to query mode as done before. In the space at the bottom, type in the ‘DMX’ query which is actually our MDX DRILLTHROUGH SELECT query. However now we do need to parameterize the query to accommodate the PKey parameter:

DRILLTHROUGH MAXROWS 50 Select  ([Measures].[Reseller Sales Amount], StrToMember(‘[Product].[Product].&[‘ + @PKey + ‘]’))  on 0 From [Adventure Works] RETURN [Reseller Sales].[Reseller Order Quantity],[$Employee].[Employee],[$Delivery Date].[Date],[$Sales Territory].[Sales Territory Region],[$Reseller Sales Order Details].[Sales Order Number]

Before you can click on OK though, you need to setup the parameter as well. You do that by clicking on the Query Parameters button:

image

We provide a dummy value of 394 so that the fields can be retrieved. The actual value, of course, will be provided by the drillthrough action later.

image

Select the tabular layout and move all the fields to the Details section. Name the report DrillThroughReport.rdl. Finally, set the report parameter PKey type as Hidden:

image

Step 6. Create the drillthrough action

Back to MainReport.rdl, right click on the Product text box and select Text Box properties. In the Action tab, you need to set up the Drillthrough (SSRS this time… don’t get confused by the similar term Smile)

image

We also change the hyperlink look and feel for good measure:

image

And that does it for our reports:

image image

Final Notes

How ‘legitimate’ is it to masquerade a MDX DRILLTHROUGH as a DMX query? As I said before, this is not officially supported. You can refer the to the MS Connect posting at http://connect.microsoft.com/SQLServer/feedback/details/126175/reporting-services-drillthrough-mdx-queries for the ‘won’t fix’ response.

The main reason for my documenting the approach above is that in the specific case I was interested in, the MAXROWS clause is very important. Of course one can argue that using MDX the same can be achieved, but then it is a matter of preference in the end.

BTW, Chris Webb also shares his perspectives on this at http://cwebbbi.wordpress.com/2009/06/16/implementing-analysis-services-drillthrough-in-reporting-services/

If you liked this posting, please rate it! In any case please do take a minute and leave comments, questions etc.