Windows, SQL Server and Multi-Core / Hyper-Threading

Updated 14 March 2019: the information in this blog post is NO LONGER RELEVANT. Official licensing information should be obtained through your regular Microsoft sales / licensing specialists.

Very often we are asked: 'How many processors does my SQL Server 200x Edition really support?' This post summarizes the considerations and the actual limits.

SQL Licensing is per socket

First, let's take a look at the Books Online topic 'Maximum Number of Processors Supported by the Editions of SQL Server'. It says that SQL Server is licensed per processor socket, not on a per-logical-CPU basis.

So what does that really mean? Again from the above Books Online topic:

For example, the following is considered a single processor for purposes of this table:

  • A single-core, hyper-threaded processor with 2 logical CPUs per socket.
  • A dual-core processor with 2 logical CPUs.
  • A quad-core processor with 4 logical CPUs.

Windows Licensing is also per socket

Just like SQL Server, Windows Server licensing has been based around sockets. This document summarizes it nicely:

For Microsoft software with processor limits, each processor counts as a single processor regardless of the number of cores and/or threads that the processor contains. For example, Windows Server 2003 R2 Standard Edition can be used on a four-processor system, whether the processors in the system are single-core, hyperthreaded, or multicore.

So, what’s in my box?

A natural question for customers is: how do I determine what is in my server hardware? Are those counts sockets, cores, or logical CPUs from Hyper-Threading?

Option 1: Use Intel’s utility

From our KB article on SQL and Hyperthreading, we recommend using Intel’s CPUCount utility to distinguish this for IA32 or Intel 64 class CPUs. The utility distinguishes between CPUs (sockets), cores, and logical CPUs (HyperThreading). The current article describing it is here and the sample code is here (both are from the Intel website.) If you build the samples using Visual Studio you can see the output below:

[Image: CPUCount sample output showing packages, cores and logical processors]

Option 2: Use SysInternals CoreInfo

Mark Russinovich has released CoreInfo which can also dump similar information. The only concern I have is that it uses the term ‘physical processor’ a little too freely, which can confuse someone. Here is sample output for a 1-socket, 2-core, no HT CPU:

[Image: Coreinfo output for a 1-socket, 2-core CPU without Hyper-Threading]

Option 3: Windows Vista / Server 2008

MSInfo32 (a.k.a. System Information program) can be used to distinguish between sockets, cores and logical processors.

[Image: MSInfo32 (System Information) showing sockets, cores and logical processors]

Option 4: Third Party Tools

A variety of 3rd party tools can report detailed CPU information, such as number of cores, logical processors and more.
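If the box is already running SQL Server 2005 or later, a quick sanity check is also possible from within SQL Server itself. Here is a minimal sketch using the sys.dm_os_sys_info DMV; hyperthread_ratio is the number of logical CPUs exposed per physical socket, so dividing cpu_count by it approximates the socket count:

-- Approximate the socket count from inside SQL Server 2005 / 2008.
-- hyperthread_ratio = logical CPUs per physical socket (cores x Hyper-Threading).
SELECT cpu_count                      AS logical_cpus,
       hyperthread_ratio              AS logical_cpus_per_socket,
       cpu_count / hyperthread_ratio  AS approx_physical_sockets
FROM sys.dm_os_sys_info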

Product Limitations

The Windows 2003 CPU (physical socket) limitations can be drawn from this page and an updated one for Windows 2008 at this page:

Windows Edition                                     Physical sockets supported
Windows Server 2003 / 2008, Standard Edition        4
Windows Server 2003 / 2008, Enterprise Edition      8
Windows Server 2003 / 2008, Datacenter Edition      32 (32-bit) or 64 (64-bit)

SQL Server limitations from Books Online:

SQL Edition                                         Physical sockets supported
SQL Server 2005 / 2008, Express Edition             1
SQL Server 2005 / 2008, Standard Edition            4
SQL Server 2005 / 2008, Enterprise Edition          OS maximum (as per the table above)
SQL Server 2000, Desktop Engine (MSDE)              2 on currently supported OS editions *
SQL Server 2000, Standard Edition                   4 on currently supported OS editions
SQL Server 2000, Enterprise Edition                 Check link below *

* There are specific notes for older unsupported OS platforms such as Windows 98 and NT 4.0 which are described in this SQL 2000 Books Online entry.

Also note that SQL Server 2000 only introduced Hyper-Threading awareness in SP3, so that the number of logical processors is handled correctly when it comes to enforcing the licensing limits.

There is a related blog post from our CSS SQL brethren which I would recommend all to read.

T-SQL Anti-pattern of the day: ‘all-in-one’ queries

Scenario

A common requirement for enquiry queries on an OLTP database is to have search criteria which are very specific ('get me details for OrderID = NNNN') alongside the occasional report which asks for all the orders ('get me all the orders, no questions asked'). Here is a sample from AdventureWorks which illustrates the problem:

CREATE PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)
END

What is the meaning of the (SalesOrderID = @OrderID OR @OrderID IS NULL) predicate in the above WHERE clause? It is actually a 'special case' where the developer intends to get back all the rows, regardless of the OrderID. This 'special case' is triggered by passing in a value of NULL for the @OrderID parameter.

Problem

So while this construct looks good in theory, it lends itself to very poor performance. Take a look at the 2 cases where this procedure is executed.

Case A: with specific OrderID

EXEC RptOrder 43672

Case B: asking for all records

EXEC RptOrder NULL

The plan, it turns out, is the same for both cases and a scan is used! This is despite a seekable index being present on the SalesOrderID column of the SalesOrderHeader table:

[Execution plan: non-clustered index scan used for both cases]

The optimizer chooses to scan SalesOrderHeader (in this case it chooses a non-clustered index scan) because at compile and optimization time it has no way to determine what the specific value of @OrderID will be. Hence it has no way to 'fold' the (@OrderID IS NULL) expression and therefore has no option but to look at all the records.

Workarounds

‘IF-ELSE’ Workaround: The straightforward workaround in simple cases like the one above is to separate out the 2 cases into an IF-ELSE block:

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE (SalesOrderID = @OrderID)
    END
    ELSE
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
    END
END

Now, the 2 test cases work as expected. Here are the execution plans:

EXEC RptOrder 43672

[Execution plan: index seek on SalesOrderID for the specific-OrderID case]

EXEC RptOrder NULL

[Execution plan: scan, as expected when all rows are requested]

Dynamic SQL Workaround: However, as the number of predicates in the WHERE clause increases, and if all those predicates (or most of them) have such 'catch-all' handling, the IF-ELSE construct becomes unviable. In those cases, a dynamic SQL construct should be considered. Of course, when dealing with dynamic SQL, we must consider security first, including the possibility of SQL injection and also the execution context of the dynamic SQL statement. But that is a topic for another post. Right now, here is how we could handle something like that:

-- NOTE: This code is highly simplified and does not provide for any screening
-- or protection against SQL injection!!! Provided as-is, confers no warranties.

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    DECLARE @sDynamicSQL nvarchar(4000)
    SELECT @sDynamicSQL = 'SELECT * FROM Sales.SalesOrderHeader '

    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT @sDynamicSQL = @sDynamicSQL + ' WHERE (SalesOrderID = @OrderID)'
    END

    EXEC sp_executesql @sDynamicSQL, N'@OrderID int', @OrderID = @OrderID
END

Different Code Paths: The cleanest way of course is to consider having separate procedures for each kind of query. For example, we can have a procedure called RptSpecificOrder for the case where we are searching by a specific OrderID, and another one called RptAllOrders for the 'get-me-everything' case. This has the advantage of clean isolation, but it does not scale easily when the number of predicates is larger. It does, however, also have the advantage that if we are querying for specific orders 99% of the time, that code path is simplified and optimized accordingly.
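Here is a minimal sketch of that split (the procedure names are just illustrative):

-- Hypothetical single-purpose procedures; each gets its own, stable plan.
CREATE PROCEDURE RptSpecificOrder (@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderID = @OrderID
END
GO

CREATE PROCEDURE RptAllOrders
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
END
GO

The specific-order procedure will always compile to an index seek, while the report procedure is free to scan.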

Conclusion

Beware of this T-SQL anti-pattern as it is one of the most common ones we see, and it has a huge (negative) impact on query performance. As you can see, application design and reporting requirements, if not planned with these patterns in mind, can have a detrimental effect on OLTP query execution. Separating reporting and OLTP workloads could be the key to solving these kinds of issues. But if separation is not possible, then clever use of separate code paths and stored procedures can help ensure that the most efficient execution plan is selected for each case. For complex queries, dynamic SQL may offer the simplest way out, but due care has to be taken to ensure that permissions and SQL injection issues are kept in mind when dealing with dynamic SQL statements.

T-SQL Anti-pattern of the day: comparing DATETIME field with date-only literal

Scenario

It is a pretty common situation to have a transaction date-time stored in a DATETIME field. The problems start with the fact that most applications use GETDATE() or some equivalent at the client side to record the order timestamp, so a typical entry for an OrderDate actually ends up with a time component as well. In versions prior to SQL Server 2008 there was no alternative such as the DATE data type.

Problem

While the INSERT code works adequately in this case, the problem starts when running reports. Most reports take parameters with the date (or date range) for which the report is required. The (anti-)pattern around the WHERE clause predicates in a typical report query (based on the AdventureWorks database) would look like this:

SELECT *
FROM Sales.SalesOrderHeader
WHERE CONVERT (varchar(10), OrderDate, 112) = '20040731'

OR

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate LIKE '31 Jul 2004%'

Both these predicates are non-optimal for usage of any index (and hopefully it would be a clustered index, to support range scans efficiently) on OrderDate:

  • The first query fails to use an index (and hence scans) because it explicitly converts the OrderDate field to an ISO-format date (varchar) in order to perform the comparison:

[Execution plan: index scan caused by the explicit CONVERT on OrderDate]

  • The second query has an implicit conversion to a varchar data type and hence again fails to use any indexes on OrderDate:

[Execution plan: index scan caused by the implicit conversion on OrderDate]

Workaround

From a design perspective there are potentially 2 problems rolled into 1 in this scenario:

  • The fact that we are storing date and time values in the same datetime field, which is required for an OLTP system
  • The fact that the report is running on the same database as the OLTP system

In many cases the 2nd aspect (reporting and OLTP on the same database) is unavoidable due to hardware constraints. In such cases, the possible workarounds could be:

Option 1: Use explicit timestamps in the literal portion of the predicate in the WHERE clause

This one is perhaps the most practical workaround for most workloads. Here the query is re-written as follows:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '31 Jul 2004 00:00:00' AND '31 Jul 2004 23:59:59:997'

The ensuing query plan looks more efficient. Of course the cost of the key lookup is due to the ‘SELECT *’ usage, and that is something which can be overcome by either:

  • Specifying only the relevant fields in the output list OR
  • Considering a clustered index on OrderDate (this probably makes sense in a reporting-only database)

Another thing to note is the usage of 997 milliseconds in the above query. Why not 999 milliseconds? Because then SQL Server would round it up to ‘1 Aug 2004 00:00:00’. Check this linked KB article for more details.

[Execution plan: index seek on OrderDate with a key lookup]
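As a side note, a half-open date range (a sketch, not part of the original workaround) avoids the millisecond arithmetic altogether and remains just as index-friendly:

-- Half-open range: includes all of 31 Jul 2004, excludes 1 Aug 2004 onwards.
SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20040731'
  AND OrderDate < '20040801'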

Option 2: Store only the date portion

If the time portion of the order timestamp is not important, have the application INSERT the CONVERTed varchar (or better still, perform that conversion in the application) and store only the date portion. If you are in a position to upgrade to SQL Server 2008, then the DATE data type may be of interest here.

Option 3: Have separate fields for datetime and date portion

Modify the database schema to have separate fields for OrderDateTime and OrderDate. Modify the INSERT code to insert the raw date-timestamp into OrderDateTime, and a CONVERTed date portion into the OrderDate field. Use the OrderDate field for reports and OrderDateTime for other queries which need it.
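A minimal sketch of what Option 3 might look like (the table and column names below are purely illustrative):

-- Hypothetical table carrying both the full timestamp and a date-only copy.
CREATE TABLE dbo.OrdersDemo
(
    OrderID        int IDENTITY PRIMARY KEY,
    OrderDateTime  datetime NOT NULL,
    OrderDate      datetime NOT NULL     -- time portion always 00:00:00
)

-- The INSERT strips the time portion once, so reports can filter on OrderDate directly.
DECLARE @now datetime
SET @now = GETDATE()

INSERT INTO dbo.OrdersDemo (OrderDateTime, OrderDate)
VALUES (@now, CONVERT(varchar(10), @now, 112))

An index on OrderDate (clustered, if this is a reporting copy) then supports the date-range reports without any conversion in the predicate.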

Conclusion

Separating reporting from OLTP processing is the key design aspect to always be kept in mind. In situations where that is not possible, some refactoring of the query can help. Table denormalization, coupled with minor changes in the application can also help in some situations to optimize query execution.

There may be other innovative solutions to the problem, and I am eager to hear about them from readers in the comments!

T-SQL Anti-pattern of the day: UPPER() and LOWER()

This one is even more common and therefore has a high impact: the usage of UPPER() and LOWER() in WHERE clause predicates. I've examined the problem in some detail below.

Here’s a typical example I see, based off the AdventureWorks database.

Scenario

To illustrate the effect of this example, I have added an extra index on the FirstName column in Person.Contact table:

CREATE NONCLUSTERED INDEX NC_Person_Contact_FirstName
ON Person.Contact(FirstName)

Here’s a typical query pattern coming in from the client:

DECLARE @FirstName nvarchar(50)
SELECT @FirstName = 'pilar'

SELECT *
FROM Person.Contact
WHERE UPPER(FirstName) = UPPER(@FirstName)

Problem

Let’s take a look at the execution plan for this batch:

[Execution plan: scan of the non-clustered index NC_Person_Contact_FirstName]

As expected there is a scan happening, due to the presence of UPPER() on the FirstName column. Interestingly, the optimizer chose to scan the non-clustered index rather than the clustered one; it did that because scanning the narrower index on FirstName is cheaper than scanning the clustered index. Either way, it is still far from optimal.

Workarounds

  • If your server / database / column is set to use a case-insensitive collation (if you don't know what a collation is, I suggest you start with this Books Online entry first) then you really do not need any of the UPPER() or LOWER() function calls, because in that case the execution engine will perform a case-insensitive comparison. Most customers I work with use the US English installation and the default collation of SQL_Latin1_General_CP1_CI_AS. For those customers, this workaround would be safe. (A quick way to check the effective collation is sketched after this list.)

Is it a good one by design? The answer as with many things in SQL Server, is it depends. If your installation procedures are standardized and repeatable, and if you are using the default collation (or any other case-insensitive one) then this workaround may very well turn into a by-design solution. In other cases, this workaround may not solve the problem.

  • A more robust solution would be to transform (in this case, to uppercase) the values being stored in these tables (in our case, FirstName) at INSERT time. That way, the predicate in the SELECT query can change to the following:

FirstName = UPPER(@FirstName)
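If you are unsure which collation actually applies, here is a quick sketch for checking it (these properties and catalog views exist in SQL Server 2005 and later):

-- Collation at the server, database and column level; a name containing _CI_
-- indicates case-insensitive comparisons, making the UPPER()/LOWER() calls unnecessary.
SELECT SERVERPROPERTY('Collation')                 AS server_collation
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation')  AS database_collation
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('Person.Contact')
  AND name = 'FirstName'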

Conclusion

Careful thought needs to go into design and development of databases where strings are being compared. Using a case-insensitive collation is sometimes the best way to avoid some of the problems described above. However if you have a case-sensitive collation, performing these operations up front in client code at INSERT time is perhaps the best way to negate the performance side effects of using the UPPER() / LOWER() predicates in WHERE clauses.

T-SQL Anti-pattern of the day: Prefix codes stored in (n)varchar fields

In the course of my work as a SQL Server Premier Field Engineer, we tend to see a lot of patterns in code. Some of these patterns do not lend themselves to optimal performance. This post is the first in a series of such 'how not to code' tips which will hopefully help some T-SQL developers out there.

Scenario

OK. So imagine a situation where there’s a table like this:

CREATE TABLE Orders
(
    OrderNumber nvarchar(50) PRIMARY KEY NOT NULL,
    ShipAddress nvarchar(255),
    IsDelivered    bit
)

Now in itself the table design looks innocuous except that the OrderNumber is encoded as per a scheme which looks like this:

  • First 3 characters: BranchId which generated this order
  • Next 12 characters: Date and time encoded as YYYYMMDDHHmm
  • Next 35 characters would accommodate a running number reset each day

Because the PRIMARY KEY constraint is by default implemented as a clustered index, we get the following index on OrderNumber:

PK__Orders__0AD2A005    clustered, unique, primary key located on PRIMARY    OrderNumber

Problem

SELECT queries often run on this table which list the orders by branch. Those typically look like this:

SELECT OrderNumber, ShipAddress
FROM Orders
WHERE SUBSTRING(OrderNumber, 1, 3) = N'004'

Not surprisingly we get a table scan, though there is an index on OrderNumber. Note the parallelism gather streams operator also. That’s because the cost of the overall query plan exceeded the default ‘cost threshold for parallelism’ (5).

[Execution plan: clustered index scan with a parallelism (gather streams) operator]

This is a very common pattern I observe, especially on transactional tables which have a large number of enquiries (SELECTs) on them. The resultant logical reads due to the index scans cause poor response times. In many cases the cost of the scan exceeds the cost threshold for parallelism and further compounds the problem.

Root Cause

So, what is the problem here? Evidently it is a problem of database design. Storing such encoded values in a single field is a violation of First Normal Form (http://support.microsoft.com/kb/283878). So the right solution would involve taking this encoded BranchId and placing it in another field.
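A sketch of that normalization (the new column, default and index names below are illustrative):

-- Store BranchId in its own column so queries by branch can seek on an index.
-- (Existing rows would still need to be back-filled from the OrderNumber prefix.)
ALTER TABLE Orders ADD BranchId nchar(3) NOT NULL DEFAULT N'000'

CREATE NONCLUSTERED INDEX NC_Orders_BranchId ON Orders (BranchId)

-- The report query then becomes a plain, sargable equality predicate:
SELECT OrderNumber, ShipAddress
FROM Orders
WHERE BranchId = N'004'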

Workaround

However, it is not easy to implement this overnight in most systems which are already in production. For the above specific query, though, there is a way to improve performance. The SUBSTRING is used to extract the first 3 characters and compare them with an input parameter. This can be replaced with an equivalent query such as:

SELECT OrderNumber, ShipAddress
FROM Orders
WHERE OrderNumber LIKE N'004%'

The resultant execution plan looks like this, with a dramatically reduced cost now that the index is being used to do a seek:

[Execution plan: clustered index seek on OrderNumber]

Of course there are other possibilities involving indexed views / computed columns but those I will examine in another post.

Conclusion

Database design conforming to the normal forms is generally a good idea to drive usage of indexes and have acceptable response times. In some cases where queries use a predicate comparing the prefix of values stored in a field, it may be worthwhile examining the use of LIKE as compared to a SUBSTRING.

SQL Worker Thread Stack Sizes

Recently during some internal discussions we had a lot of talk around the stack sizes for various SQL builds. Here is the summary of that talk:

 

SQL Server Build                    OS Build                    Stack Size
SQL Server x86 (32-bit)             x86 (32-bit)                512 KB
SQL Server x86 (32-bit)             x64 (64-bit)                768 KB (see notes 1 and 2)
SQL Server x64 (64-bit)             x64 (64-bit)                2048 KB
SQL Server IA64 (64-bit Itanium)    IA64 (64-bit Itanium)       4096 KB (see note 3)

 

Notes:

  1. This (wow64 stack size) is confirmed from http://www.microsoft.com/technet/prodtechnol/sql/2005/diagandcorrecterrs.mspx: SQL Server 2000 running on X64 in WOW mode also uses more memory per worker (768 KB = 512 KB for standard stack + 256 KB used by WOW to support the stack).
  2. Also check http://support.microsoft.com/kb/916950 for an issue related to SQL 2000 x86 (really this should be the only case where this configuration might be used) running in WOW64 mode on x64 OS. http://msdn.microsoft.com/en-us/library/aa384219(VS.85).aspx mentions the overhead which is required for a 32-bit process running inside WOW64.
  3. Slava Oks blogged specifically about this: http://blogs.msdn.com/slavao/archive/2005/03/19/399117.aspx
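To verify the stack size on a live SQL Server 2005 / 2008 instance, here is a quick sketch (the DMV and column below exist from SQL Server 2005 onwards):

-- Worker thread stack size as reported by the instance itself.
SELECT stack_size_in_bytes / 1024 AS worker_stack_size_kb
FROM sys.dm_os_sys_info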

I hope this summary helps someone out there!

Using PowerShell to cleanup older versions of files

I only recently started using PowerShell and was amazed by its flexibility and ease of use. Here's a sample script I put together after a few hours of playing around. I have also described the scenario that gave rise to the requirement for this script.

Task at hand: A folder contains many files such as:

  • File1v1
  • File2v1
  • File2v2
  • ABCv3
  • ABCv4
  • ABCv5
  • MyFile9
  • MyFile15
  • MyFile21

The requirement was to keep only the latest version of each file (and therefore identify the older files to move). In my case, I have over 400 such files at any given time (and the number will grow.) The only way to distinguish (in my specific case) which file was the ‘latest’ was to use the numeric suffix at the end of each file (timestamps etc. are not reliable for my requirement.)

So to automate this task, I figured I’d try to learn PowerShell and put it to use. Here is the script I used:

################# IdentifyOlderVersions.ps1 ######################

$files = dir X:\myFolder\*.* | Sort-Object

$prevfile = ""
$filenamewithoutsuffix = ""
$previousversions = @()

foreach ($file in $files)
{
    # strip the trailing numeric suffix to get the file 'lineage' name
    $filenamewithoutsuffix = ($file.BaseName -split "\d+$")[0]
    if ($filenamewithoutsuffix -eq $prevfile)
    {
        # this is the same file 'lineage', add the current file to the array
        $previousversions = $previousversions + $file
    }
    else
    {
        # new lineage: list the older versions to be moved, then reset $previousversions
        $listoffilestobemoved = ""
        $tmpindex = 0
        foreach ($filetobemoved in $previousversions)
        {
            # The check below ensures that we only list the files to be moved;
            # the last entry in the (sorted) array is the latest version and is kept.
            if ($tmpindex -eq ($previousversions.length - 1))
            {
                break
            }

            $listoffilestobemoved = $listoffilestobemoved + "," + $filetobemoved
            $tmpindex++
        }
        $previousversions = @()
        $previousversions = $previousversions + $file
        if ($listoffilestobemoved -ne "")
        {
            Write-Host $listoffilestobemoved
        }
    }

    # in both cases, assign the previous filename = current one
    $prevfile = $filenamewithoutsuffix
}

################# IdentifyOlderVersions.ps1 ######################

This worked like a charm, and as you can see, some pretty involved scripts can be put together. We can also create .NET objects inside a script and use those to achieve more than what is normally possible.

 

I encourage you to try using PowerShell for all such tedious tasks. There are some great hints and information (including a link to a free course book) at http://www.microsoft.com/technet/scriptcenter/topics/winpsh/toolbox.mspx. The starting point for most of us should be http://www.microsoft.com/technet/scriptcenter/topics/winpsh/pshell2.mspx. Get the latest CTP2 release and start playing around!

‘Interesting Events’ from ReadTrace

I’m sure many of you are aware of the wonderful RML Tools (description here and x86 download here) and specifically the ReadTrace tool. Maybe you are also aware of SQLNexus which relies on ReadTrace for the trace analysis.

At some point when you use these tools, you will come across the 'Interesting Events' table (ReadTrace.tblInterestingEvents in the SQLNexus database), which is populated with various errors and warnings. While ReadTrace comes with a way to view this information at a summary level, one thing it does not easily provide is the ability to tie sample batch text to the 'interesting event'. Strictly speaking, the ReadTrace schema does not provide a 'by-design' way to query this information; however, a best-effort guess can be made by taking the time the 'interesting event' was generated and using it to query tblBatches. Here's one such query I use to dig for clues as to which batches led to such events:

 

SELECT TEC.name, TUB.OrigText, TB1.Cpu
FROM ReadTrace.tblInterestingEvents TIE
JOIN ReadTrace.tblBatches TB1
    ON TIE.SPID = TB1.SPID and TIE.StartTime >= TB1.StartTime AND TB1.EndTime >= TIE.StartTime
JOIN ReadTrace.tblBatches TB2
    ON TB1.BatchSeq = TB2.BatchSeq AND TB1.BatchSeq >= TB2.BatchSeq
JOIN ReadTrace.tblUniqueBatches TUB
    ON TB1.HashID = TUB.HashID
JOIN sys.trace_events TEC
    ON TEC.trace_event_id = TIE.EventID
WHERE name = 'Hash Warning'

 

Of course, the ‘Hash Warning’ (more information about it available here) event name mentioned above could be replaced with others, such as ‘Sort Warning’, ‘Exception’ etc.

NOT NULL and SWITCH TO PARTITION

Today I was trying to execute an ALTER TABLE … SWITCH TO PARTITION … operation to switch an unpartitioned table into a partitioned table. I had taken care of the normal documented prerequisites:



  • Same schema for both tables

  • A CHECK constraint on the unpartitioned table to ensure its values match the destination partition function range

Despite these, I ran into the following error:


Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'foodb.dbo.unparttable' allow values that are not allowed by range defined by partition 5 on target table 'foodb.dbo.parttable'.


After some troubleshooting, it was apparent that the issue was due to the following note mentioned in the BOL topic ‘Transferring Data Efficiently by Using Partition Switching’: 


We recommend that you specify NOT NULL on the partitioning column of partitioned tables, and also nonpartitioned tables that are sources or targets for ALTER TABLE…SWITCH operations. By doing this, you make sure that any CHECK constraints on partitioning columns do not have to check for null values. Null values are ordinarily placed in the leftmost partition of a partitioned table. When you are switching any partition other than the leftmost partition and when the ANSI_NULLS database option is set to ON, the absence of a NOT NULL constraint on the source and target tables might interfere with any CHECK constraints that are also defined on the partitioning column.


Little wonder: in my case, I did not have NOT NULL on the partitioning column. After correcting that, the ALTER TABLE … SWITCH TO PARTITION worked just fine!
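Here is a minimal sketch of the working pattern (the object names and boundary values are hypothetical, and the staging table is assumed to live on the same filegroup as the target partition):

-- RANGE RIGHT function: partition 5 holds values >= 20080101.
CREATE PARTITION FUNCTION pfOrders (int)
    AS RANGE RIGHT FOR VALUES (20050101, 20060101, 20070101, 20080101)
CREATE PARTITION SCHEME psOrders
    AS PARTITION pfOrders ALL TO ([PRIMARY])

-- Partitioning column declared NOT NULL on BOTH tables, plus a CHECK constraint
-- on the unpartitioned (source) table matching the target partition's range.
CREATE TABLE dbo.parttable   (OrderKey int NOT NULL, Amount money) ON psOrders (OrderKey)
CREATE TABLE dbo.unparttable (OrderKey int NOT NULL, Amount money,
    CONSTRAINT CK_unparttable_range CHECK (OrderKey >= 20080101))

-- With NOT NULL and the CHECK constraint in place, the switch succeeds.
ALTER TABLE dbo.unparttable SWITCH TO dbo.parttable PARTITION 5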

Exploring LINQ

I just got some time to look into newer things, and one of the most intriguing is LINQ (Language INtegrated Query). I’m just sharing some useful links I came across over a night of browsing for information:



And some interesting posts on performance, which is always dear to my heart:



I’ll post more once I play around more. In case you are keen on exploring, you should download the 90-day evaluation version of Visual Studio.NET 2008 (http://www.microsoft.com/downloads/details.aspx?FamilyID=83C3A1EC-ED72-4A79-8961-25635DB0192B&displaylang=en) and play around!


One last thing which is quite neat is that you can also get the MSDN Library for VS.NET 2008 from http://www.microsoft.com/downloads/details.aspx?FamilyID=6ff3bc60-32c8-4c22-8591-a20bf8dff1a2&DisplayLang=en. Be aware though that both these downloads are very hefty.


Happy New Year 2008!