‘Cannot resolve the collation conflict’ error message with temp tables

Scenario

Today my customer told me that they were facing the error ‘Cannot resolve the collation conflict’ when executing a JOIN between two tables: a base table and a temporary table. They had recently created the database afresh from scripts and had been facing this issue ever since; on the existing ‘good’ server the problem did not occur.

The only obvious difference was that the ‘good’ database and ‘bad’ database differed in collations.

It is a fairly common problem and one which has a simple solution, but it does need to be handled in T-SQL code to ensure that it works in all possible scenarios. Here is a simple reproduction of the issue.

Walkthrough

Firstly, let us check the SQL instance collation:

SELECT SERVERPROPERTY('Collation')
go

Case 1: Temporary tables without any specific column level collation specifier

The query tells us that the instance is at the default collation: SQL_Latin1_General_CP1_CI_AS. Next, let us simulate the ‘bad’ database, which had a collation different from the instance collation, and create a table in that database.

CREATE DATABASE Ukrainian COLLATE SQL_Ukrainian_CP1251_CS_AS
GO

USE Ukrainian
GO

CREATE TABLE MyTab
(
    someval VARCHAR(50)
)
GO

SELECT collation_name
FROM sys.columns
WHERE name = 'someval'
AND object_id = OBJECT_ID('MyTab')
GO

As you can see, the column inherits the database collation, namely SQL_Ukrainian_CP1251_CS_AS. Next, let us simulate the actual issue:

CREATE TABLE #something
(
    somevaltemp varchar(50)
)
GO

SELECT *
FROM MyTab JOIN #something
ON someval = somevaltemp
GO

The error is reproduced:

Msg 468, Level 16, State 9, Line 3
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "SQL_Ukrainian_CP1251_CS_AS" in the equal to operation.

This is interesting because tables created in TEMPDB inherit the collation of TEMPDB unless the collation is explicitly specified. A quick check from sys.columns in TEMPDB will confirm it:

SELECT collation_name, name
FROM tempdb.sys.columns
WHERE name = 'somevaltemp'
GO

The result is the same as the instance / TEMPDB collation: SQL_Latin1_General_CP1_CI_AS.
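If in doubt, you can also confirm the TEMPDB collation directly with the standard metadata function (a quick sanity check):

SELECT DATABASEPROPERTYEX('tempdb', 'Collation')
GO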

Case 2: Temporary tables with collation specified for the column

Next, let us try to create the temporary table with a collation value identical to the database from which it is created:

CREATE TABLE #somethingexplicit
(
    somevaltempexplicit varchar(50) COLLATE DATABASE_DEFAULT
)
GO

We can also confirm the collation of the newly created column:

SELECT collation_name, name
FROM tempdb.sys.columns
WHERE name = 'somevaltempexplicit'
GO

As you will see, it is now SQL_Ukrainian_CP1251_CS_AS. Let us attempt the JOIN again, this time against the new temporary table:

SELECT *
FROM MyTab JOIN #somethingexplicit
ON someval = somevaltempexplicit
GO

You will see that it succeeds without any issues.

Case 3: Temporary tables generated by a SELECT…INTO

Let us try a final test, this time with a SELECT INTO a temporary table:

SELECT someval AS somenewvalinto
INTO #tempselectinto
FROM MyTab

SELECT collation_name, name
FROM tempdb.sys.columns
WHERE name = 'somenewvalinto'
GO

As you can predict, the columns of the temporary table inherit the collation from the base table from which we are selecting. So the collation is retained as SQL_Ukrainian_CP1251_CS_AS.

SELECT *
FROM MyTab JOIN #tempselectinto
ON someval = somenewvalinto
GO

The JOIN also succeeds in the final case.

Conclusion

If you are creating a temporary table from your database code, and if your database happens to use an explicit collation, it is your DUTY to specify the collation in the CREATE TABLE call for the temporary table. You can either explicitly specify the collation value or use the DATABASE_DEFAULT clause.
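If you cannot change the temporary table definition, another commonly used option (shown here as a sketch, reusing the tables from the walkthrough above) is to apply the COLLATE clause in the join predicate itself:

SELECT *
FROM MyTab JOIN #something
ON someval = somevaltemp COLLATE DATABASE_DEFAULT
GO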


Using the TransactSql.ScriptDOM parser to get statement counts

Today there was a question on the #sqlhelp Twitter hashtag: “Is it possible to get UPDATE / SELECT / INSERT / DELETE statement counts from SQL Server?”

Implementation

This is a perfect use case for the SQLDOM parser, a.k.a. Microsoft.SqlServer.TransactSql.ScriptDom. I put together a sample C# application to demonstrate how easy it is for requirements like the above. The assumption made below (for simplicity) is that we parse the input from a .SQL script file. In the real world, we can easily script the object code from a live connection (using SMO, for example) and pass it as a MemoryStream to the TextReader below. But that is left, as they say, as an exercise to the reader.

Test case

Here is the sample code. Interestingly, if you run it on a script of the AdventureWorks objects, it tells you that there is not a single DELETE statement embedded in any of the sample database’s stored procedures, views or functions (which is correct, and a minor eye-opener as well).

Found 37 SELECTs, 1743 INSERTs, 15 UPDATEs & 0 DELETEs

Conclusion

The ScriptDom excels at simple tasks like this, but do keep in mind that there are complications to consider for more advanced requirements. I’ll be posting some tips on how to overcome those complications later on.

Sample Code
using System; 
using System.Collections.Generic; 
using System.Text; 
using Microsoft.SqlServer.TransactSql.ScriptDom; 
using System.IO;
namespace SampleSQLVisitor 
{ 
    class Program 
    { 
        static void Main(string[] args) 
        { 
            TextReader txtRdr = new StreamReader("myscriptfile.sql"); 
            TSql110Parser parser = new TSql110Parser(true);
            IList<ParseError> errors; 
            TSqlFragment sqlFragment = parser.Parse(txtRdr, out errors); 
            // TODO report the parsing errors generated (if any)
            SQLVisitor myVisitor = new SQLVisitor(); 
            sqlFragment.Accept(myVisitor);
            myVisitor.DumpStatistics(); 
        } 
    }
    internal class SQLVisitor : TSqlFragmentVisitor 
    { 
        private int SELECTcount = 0; 
        private int INSERTcount = 0; 
        private int UPDATEcount = 0; 
        private int DELETEcount = 0;
        private string GetNodeTokenText(TSqlFragment fragment) 
        { 
            StringBuilder tokenText = new StringBuilder(); 
            for (int counter = fragment.FirstTokenIndex; counter <= fragment.LastTokenIndex; counter++) 
            { 
                tokenText.Append(fragment.ScriptTokenStream[counter].Text); 
            }
            return tokenText.ToString(); 
        }
        // SELECTs 
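        // Note: base.ExplicitVisit(node) is deliberately not called in the overrides
        // below; with the ScriptDom visitor this may mean that nested statements (for
        // example, subqueries inside a counted statement) are not visited separately.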
        public override void ExplicitVisit(SelectStatement node) 
        { 
            //Console.WriteLine("found SELECT statement with text: " + GetNodeTokenText(node)); 
            SELECTcount++; 
        }
        // INSERTs 
        public override void ExplicitVisit(InsertStatement node) 
        { 
            INSERTcount++; 
        }
        // UPDATEs 
        public override void ExplicitVisit(UpdateStatement node) 
        { 
            UPDATEcount++; 
        }
        // DELETEs 
        public override void ExplicitVisit(DeleteStatement node) 
        { 
            DELETEcount++; 
        }
        public void DumpStatistics() 
        { 
            Console.WriteLine(string.Format("Found {0} SELECTs, {1} INSERTs, {2} UPDATEs & {3} DELETEs", 
                this.SELECTcount, 
                this.INSERTcount, 
                this.UPDATEcount, 
                this.DELETEcount)); 
        } 
    } 
} 
Disclaimer

This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment.  THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.  We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution of the Sample Code.

This posting is provided “AS IS” with no warranties, and confers no rights.

TechEd India 2013 – ‘T-SQL Horrors’ slides

TechEd 2013 was a grand success! Thank you to those of you who stayed till 6:15 PM at the Pune session, and even more to those who engaged me in Q&A till 7 PM that evening. I was very impressed and happy to see people interested in my talk, even though it was at the end of a very long day for most.

As promised earlier I am publishing my content deck, with the sample scripts for all the scenarios which we discussed. All the examples will execute on AdventureWorks (2005/2008 or 2012 versions) and should give you ample practice on the concepts we discussed.

And if you liked the session, please do leave a comment, either here on the blog or on my Twitter account. That’s pretty much the best way you can say thanks!

Considerations when using the TransactSql.ScriptDOM parsers

Some of you might be aware of the above namespace, which holds an implementation of a first-class T-SQL parser. In this post I would like to explain some of the complexity you will face when dealing with the ScriptDOM yourself, typically using the Visitor pattern.

Case Study

Our objective in this case is to use the parser to implement a rule which finds expressions with a leading wildcard within a WHERE clause. Firstly, let us consider the T-SQL statement below:

SELECT * FROM Person.Contact
WHERE LastName LIKE '%mith'

If you parse this using ScriptDOM, you can visit the LikePredicate element, typecast its SecondExpression to StringLiteral, and then check whether the value starts with %.

public override void ExplicitVisit(LikePredicate node)
{
    if (node.SecondExpression is StringLiteral)
    {
        if ((node.SecondExpression as StringLiteral).Value.StartsWith("%"))
        {
            Console.WriteLine((node.SecondExpression as StringLiteral).Value);
        }
    }
}

Now consider a second case, which has a complex expression (string concatenation):

SELECT * FROM Person.Contact
WHERE LastName LIKE '%' + @blah + '%'

In this case, if you try to reuse the ExplicitVisit code mentioned above, it will fail to detect the issue, because the SecondExpression member of the LikePredicate is now a BinaryExpression. This complicates the check:

if (node.SecondExpression is BinaryExpression)
{
    if ((node.SecondExpression as BinaryExpression).SecondExpression is StringLiteral)
    {
        if ((((node.SecondExpression as BinaryExpression).SecondExpression) as StringLiteral).Value.StartsWith("%"))
        {
            Console.WriteLine((((node.SecondExpression as BinaryExpression).SecondExpression) as StringLiteral).Value);
        }
    }
}

Conclusion

For arbitrary AST shapes, you can see that this is a very difficult issue to resolve in code. So while the ScriptDOM parser is a great tool, it does require a fundamentally different approach to navigating the AST, something to keep in mind if you are attempting to use the parser in any industrial-strength application.

Named constraints: two sides to the story!

Background

Constraints in SQL Server are of the following types:

  • CHECK constraints
  • DEFAULT constraints
  • Foreign key constraints
  • NULLable constraint
  • UNIQUE constraint
  • PRIMARY KEY constraint

BTW, if you are wondering what a NULLable constraint is, it is the formal representation of a NULL / NOT NULL definition for a field. We don’t normally think of NULL definitions in that way, but in the parser that is how it is represented.

Anyway, constraints such as CHECK, DEFAULT, UNIQUE, PRIMARY KEY and FOREIGN KEY can be named, for example:

CREATE TABLE #TableWithNamedConstraints
(
    i int not null constraint PK_TableWithNamedConstraints primary key,
    j int constraint DF_j default 100,
    k int constraint CK_k check (k > 0)
)

The issue

While this is generally considered to be a good practice for base tables, for a temp table such as the above it can be a real problem if you have multiple connections executing the CREATE TABLE code at the same time. The problem arises from the fact that while the temp table name is uniquified, the constraint names are not. In such cases of concurrent execution, you typically receive an error such as the one below:

Msg 2714, Level 16, State 5, Line 1
There is already an object named 'PK_TableWithNamedConstraints' in the database.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

You can see the evidence of this in tempdb.sys.objects:

SELECT name FROM tempdb.sys.objects
WHERE parent_object_id = OBJECT_ID('tempdb..#TableWithNamedConstraints')
OR object_id = OBJECT_ID('tempdb..#TableWithNamedConstraints')

The output shows that the table name is uniquified but the constraint names are not:

PK_TableWithNamedConstraints
DF_j
CK_k
#TableWithNamedConstraints_________…_____00000000001D

Workaround

This issue is described in this Connect bug as well. The workaround is therefore to not name constraints in the case of temporary tables. So for the example shown above, the more ‘robust’ version (for the temp table case) would look like this:

CREATE TABLE #TableWithNamedConstraints
(
    i int not null primary key,
    j int default 100,
    k int check (k > 0)
)

Conclusion

So the next time you have a CREATE TABLE with constraints, consider the two cases:

  • For base tables, you almost ALWAYS want to name them
  • For temporary tables, you almost NEVER want to name them

Hope this helps!

Opinion Poll: Are PRINT statements considered harmful?

Today during a discussion a point came up around the role of PRINT statements in production code. While most data access today is routed through a data access layer (typically .NET or JDBC) and is focused on consuming result sets (or executing UPDATE / INSERT / DELETE ‘non-query’ statements), we were wondering what you use PRINT statements for. In the distant past I would have said that PRINT is probably the best way to debug, but in today’s world, with easy access to the T-SQL debugger, a developer is probably much better off without it.

With SqlClient, we consume PRINT output using the InfoMessage event, but in practice I wonder how many of you actually rely on that. Please comment on this post to share your experiences with PRINT and whether you think it is good / bad / evil in today’s world 🙂
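One related technique worth mentioning (as an aside, not a recommendation): PRINT output is buffered by the server, so for progress messages from long-running batches many developers instead use a low-severity RAISERROR with the NOWAIT option, which flushes the message to the client immediately:

RAISERROR ('Finished step %d', 0, 1, 42) WITH NOWAIT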

T-SQL Anti-Pattern: Index Key Order and Query Expression Order

This is really not a T-SQL anti-pattern as much as it is a database design issue, but we see it so often that it’s worthwhile bringing it up and clarifying things.

For illustrating the scenario, let’s examine the table Person.Contact in the AdventureWorks database. It has two columns of interest here: FirstName and LastName. Let’s say an application frequently queries this table with these columns in the WHERE clause. The query looks like this:

SELECT ContactID from Person.Contact
WHERE FirstName = 'Carla' and LastName = 'Adams'

In order to support an index seek for this query, we create this index:

create nonclustered index idx_contact_names on Person.Contact(FirstName, LastName)

Now, let’s say there’s another application which fires another query on this table, and that query looks like this:

SELECT ContactID from Person.Contact
WHERE LastName = 'Alberts' and FirstName = 'Amy'

Notice the difference between the two queries: the ordering of the predicates in the WHERE clause is reversed. Now for the problem: some developers will create another index with the column order (LastName, FirstName). That is not required. If you view the execution plans for both queries, you will notice that the existing index is used in both cases!

[Execution plan: both queries use a seek on idx_contact_names]
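If you want to verify this for yourself, a quick way (sketched here against AdventureWorks) is to capture the actual execution plans for both forms of the query:

SET STATISTICS XML ON
GO
SELECT ContactID from Person.Contact WHERE FirstName = 'Carla' and LastName = 'Adams'
SELECT ContactID from Person.Contact WHERE LastName = 'Alberts' and FirstName = 'Amy'
GO
SET STATISTICS XML OFF
GO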

If you end up creating two indexes for the above scenario, SQL Server will effectively use only one of them for queries such as the above. The other index will only add to the overhead of index maintenance during DML operations. So it is a redundant index and as such should be dropped.

Conclusion

The order of predicates in the query expression is independent of both the choice of index to create and the choice of index used. SQL Server will use whichever index yields a good plan.

T-SQL Anti-pattern of the day: ‘all-in-one’ queries

Scenario

A common requirement for enquiry queries on an OLTP database is to have search criteria which are very specific (‘get me details for OrderID = NNNN’) alongside the occasional report which asks for all the orders (‘get me all the orders, no questions asked’). Here is a sample from AdventureWorks which illustrates the problem:

CREATE PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)
END

What is the meaning of the (SalesOrderID = @OrderID OR @OrderID IS NULL) predicate in the above WHERE clause? It is actually a ‘special case’ where the developer intends to get back all the rows, regardless of the OrderID. This ‘special case’ is triggered by passing in a value of NULL for the @OrderID parameter.

Problem

So while this construct looks good in theory, it lends itself to very poor performance. Take a look at the two cases where this procedure is executed.

Case A: with specific OrderID

EXEC RptOrder 43672

Case B: asking for all records

EXEC RptOrder NULL

The plan, it turns out, is the same for both cases, and a scan is used! This is despite a seekable index being present on the SalesOrderID column of the SalesOrderHeader table:

[Execution plan showing an index scan for both executions]

The reason the optimizer chooses to scan SalesOrderHeader (in this case it chooses a non-clustered index scan) is that it has no way to determine, at compile and optimization time, what the specific value of @OrderID will be. Hence it has no way to ‘fold’ the (@OrderID IS NULL) expression and therefore has no option but to produce a plan which looks at all the records.
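As an aside, on builds which support the parameter embedding optimization, one further workaround (sketched below; it trades compilation cost on every execution for a plan tailored to the actual parameter values) is to add OPTION (RECOMPILE) to the statement inside the procedure, which lets the optimizer evaluate @OrderID at execution time:

SELECT *
FROM Sales.SalesOrderHeader
WHERE (SalesOrderID = @OrderID OR @OrderID IS NULL)
OPTION (RECOMPILE)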

Workarounds

‘IF-ELSE’ Workaround: The straightforward workaround in simple cases like the one above is to separate out the two cases into an IF-ELSE block:

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
        WHERE (SalesOrderID = @OrderID)
    END
    ELSE
    BEGIN
        SELECT *
        FROM Sales.SalesOrderHeader
    END
END

Now, the 2 test cases work as expected. Here are the execution plans:

EXEC RptOrder 43672

[Execution plan showing an index seek]

EXEC RptOrder NULL

[Execution plan showing an index scan]

Dynamic SQL Workaround: However, as the number of predicates in the WHERE clause increases, and if all (or most) of those predicates have such ‘catch-all’ handling, the IF-ELSE construct becomes unviable. In those cases, a dynamic SQL construct should be considered. Of course, when dealing with dynamic SQL we must consider security first, including the possibility of SQL injection and the execution context of the dynamic SQL statement. But that is a topic for another post. Right now, here is how we could handle something like that:

-- NOTE: This code is highly simplified and does not provide for any screening
-- or protection against SQL injection!!! Provided as-is, confers no warranties.

ALTER PROCEDURE RptOrder(@OrderID int)
AS
BEGIN
    DECLARE @sDynamicSQL nvarchar(4000)
    SELECT @sDynamicSQL = 'SELECT * FROM Sales.SalesOrderHeader '

    IF (@OrderID IS NOT NULL)
    BEGIN
        SELECT @sDynamicSQL = @sDynamicSQL + ' WHERE (SalesOrderID = @OrderID)'
    END

    EXEC sp_executesql @sDynamicSQL, N'@OrderID int', @OrderID = @OrderID
END

Different Code Paths: The cleanest way, of course, is to have separate procedures for each kind of query. For example, we can have a procedure called RptSpecificOrder for the case where we are searching by a specific OrderID, and another one called RptAllOrders for the ‘get-me-everything’ case (a sketch follows below). This has the advantage of clean isolation, but it does not scale easily when the number of predicates is larger. It does, however, have the advantage that if we are querying for specific orders 99% of the time, that code path is simplified and optimized accordingly.
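Here is a minimal sketch of that approach, using the procedure names mentioned above (the bodies are illustrative):

CREATE PROCEDURE RptSpecificOrder(@OrderID int)
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE (SalesOrderID = @OrderID)
END
GO

CREATE PROCEDURE RptAllOrders
AS
BEGIN
    SELECT *
    FROM Sales.SalesOrderHeader
END
GO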

Conclusion

Beware of this T-SQL anti-pattern, as it is one of the most common ones we see and it has a huge (negative) impact on query performance. As you can see, application design and reporting requirements, if not done with these patterns in mind, can have a detrimental effect on OLTP query execution. Separating reporting and OLTP workloads could be the key to solving these kinds of issues. But if separation is not possible, then clever use of separate code paths and stored procedures can help ensure that the most efficient execution plan is selected for each case. For complex queries, dynamic SQL may offer the simplest way out, but due care has to be taken to keep permissions and SQL injection issues in mind when dealing with dynamic SQL statements.

T-SQL Anti-pattern of the day: comparing DATETIME field with date-only literal

Scenario

It is a pretty common situation to have a transaction date-time stored in a DATETIME field. The problems start with the fact that most applications use GETDATE() or an equivalent timestamp to record the order date-time, so a typical entry for an OrderDate ends up with a time component as well. In versions prior to SQL Server 2008 this was the only option; there was no dedicated DATE data type.

Problem

While the INSERT code works adequately in this case, the problem starts when running reports. Most reports take parameters with the date (or date range) for which the reports are required. The (anti-)pattern around the WHERE clause predicates in a typical report query (based on the AdventureWorks database) looks like this:

SELECT *
FROM Sales.SalesOrderHeader
WHERE CONVERT (varchar(10), OrderDate, 112) = '20040731'

OR

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate LIKE '31 Jul 2004%'

Both these predicates are non-optimal for using any index on OrderDate (and ideally that would be a clustered index, to support range scans efficiently):

  • The first query fails to use an index (and hence scans) because it explicitly converts the OrderDate field to an ISO-format date string (varchar) in order to perform the comparison:

[Execution plan showing an index scan]

  • The second query has an implicit conversion to a varchar data type and hence again fails to use any indexes on OrderDate:

[Execution plan showing an index scan]

Workaround

From a design perspective there are potentially two problems rolled into one in this scenario:

  • The fact that we are storing date and time values in the same DATETIME field (the full timestamp being required by the OLTP system)
  • The fact that the report is running on the same database as the OLTP system

In many cases the second aspect (reporting and OLTP on the same database) is unavoidable due to hardware constraints. In such cases, the possible workarounds could be:

Option 1: Use explicit timestamps in the literal portion of the predicate in the WHERE clause

This one is perhaps the most practical workaround for most workloads. Here the query is re-written as follows:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '31 Jul 2004 00:00:00' AND '31 Jul 2004 23:59:59:997'

The ensuing query plan looks more efficient. Of course the cost of the key lookup is due to the ‘SELECT *’ usage, and that is something which can be overcome by either:

  • Specifying only the relevant fields in the output list OR
  • Considering a clustered index on OrderDate (this probably makes sense in a reporting-only database)

Another thing to note is the use of 997 milliseconds in the above query. Why not 999 milliseconds? Because DATETIME has an accuracy of 3.33 milliseconds, SQL Server would round 999 up to ‘1 Aug 2004 00:00:00’. Check the linked KB article for more details.
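An alternative which side-steps the milliseconds question entirely (a common pattern, shown here as a sketch) is to use a half-open range with >= and <, which also works unchanged for the later DATETIME2 type:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20040731' AND OrderDate < '20040801'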

[Execution plan showing an index seek plus key lookup]

Option 2: Store only the date portion

If the time portion of the order timestamp is not important, have the application INSERT the CONVERTed varchar (or better still, perform that conversion in the application) and store only the date portion. If you are in a position to upgrade to SQL Server 2008, then the DATE data type may be of interest here.

Option 3: Have separate fields for datetime and date portion

Modify the database schema to have separate fields for OrderDateTime and OrderDate. Modify the INSERT code to insert the raw date-timestamp into OrderDateTime, and a CONVERTed date portion into the OrderDate field. Use the OrderDate field for reports and OrderDateTime for other queries which need it.
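A variation on this option (a sketch only; the column name OrderDateOnly is illustrative) is to let SQL Server maintain the date-only field as a persisted computed column, which can then be indexed for reporting queries. CONVERT with an explicit style such as 112 is deterministic, so the column can be persisted and indexed:

ALTER TABLE Sales.SalesOrderHeader
ADD OrderDateOnly AS CONVERT(varchar(10), OrderDate, 112) PERSISTED
GO

CREATE NONCLUSTERED INDEX idx_SalesOrderHeader_OrderDateOnly
ON Sales.SalesOrderHeader(OrderDateOnly)
GO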

Conclusion

Separating reporting from OLTP processing is the key design aspect to keep in mind. In situations where that is not possible, some refactoring of the query can help. Table denormalization, coupled with minor changes in the application, can also help optimize query execution in some situations.

There may be other innovative solutions to the problem, and I am eager to hear about them from readers in the comments!

T-SQL Anti-pattern of the day: UPPER() and LOWER()

This one is even more common and therefore has a high impact: the usage of the above functions in WHERE clause predicates. I’ve examined the scenario in some detail below.

Here’s a typical example I see, based on the AdventureWorks database.

Scenario

To illustrate the effect, I have added an extra index on the FirstName column of the Person.Contact table:

CREATE NONCLUSTERED INDEX NC_Person_Contact_FirstName
ON Person.Contact(FirstName)

Here’s a typical query pattern coming in from the client:

DECLARE @FirstName nvarchar(50)
SELECT @FirstName = 'pilar'

SELECT *
FROM Person.Contact
WHERE UPPER(FirstName) = UPPER(@FirstName)

Problem

Let’s take a look at the execution plan for this batch:

[Execution plan showing a scan of NC_Person_Contact_FirstName]

As expected, there is a scan, due to the presence of UPPER() on the FirstName column. Interestingly, the optimizer chose to scan the non-clustered index rather than the clustered one, because scanning the narrower index on FirstName is cheaper than scanning the clustered index. Either way, it is still far from optimal.

Workarounds

  • If your server / database / column is set to use a case-insensitive collation (if you don’t know what a collation is, I suggest you start with this Books Online entry first) then you really do not need any of the UPPER() or LOWER() function calls, because in that case the execution engine will perform a case-insensitive comparison. Most customers I work with use the US English installation and the default collation of SQL_Latin1_General_CP1_CI_AS. For those customers, this workaround would be safe.

Is it a good one by design? The answer, as with many things in SQL Server, is: it depends. If your installation procedures are standardized and repeatable, and if you are using the default collation (or any other case-insensitive one), then this workaround may very well turn into a by-design solution. In other cases, this workaround may not solve the problem.

  • A more robust solution would be to transform (in this case, to uppercase) the values being stored in these tables (in our case, FirstName) at INSERT time; a sketch follows below. That way, the predicate in the SELECT query can change to the following:

FirstName = UPPER(@FirstName)
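Here is a minimal sketch of that approach (the #ContactNames table is hypothetical, used only to keep the example self-contained):

DECLARE @FirstName nvarchar(50)
SELECT @FirstName = 'pilar'

-- Hypothetical simplified table, for illustration only
CREATE TABLE #ContactNames (FirstName nvarchar(50))

-- Transform to uppercase once, at INSERT time...
INSERT INTO #ContactNames (FirstName) VALUES (UPPER(@FirstName))

-- ...so the search predicate no longer wraps the column in UPPER()
SELECT * FROM #ContactNames WHERE FirstName = UPPER(@FirstName)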

Conclusion

Careful thought needs to go into the design and development of databases where strings are being compared. Using a case-insensitive collation is sometimes the best way to avoid the problems described above. However, if you have a case-sensitive collation, performing these transformations up front at INSERT time is perhaps the best way to negate the performance side effects of using UPPER() / LOWER() in WHERE clause predicates.