Pages

Monday, June 10, 2013

collation issues sql server


In previous article, we learnt about collation in SQL Server

In this article, we will discuss some of the common issues that encountered with collation.
Collation issues arises with string type of datataypes like char, nchar, varchar, nvarchar, text etc...
Common scenerio of collation issues
  • columns with different collations in where/on condition in a query
  • columns with different collations in union/union all
Lets take examples to understand the issues described above

create table tab1 (val varchar(10) collate Latin1_General_100_CI_AS)
create table tab2 (val varchar(10) collate Latin1_General_100_CS_AS)
insert into tab1
select 'ABC' union all
select 'DEF'
insert into tab2
select 'ABC'

Lets run a query with different collations with "on" condition

select    * 
from    tab1 t1
inner join tab2 t2 on t1.val = t2.val
OUTPUT

Lets run a query with different collations with union

select val from tab1
union all
select val from tab2
OUTPUT

Cause    : We are getting this error because collation of the columns used in the comparing condition are different.
Solution : In order to resolve the issue, we need to run the query with the same collation for both the columns. The simplest and easiest solution to resolve the same is to set the collation of both the columns as of database

select    * 
from    tab1 t1
inner join tab2 t2 on t1.val collate database_default = t2.val collate database_default

select val collate database_default from tab1
union all
select val collate database_default from tab2
OUTPUT

collation in sql server


Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types, and character width.

Types of collation sensitivity

  • Case Sensitivity
  • Accent Sensitivity
  • Kana Sensitivity
  • Width Sensitivity

Case Sensitivity

If A and aB and b, etc. are treated in the same way, then it is case-insensitive. 
A computer treats A and a differently because it uses ASCII code to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.
  • CI specifies Case Insensitive 
  • CS specifies Case Sensitive
Accent Sensitivity
If a and Ao and O are treated in the same way, then it is accent-insensitive. A computer treats a and A differently because it uses ASCII code for differentiating the input. The ASCII value of a is 97 and A 225. The ASCII value of o is 111 and O is 243.
  • AI specifies Accent Insensitive 
  • AS specifies Accent Sensitive
Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.

Width Sensitivity

When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently then it is width sensitive.

Query to check collation and compatibility level of a database

SELECT compatibility_level, collation_name
FROM   sys.databases
WHERE  name = 'DatabaseName'

Query to find out all available collations in SQL Server database

SELECT    [name] 
        , COLLATIONPROPERTY([name], 'CodePage') AS [CodePage]
        , COLLATIONPROPERTY([name], 'LCID') AS [LCID]
        , COLLATIONPROPERTY([name], 'ComparisonStyle') AS [ComparisonStyle]
        , COLLATIONPROPERTY([name], 'Version') AS [Version]
        , [description]
FROM    fn_helpcollations()

รจ In the next article, we will explore common issues encountered with collation