“Talk about reaching over your shoulder to scratch the back of your knee” — Dave M.

My former co-worker Dave was full of such colorful phrases, and this was one of his favorites. He would use the phrase anytime he saw someone using an inefficient method to accomplish a task.

In this vein, I am going to attempt to share some tips, tricks, and helpful hints about how to accomplish things quicker and easier. Some of you out there may be already aware of these tips, but others may not. Also, if you have some hints to share send them along, so everyone can benefit.

How to Collate

When I first joined Coca-Cola I started to learn about all of the wonderful idiosyncrasies of BASIS. BASIS is an ERP system created before anyone knew what an ERP system was. BASIS uses flat files to store its data. I think BASIS was designed for the IBM 360 system, the software was created long before I ever joined the organization.  And I observed a wide variety of methods for dealing with BASIS’s idiosyncrasies. BASIS has a finite number of codes available for different fields. Many times the field length is only a single character. And because BASIS limits input to numbers and upper and lower case Latin alphabet, it gives a max number of values as 62. (10 numbers + 26 upper case letters + 26 lower case letters)

Joy oh Joy. 

On a nightly basis, the extraction process takes a snapshot of the BASIS tables and moves them into the  Microsoft SQL server. When the Microsoft SQL database was setup the tables were designated as case insensitive. Case insensitive means if you search for a value with the SQL statement

WHERE am01_artgrp6 = ‘b’

the results would actually return rows where

am01_artgrp6 = ‘b’

or

am01_artgrp6 = ‘B’

In the BASIS table ‘b’is very different from ‘B’. So the question is, how do we write a query to only get am01_artgrp6 = ‘b’?

Effective use of Collate

I have seen a couple of different ways being used but all of these methods ignored SQL’s built in mechanism for handling the problem.

T-SQL supports the SQL statement COLLATE.

The COLLATE statement allows you to override the default collation of the table. The tables on our SQL server were set up as case insensitive. When you use the COLLATE statement you can tell the SQL server that in your query your SQL parameters are case sensitive. By making your parameters case sensitive, now you can return results only where am01_artgrp6 = ‘b’.

Example:

 

Explanation:

You can see I am doing a simple count query and I am going to group on am01_artgrp6.

By adding the COLLATE SQL_Latin1_General_Cp1251_CS_AS statement I am overriding the default collation. When I override the default collation, I then need to tell the SQL server what collation I would like to use. In the example I want to use a case sensitive collation, based on the Latin alphabet.

In this case I want to stay with Latin 1 general,

CP1251 – Microsoft windows implementation of Latin 1 alphabet

CS – Case sensitive

AS – Alpha sort.

Extra Info

Microsoft SQL has dozens of collations defined for different languages, non-Latin based character sets, and right to left languages. Below is a quick query that lists all of the collations installed on the server with its corresponding description.

 

In the future, if you have to work with a table that is marked as case insensitive, however you do care about the case sensitivity the above code samples should make your code simpler, and your queries run faster.