“Making a square peg fit into a round hole”

There are many times when dealing with data that you need to transfigure one data set to allow for comparison purposes.

SQL has built in functions that easily allow us to transfigure data. There are some considerations when using the functions, most importantly understanding what will happen to the data set during the process.

Today I will talk about the CAST function.

CAST

The T-SQL engine does a relatively excellent job at understanding data types and allowing for the joining of dissimilar types. The method the T-SQL engine uses is explicit and implicit data type conversions. Implicit conversion is the one we are most familiar with and use almost daily. Behind the scenes the T-SQL engine will look at the definition of the fields in the tables and determine if it can convert one field to another data type for comparison purposes. This is the method used when joining two tables where the two fields do not have the same data type but are similar enough that the second field can be converted to the first field’s data type.

An example of this would be converting a field with a data type of integer to a data type of char (character).

The second type of data type conversion is explicit conversion. Where by using the CAST function you instruct the T-SQL engine to transfigure the data type from one form or another.

Building on the previous example mentioned one could write,

CAST(myfieldname AS CHAR)

With the syntax for the CAST function is as follows:

CAST(fieldname AS data_type [(length )])

With the [(length)] part of the function as optional and only required when the data type requires it. For example,

CAST(myfieldname AS DECIMAL(9,3)) will specify conversion to data type decimal with 9 digits in length and 3 digits to the right of the decimal point.

Redundancy?

If the T-SQL engine will automatically use implicit data type conversion, what is the purpose of using explicit conversion?
It is natural to think that the explicit conversion feature is a redundant capability however, there are times when the T-SQL engine doesn’t perform the conversion in the method you would prefer, or we can use the effects of explicit conversion to join tables or write criteria clauses that normally couldn’t be done.

PRACTICAL EXAMPLE

In the SAP database the field mara.matnr (material number) is stored as an 18 character value. The SAP folks know that some companies will use all numeric values and others will use a combination of alpha and numeric characters in their material numbers. So, the SAP software stores the material number as an 18 character value to accommodate as many scenarios as possible. If you have a material number of 100278, in the SAP database the matnr field is stored as 000000000000100278.

With that being said, normally when entering in the material number we would want to ensure the leading zeros have been stripped. However not all tables in the SAP database have this operation completed. And some tables the matnr field is stored with all 18 characters. Now when we try to join a table where the matnr field value = ‘100278’ and the other table has the field value as ‘000000000000100278’, the two rows won’t join. In addition the T-SQL database engine won’t do a data type conversion because each table has the field specified as a NVARCHAR.

What to do? What to do?

Well understanding how the fields data will be transfigured if we CAST the field from NVARCHAR to another data type solves the issue for us. We can CAST the field that has the leading zeros to a FLOAT value.

Explanation

What does the above example do? It will take the field value of ‘000000000000100278’ and transfigure the value to a FLOAT data type. When that happens all of the leading zeros are eliminated because the value ‘000000000000100278’ is the same as 100278. The second cast function will then transfigure the value to an NVARCHAR value which means the field value is change from 100278 to ‘100278’, (Notice the quotes; it is now a string type data type instead of a numeric data type).

Caveats

You should notice three things about the above example. The first item is I am using the CAST function twice. While not absolutely necessary it is a good practice. If I left off the outer CAST function where I CAST the field back to NVARCHAR the T-SQL database engine would do the same, because the m.matnr field data type is NVARCHAR.

The second item to notice is I am performing a function during the INNER JOIN statement. Normally performing a function inside of a join or criteria clause can cause major performance hits, especially if the value needs to be evaluated. However because the CAST function is a columnar function, there is a minimal performance hit. A columnar function means that the T-SQL engine will perform the function against all values in the column before doing the join. It is a bit more complicated than that, but it is the same thing in principle.

The last item to notice is that I am not testing if the field value can be CAST as a float value. The reason I can skip this step is that I understand the data in the table and know there won’t be an issue. Normally you would need to check to make sure all the values in the field can be CAST to the specified data type. Because of what I mentioned in the previous item, the database engine is going to CAST the entire column at the same time. If the T-SQL database engine has a field value it can’t CAST, then a run time error would be generated. An example of this would be if one of the rows had a value of ‘AXT12345’. T-SQL cannot CAST alphanumeric values to a float value, and you will receive the error message “Error converting data type varchar to float.”

Hopefully this helps explains the CAST function and spurs some thoughts on using of the CAST function. If you have specific examples, tips, tricks that utilize the CAST function, let me know.

Happy Mining
–Michael

Extra Info

Below is a chart that shows which data types can be CAST to other data types.

T-SQL Cast Table

T-SQL Cast Table from microsoft.com