TSQL PATINDEX()

If you are not aware, there is a string manipulation function available called PATINDEX().  The function is similar to CHARINDEX() in that it will allow you to find the index of a particular character in a string, however PATINDEX() takes it one step further and allows you to search by a regular expression pattern.  I have found this ability extremely useful several times where I had to pare strings that were similar in construction yet contained different prefix and suffixes.  Case in point, I had a list of 500 medications. Each record held the name of the med along with the dosage information attached to the end.  Some of the meds also had anidentifier appended to the front, either @-, X-, or *-, to mark them as belonging to a special category.  My goal was to strip the records of these identifiers, if present, as well as the dosage information on the end, and return a distinct list.  My process was not pretty, and probably not the most efficient solution, but making good use of PATINDEX() and CASE statements, I was able to come up with the following solution:
 

SELECT DISTINCT SUBSTRING(somestuff,

                CASE

                                WHEN PATINDEX(‘%X-%’, somestuff) > 0

                                THEN 3

                                ELSE PATINDEX(‘%[A-Z]%’, somestuff)

                END,

                CASE

                                WHEN PATINDEX(‘%[0-9]%’, somestuff) = 0

                                THEN LEN(somestuff)

                                ELSE PATINDEX(‘%[0-9]%’, somestuff) –

                                                CASE

                                                                WHEN PATINDEX(‘%X-%’, somestuff) > 0

                                                                THEN 3

                                                                ELSE PATINDEX(‘%[A-Z]%’, somestuff)

                                                END

                END)

 

 

Note the extra use of PATINDEX(‘%X-%’, somestuff).  This was necessary to remove the prefix indentifier "X-" that was not removed by the use of PATINDEX(‘%[A-Z]%’, somestuff).  The other identifiers were (@ and *).  There is probably a way to do this easily with a better RegEx, but I am definitely an amateur in that arena. 

Hope this info helps.  Happy Coding!

 

Leave a comment