While generating the codes was a pretty straightforward matter of =left(), I discovered only after exporting from Excel 2010 to a tab-separated text format that Excel or Word had converted my ASCII dashes (-) and ellipses (...) into special dash (–) and ellipsis (…) characters somewhere along the way.
I was using Notepad++ to review the generated file, and I decided to determine just which characters existed in the CDC-provided data. I built up the following PCRE regex one piece at a time until I had the exact list (case insensitive):
Of course, upon reflection, I just realized that J has a great way to find the characters:
/:~ ~. 1!:1 <'C:/Path/ICD10/icd10withHierarchy.txt'
Right after the command is the result, which starts with a tab, line feed, carriage return, and then a space before the percent sign (%).
1!:1 <'filename' is just the notation for reading a file.
~. is the Nub verb, which removes all duplicates from an array (of which a string is a kind).
/:~ determines the sort order for an array and applies it to itself.
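
For readers who don't have J handy, here is a rough Python analogue of the same idea: collect the distinct characters, sort them, and then flag anything outside plain ASCII. This is only a sketch. The function names and the sample string are made up for illustration, and the analogy is loose, since as far as I know 1!:1 hands J the raw bytes of the file while this works on already-decoded text.

```python
def sorted_unique_chars(text: str) -> str:
    """Loose analogue of J's `/:~ ~.`: drop duplicates, then sort by code point."""
    return "".join(sorted(set(text)))


def non_ascii_chars(text: str) -> list:
    """Describe each distinct character outside 7-bit ASCII with its code point."""
    return [f"U+{ord(c):04X} {c}" for c in sorted_unique_chars(text) if ord(c) > 127]


# Hypothetical sample standing in for the exported file's contents.
# For a real file, read it first with open(path, encoding=...).read();
# the right encoding depends on how the file was exported (an assumption to check).
sample = "A00\tCholera\r\nA01 \u2013 Typhoid\u2026\r\n"

print(repr(sorted_unique_chars(sample)))
print(non_ascii_chars(sample))
```

On the sample, the sorted list begins with tab, line feed, carriage return, and space, because those have the lowest code points, and the two flagged characters are the en dash (U+2013) and the ellipsis (U+2026).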