ICD 10 code hierarchy, over-helpful Excel, and Jsoftware.com

I recently generated the parent-child and ancestor-progeny data for the ICD-10 diagnostic codes. You can find them on the GitHub repo I set up.

While generating the codes was a pretty straightforward matter of =vlookup() and =left(), I discovered only after exporting from Excel 2010 to a tab-separated text format that Excel or Word had converted my ASCII dash (-) and ellipses (…) into special dash (–) and ellipses (…) characters somewhere along the way.

I was using Notepad++ to review the file generated and I decided to determine just what characters existed in the CDC-provided data. I built up the following PCRE regex one piece at a time until I had the exact list (case insensitive):

[^-0-9A-Z \t\r\n.,\[\]/()'%<>=+%]

Of course, upon reflection, I just realized that J has a great way to find the characters:

/:~ ~. 1!:1 <'C:/Path/ICD10/icd10withHierarchy.txt'
%&'()+,-./0123456789<=>ABCDEFGHIJKLMNOPQRSTUVWXYZ[]abcdefghijklmnopqrstuvwxyz

view raw
FindNubWithJ.ijs
hosted with ❤ by GitHub

Right after the command is the result, which starts with a tab, line feed, carriage return, and then a space before the percent sign (%).

1!:1 <'filename' is just the notation for reading a file. ~. is the Nub verb, which removes all duplicates from an array (of which a string is a kind). /:~ determines the sort order for an array and applies it to itself.

ICD 10 code hierarchy, over-helpful Excel, and Jsoftware.com

Toolbelt

Slickrun (http://www.bayden.com/slickrun/)

AutoHotKey (http://ahkscript.org/)

Notepad++ & PCRE (Regex) & Python (https://notepad-plus-plus.org/)

MS Excel & VBA

SQL

J (http://jsoftware.com/)

Chrome and Firefox developer tools

TeamViewer (https://www.teamviewer.com/)

Greenshot (http://getgreenshot.org/)

Toolbelt