This was a “no-brainer”. I am intrigued by data compression, particularly in a medical environment (which is why I co-authored two patents on medical image compression), and I love puzzles. So this post (and the next) will discuss the path I took to figuring out how Cerner compresses BLOBs. If you don’t know what a Binary Large OBject is and just want the code, see below, but please don’t ask me for support.
We are working on several “quality improvement” projects in my day job (isn’t every hospital?) and part of that is to mine (as in extract) the valuable medical insights in the notes — those semi-structured repositories of information in the medical record that are so difficult to organize.
Most of our notes and reports are contained in our Transcription system and Cerner is updated via an HL7 feed from Transcription. However, I found that our Pathology reports were transcribed directly into Cerner and those were “compressed” and pretty much non-readable. Moving forward, we have a way to intercept this interaction; but we had years of notes that we wanted to work with that were stored in the notorious CE_BLOB table (cue the scary music).
Extracting the other notes was relatively simple (see previous blog entries), but then I ran into the dreaded “Cerner Blob”, the “Billy the Kid” of healthcare data: a fearsome reputation that may be deserved, but probably isn’t.
I’m an ex-softie (Microsoft alumnus), so I wanted to do this in C# and .NET, which meant I needed a way to call the Cerner DLL from .NET. Once I had the function signature and found the correct library, the rest was pretty straightforward C#.
For those interested, the uar_ocf_uncompress function lives in Cerner’s shrccluar.dll, and if you have licensed this library from Cerner, then you are welcome to use it to decompress the BLOBs. The compress and decompress functions take the same parameters, so here are the .NET interop declarations for both:
// Requires: using System; using System.Runtime.InteropServices;
// Adjust the path to wherever your licensed copy of shrccluar.dll is installed.
[DllImport(@"C:\myCernerDirectory\shrccluar.DLL", EntryPoint = "uar_ocf_uncompress")]
public extern static int uar_ocf_uncompress(
    [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] Byte[] Buffer, ref int BufSize,
    [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] Byte[] Buffer2, ref int BufSize2, ref int ilen);

[DllImport(@"C:\myCernerDirectory\shrccluar.DLL", EntryPoint = "uar_ocf_compress")]
public extern static int uar_ocf_compress(
    [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] Byte[] Buffer, ref int BufSize,
    [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] Byte[] Buffer2, ref int BufSize2, ref int ilen);
The next step was to figure out how to decompress the BLOBs directly from the Oracle database, without the Cerner DLL. Fortunately, folks have mentioned in various forums that the compression algorithm Cerner uses is LZW. In my next blog post, I’ll talk about how I determined how they implemented the algorithm, but the good news is that Cerner didn’t try to hide the data (thank you, Cerner). In other words, they did not try to make this difficult to decompress; they simply deployed a well-known lossless compression algorithm, LZW, using standard procedures.
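For anyone who has not met LZW before, here is a minimal textbook sketch of the algorithm in C#. To be clear about the assumptions: this is generic LZW working on plain integer codes, not Cerner’s actual on-disk format. How the codes are packed into bits, how wide they are, and when the dictionary is reset are not covered in this post, and the class name LzwSketch is purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LzwSketch
{
    // Encoder: emit one integer code per longest match found in the dictionary.
    public static List<int> Compress(byte[] input)
    {
        var dict = new Dictionary<string, int>();
        for (int i = 0; i < 256; i++) dict[((char)i).ToString()] = i;
        int nextCode = 256;

        var codes = new List<int>();
        string w = "";
        foreach (byte b in input)
        {
            string wc = w + (char)b;
            if (dict.ContainsKey(wc))
            {
                w = wc;                    // keep extending the current match
            }
            else
            {
                codes.Add(dict[w]);        // emit the longest known prefix
                dict[wc] = nextCode++;     // learn the new sequence
                w = ((char)b).ToString();
            }
        }
        if (w.Length > 0) codes.Add(dict[w]);
        return codes;
    }

    // Decoder: rebuild the same dictionary on the fly from the code stream.
    public static byte[] Decompress(IList<int> codes)
    {
        var output = new List<byte>();
        if (codes.Count == 0) return output.ToArray();

        var dict = new List<byte[]>();
        for (int i = 0; i < 256; i++) dict.Add(new[] { (byte)i });

        if (codes[0] < 0 || codes[0] > 255)
            throw new InvalidDataException("First code must be a single byte.");
        byte[] prev = dict[codes[0]];
        output.AddRange(prev);

        for (int i = 1; i < codes.Count; i++)
        {
            int code = codes[i];
            byte[] entry;
            if (code >= 0 && code < dict.Count)
                entry = dict[code];
            else if (code == dict.Count)
                entry = Append(prev, prev[0]);   // code refers to the entry being built
            else
                throw new InvalidDataException("Invalid LZW code: " + code);

            output.AddRange(entry);
            dict.Add(Append(prev, entry[0]));    // mirror the encoder's new entry
            prev = entry;
        }
        return output.ToArray();
    }

    private static byte[] Append(byte[] prefix, byte last)
    {
        var result = new byte[prefix.Length + 1];
        Array.Copy(prefix, result, prefix.Length);
        result[prefix.Length] = last;
        return result;
    }
}
```

Passing the ASCII bytes of a repetitive string such as "TOBEORNOTTOBEORTOBEORNOT" through Compress and then Decompress should give back the original bytes. The decoder never needs the dictionary to be stored with the data, which is a large part of why LZW is so easy to work with once you know the code layout.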
To test this, I created a simple program that used Microsoft LINQ to iterate through a few thousand records from the CE_BLOB table. I decompressed each record twice, first with the Cerner API and then with the .NET library I created, and ran a string comparison on the two results. For these sample records, the final strings all matched.
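The comparison itself can be sketched in a few lines of LINQ. This is an illustration rather than the program I actually ran: the BlobComparison class and the two decoder delegates are hypothetical stand-ins, one for a wrapper around the Cerner API and one for the stand-alone library, and loading the rows from CE_BLOB is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlobComparison
{
    // Returns the positions of the blobs whose two decompressed strings differ.
    // referenceDecoder and candidateDecoder are hypothetical wrappers supplied by the caller.
    public static List<int> FindMismatches(
        IEnumerable<byte[]> compressedBlobs,
        Func<byte[], string> referenceDecoder,
        Func<byte[], string> candidateDecoder)
    {
        return compressedBlobs
            .Select((blob, index) => new
            {
                Index = index,
                // Ordinal comparison: the outputs must match character for character.
                Same = string.Equals(referenceDecoder(blob), candidateDecoder(blob),
                                     StringComparison.Ordinal)
            })
            .Where(r => !r.Same)
            .Select(r => r.Index)
            .ToList();
    }
}
```

An empty list back means the two decoders agreed on every record in the sample; anything else tells you which records to look at first.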
That’s it for now. In the next post, I’ll discuss the steps to develop the stand-alone solution and talk a bit more about how to use the library.