In a recent post, I talked about how SQL Server’s Full-Text-Search might be a rather simple solution to find health care data in progress notes and other “blobs” of semi-structured data. I decided to build a working prototype and this blog will discuss how it was constructed and what our next steps will be.
Fortunately, we have a transcription department so all of our notes are saved as ASCII text. However, for my personal health information at home, I’ve scanned the documents I received from my health providers and saved the images as TIFF format. SQL will use the Windows’ built-in TIFF OCR engine to create an index on the words included in these TIFF (or PDF) documents.
The first small challenge was that our notes are saved in RTF. I wanted to be able to display the notes on a web page and without requiring an ActiveX plug in to display the RTF. So how to convert the RTF to plain text?
I found the easiest (and free) way to do this conversion was to use the Rich-Text-Box that is included with .NET and Visual Studio. I feed the text as rtf into the control and then immediately read it out as text and the control handles the conversion.
RichTextBox rtb = new RichTextBox(); rtb.Rtf = my RTF from the feed; myDatabase = rtp.Text;
Our content still contained some HL7 markup that I needed to “fix” and I needed to change the carriage return and line feeds to their HTML equivalents, but this was all easy to do so that the final results looked very similar to the original text with the rtf.
I next built a single .html web page using jquery for the user interface that will collect the search terms and allow the searcher to limit the results by either Visit Id or MRN.
When they press the Search button, the application will make a web service call to the SQL Server and return a list of MRNs (hidden below), with the size of the Note, and the message time. A user can hover over the link in the “View Note” column to see the first line of the note; or, press the link to get a dialog pop-up where the search term is highlighted.
So a single html page wired up to a SQL Server with Full Text Search enabled allows our clinicians to search now across over 26 million records. And with the incredible performance of the Full-Text-Index, the results come back almost instantly.