Saturday, December 18, 2010

My Experiences with HL7–Part 1

When I started working with hospital data, it was clear that HL7 solved a number of data conveyance issues by specifying a lingua franca for health data. In this post, I will provide a brief introduction to how HL7 is used to convey these data, discuss the two primary dialects, and introduce some of the challenges we have had in working with HL7 data. In a future post, I plan to introduce a useful tool I wrote (and will release as freeware) that allows us to parse HL7 data and store it in an Entity-Attribute-Value database for data validation.

Our friends over at Wikipedia have a pretty good introduction to HL7. A frequent use of HL7 messages is to communicate information from one Hospital Information System to another. For example, the Admission-Discharge-Transfer (“ADT”) system might be in Epic and the Clinical Information System might be in Cerner. When a patient is admitted, the Cerner system needs to know about it, so Epic would have an outbound HL7 interface that sends HL7 messages to Cerner’s inbound interface.

Generally, US hospitals use the earlier 2.x versions of HL7, which specify “pipes, hats, and tildes” (the |, ^, and ~ characters) as separators. Most other countries use the newer XML-based dialect. We have used and tested both, and I have found two reasons why the earlier version is preferable:

1. Generally, the two most limited IT resources in hospitals are storage and network. XML taxes both. Specifically, an HL7 version 3.x message with XML element and attribute names is substantially larger than the equivalent HL7 version 2.x message with single-character separators, which increases the load on both the network and the storage subsystem.

2. Although it is easy to find and use an XML parser, it takes much longer to parse XML data than to parse a simpler delimited format such as HL7 version 2.x.

However, as IT data consumers we are rarely given an option, so we must use whatever the source systems send us.
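To make the “pipes, hats, and tildes” concrete, here is a minimal Python sketch of splitting a version 2.x message on its default delimiters. The sample message and the helper names (`parse_segment`, `parse_message`) are invented for illustration, and the sketch assumes the default delimiters and ignores subcomponents and escape sequences.

```python
# Minimal sketch: split an HL7 v2.x message on its default delimiters.
# The sample message is invented for illustration only.
FIELD_SEP = "|"       # "pipe":  separates fields
COMPONENT_SEP = "^"   # "hat":   separates components within a field
REPETITION_SEP = "~"  # "tilde": separates repetitions of a field

SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|ADT_SYS|HOSP|CIS_SYS|HOSP|201012180830||ADT^A01|MSG00001|P|2.3",
    "PID|1||123456^^^HOSP^MR||DOE^JOHN^Q||19700101|M|||||5551234~5555678",
])


def parse_segment(segment):
    """Return (segment_id, fields).

    Each field is a list of repetitions; each repetition is a list of
    components. For non-MSH segments, fields[n - 1] holds field n.
    """
    raw_fields = segment.split(FIELD_SEP)
    segment_id = raw_fields[0]
    fields = []
    for index, raw in enumerate(raw_fields[1:], start=1):
        if segment_id == "MSH" and index == 1:
            # This MSH field lists the delimiter characters themselves,
            # so it must not be split on them.
            fields.append([[raw]])
        else:
            fields.append([
                repetition.split(COMPONENT_SEP)
                for repetition in raw.split(REPETITION_SEP)
            ])
    return segment_id, fields


def parse_message(message):
    """Segments in an HL7 v2.x message are separated by carriage returns."""
    return [parse_segment(s) for s in message.split("\r") if s]


if __name__ == "__main__":
    for segment_id, fields in parse_message(SAMPLE_MESSAGE):
        print(segment_id, fields)
```

A plain string split is all the parsing this format needs, which is a large part of why the delimited dialect is so cheap to process.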

It must be pointed out that HL7 is a guideline and not a hard-and-fast specification. Therefore, a patient’s MRN may be located in one element of the source message while the receiving system expects it in another. To fix this impedance mismatch, an interface layer is often inserted in the conversation to transform HL7 from the form sent by the source into the form expected by the receiver.
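As a rough illustration of what such an interface layer does, the Python sketch below moves a value from one PID field to another. The scenario (the sender puts the MRN in PID-2 and the receiver expects it in PID-3) and the function names are assumptions made for the example, not a description of any particular interface engine.

```python
# Hypothetical scenario: the sender writes the MRN to PID-2, but the
# receiver expects it in PID-3. Names and field choices are illustrative.

def move_field(segment, source_index, target_index, field_sep="|"):
    """Move the value at source_index to target_index within one segment.

    For non-MSH segments the split index equals the HL7 field number,
    because index 0 holds the segment name.
    """
    fields = segment.split(field_sep)
    # Pad with empty fields so that both positions exist.
    while len(fields) <= max(source_index, target_index):
        fields.append("")
    fields[target_index] = fields[source_index]
    fields[source_index] = ""
    return field_sep.join(fields)


def transform_message(message):
    """Rewrite every PID segment and pass all other segments through."""
    transformed = []
    for segment in message.split("\r"):
        if segment.startswith("PID|"):
            segment = move_field(segment, source_index=2, target_index=3)
        transformed.append(segment)
    return "\r".join(transformed)


if __name__ == "__main__":
    source = "MSH|^~\\&|SRC\rPID|1|123456|||DOE^JOHN"
    print(transform_message(source).replace("\r", "\n"))
```

A real interface would also have to decide what to do when the target field is already populated; this sketch simply overwrites it.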

One of my favorite features of HL7 is its definition of types and sub-types. In the object-oriented programming world these types would likely be called entities. For example, we might have a person entity with all the attributes commonly associated with a person. In our HL7 message, we may have a guarantor segment that includes a person entity as a sub-type in its definition.
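One way to picture this reuse is a small Python class that stands in for a composite type. The sketch below is loosely modeled on HL7’s extended person name type (XPN), which is used for both the patient name (PID-5) and the guarantor name (GT1-3); the class name, the choice of three components, and the sample values are simplifications made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    """Simplified stand-in for an HL7 composite person-name type.

    Only the first three components are modeled here.
    """
    family: str = ""
    given: str = ""
    middle: str = ""

    @classmethod
    def from_hl7(cls, value, component_sep="^"):
        # Pad so that missing trailing components become empty strings.
        parts = value.split(component_sep) + ["", "", ""]
        return cls(family=parts[0], given=parts[1], middle=parts[2])


# The same type is reused wherever a segment carries a person's name.
patient = PersonName.from_hl7("DOE^JOHN^Q")   # e.g. a patient name field
guarantor = PersonName.from_hl7("DOE^JANE")   # e.g. a guarantor name field

if __name__ == "__main__":
    print(patient)
    print(guarantor)
```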

HL7 was designed to be an efficient machine-to-machine communication language and is not easy for humans to read. Even with XML, which is supposed to be “self-describing”, it remains a challenge to look at an XML HL7 message and find a particular data element. A very useful (indispensable?) tool for reading HL7 messages is Interface Explorer, which comes packed with features essential for any organization using HL7.

However, Interface Explorer only helps us with one or a few messages at a time. Our requirement in working with Microsoft Amalga was to validate the Amalga parsers, so we needed a way to extract and store data from a large number of HL7 messages. In my next post, I’ll introduce a tool we use to break apart a large number of HL7 messages and store them in an efficient database format so we can easily find individual messages to help us validate the Amalga parsers.
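Until that post, the general Entity-Attribute-Value idea can be sketched in a few lines of Python. This is not the tool itself: the table name, column names, and attribute naming scheme are assumptions, SQLite is used only to keep the example self-contained, and the sketch ignores MSH numbering and repeated segments.

```python
import sqlite3


def store_message(conn, message_id, message):
    """Store one row per populated field: (entity, attribute, value)."""
    rows = []
    for segment in message.split("\r"):
        if not segment:
            continue
        fields = segment.split("|")
        for position, value in enumerate(fields[1:], start=1):
            if value:  # an EAV table holds only attributes that have a value
                rows.append((message_id, f"{fields[0]}-{position}", value))
    conn.executemany(
        "INSERT INTO hl7_eav (entity, attribute, value) VALUES (?, ?, ?)",
        rows,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hl7_eav (entity TEXT, attribute TEXT, value TEXT)")
store_message(conn, "MSG00001", "PID|1||123456^^^HOSP^MR||DOE^JOHN^Q")

if __name__ == "__main__":
    # Find every message that carries a given value in PID-3.
    query = "SELECT entity, value FROM hl7_eav WHERE attribute = ?"
    for row in conn.execute(query, ("PID-3",)):
        print(row)
```

Because every field becomes a row, checking what a parser should have extracted from a given element turns into a simple query on the attribute column.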

SQL Server Full-Text-Search for Clinical Data

In a recent post, I talked about how SQL Server’s Full-Text-Search might be a rather simple solution for finding health care data in progress notes and other “blobs” of semi-structured data. I decided to build a working prototype, and this post discusses how it was constructed and what our next steps will be.

Fortunately, we have a transcription department, so all of our notes are saved as ASCII text. However, for my personal health information at home, I’ve scanned the documents I received from my health providers and saved the images in TIFF format. SQL Server will use Windows’ built-in TIFF OCR engine to create an index on the words included in these TIFF (or PDF) documents.

The first small challenge was that our notes are saved in RTF. I wanted to display the notes on a web page without requiring an ActiveX plug-in to render the RTF. So how do we convert the RTF to plain text?

I found the easiest (and free) way to do this conversion was to use the RichTextBox control that is included with .NET and Visual Studio. I feed the text into the control as RTF and then immediately read it back out as plain text; the control handles the conversion.

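RichTextBox rtb = new RichTextBox();  // from System.Windows.Forms
rtb.Rtf = rtfFromFeed;                // the RTF string received from the feed
string plainText = rtb.Text;          // plain-text equivalent, ready to save to the database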

Our content still contained some HL7 markup that I needed to “fix”, and I needed to change the carriage returns and line feeds to their HTML equivalents, but this was all easy to do, and the final results looked very similar to the original RTF-formatted text.
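The clean-up step might look something like the Python sketch below. Which HL7 markup actually appeared in the notes is not stated above, so the sketch assumes it was the standard version 2.x escape sequences (such as `\.br\` for a line break and `\F\` for the field separator); the function name is made up.

```python
import html

# Standard HL7 v2.x escape sequences and the characters they stand for.
# Assumption: this is the kind of markup left in the note text.
HL7_ESCAPES = {
    "\\.br\\": "\n",  # line break
    "\\F\\": "|",     # field separator
    "\\S\\": "^",     # component separator
    "\\T\\": "&",     # subcomponent separator
    "\\R\\": "~",     # repetition separator
    "\\E\\": "\\",    # the escape character itself (handled last)
}


def note_to_html(text):
    """Resolve HL7 escapes, HTML-encode the text, then mark line breaks."""
    for sequence, replacement in HL7_ESCAPES.items():
        text = text.replace(sequence, replacement)
    text = html.escape(text)  # encode &, <, > and quotes for the web page
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "<br />")


if __name__ == "__main__":
    print(note_to_html("BP 120\\F\\80\\.br\\Line two\r\nA & B <ok>"))
```

HTML-encoding before inserting the `<br />` tags keeps any angle brackets in the note from being interpreted as markup by the browser.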

I next built a single .html web page, using jQuery for the user interface, that collects the search terms and allows the searcher to limit the results by either Visit Id or MRN.

[Screenshot: the search page]

When the user presses the Search button, the application makes a web service call to the SQL Server and returns a list of MRNs (hidden below), along with the size of each note and the message time. A user can hover over the link in the “View Note” column to see the first line of the note, or click the link to open a pop-up dialog in which the search term is highlighted.

[Screenshot: the search results]
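The query behind such a web service call could be assembled roughly as in the Python sketch below. The table and column names (`dbo.ClinicalNotes`, `NoteText`, `MRN`, `VisitId`, `MessageTime`) are hypothetical, the `?` placeholders assume an ODBC-style driver, and the function only builds the statement and its parameters; it does not connect to a server. `CONTAINS` and `DATALENGTH` are standard SQL Server functions.

```python
# Sketch only: builds a parameterized full-text query for SQL Server.
# Table and column names are hypothetical, not the actual schema.

def build_search(terms, mrn=None, visit_id=None):
    """Return (sql, params) for a full-text search with optional filters."""
    words = [term.replace('"', "").strip() for term in terms]
    words = [word for word in words if word]
    if not words:
        raise ValueError("at least one search term is required")

    # CONTAINS search condition: every quoted term or phrase must match.
    condition = " AND ".join(f'"{word}"' for word in words)

    sql = (
        "SELECT MRN, VisitId, DATALENGTH(NoteText) AS NoteSize, MessageTime "
        "FROM dbo.ClinicalNotes "
        "WHERE CONTAINS(NoteText, ?)"
    )
    params = [condition]
    if mrn is not None:
        sql += " AND MRN = ?"
        params.append(mrn)
    if visit_id is not None:
        sql += " AND VisitId = ?"
        params.append(visit_id)
    return sql + " ORDER BY MessageTime DESC", params


if __name__ == "__main__":
    sql, params = build_search(["chest pain", "troponin"], mrn="123456")
    print(sql)
    print(params)
```

Passing the search condition and the filters as parameters, rather than concatenating them into the SQL text, keeps user input out of the statement itself.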

So a single HTML page wired up to a SQL Server with Full-Text-Search enabled allows our clinicians to search across more than 26 million records. And with the incredible performance of the Full-Text-Index, the results come back almost instantly.

SharePoint 2010 for Health Data

As I mentioned in my last post, Search has become the de facto way to quickly locate an item buried in the network sea – either on the Internet or on corporate networks. Given the complexity of Health Care data and terms along with the various taxonomies spread across disease and treatment types, an effective search across medical data can be very difficult to achieve. Full-Text-Search support in database technologies is one small step to help facilitate search, but in healthcare, we need more.

Microsoft’s SharePoint 2010 has features that make it a powerful platform to develop solutions to help clinicians organize, store, and easily locate health data. A rather technical overview from Infosys will help the reader dig deeper.

For this post, let me highlight the platform features I think are most compelling for health data.

1. The audit requirements in US healthcare law (that is, “who saw what and when?”) are often difficult to meet. SharePoint includes powerful auditing and reporting capabilities out of the box that are an excellent adjunct to the rather simple user administration tools also included. As a full-time developer of clinical data systems, I would save an enormous amount of time with this one feature.

2. SharePoint includes what they call “Deep Refiners” that include counts. In other words, it is relatively easy to apply a filter (what they call a “refiner”) to a search set and quickly see how that would affect the search results. Try the Financial Times web site to get a sense of how this works.

3. When I installed the “free” version of SharePoint 2010, the Foundation version, I saw that it installed the Microsoft Speech Engine and APIs. I wondered about this and found that this was included so that SharePoint Search would include phonetic results for people searches. A search for Geoff should return Jeff as well. Cool.

4. Click-Through-Relevancy, which I believe Google pioneered, is included in SharePoint Search. Each time a search result is clicked, its ranking increases so that in future similar searches it is ranked and positioned more appropriately in the displayed results.

5. Search can be “contextual” so that different results and refinements can be based on the users’ roles or profiles. This is powerful in healthcare, where we might tune contextual search parameters for a given specialty such as endocrinology.

6. Search is not limited to data stored in the SharePoint database. It can also search in the file system and can open TIFF or PDF files, run Optical Character Recognition (“OCR”) on the image(s) and then index those images so that their textual equivalents can be included in the search results.

7. And my favorite feature is the metadata and taxonomy searches. These can be configured at the Enterprise level for all users, at the group level for a department or practice, and at the user level so a clinician can “tag” a document and locate it very quickly later. Plus, these tags show up in the Search results. For example, suppose you create a “swine flu” tag and attach it to images of discharge summaries. Later you can use the tag to narrow your search; or, if you search using some other criterion, “swine flu” will show up as a refinement variable so you can filter or refine the results with just that term.
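The idea of refiners with counts, combined with tags, can be illustrated with a toy Python example. This is only a sketch of the concept and has nothing to do with the SharePoint API; the documents, tags, and function names are invented.

```python
from collections import Counter

# Invented search results; each document carries user-assigned tags.
results = [
    {"title": "Discharge summary A", "tags": ["swine flu", "pediatrics"]},
    {"title": "Discharge summary B", "tags": ["swine flu"]},
    {"title": "Progress note C", "tags": ["endocrinology"]},
]


def refiner_counts(documents, field):
    """Count how many documents carry each value of the given field."""
    counts = Counter()
    for document in documents:
        # set() so a value repeated within one document is counted once
        counts.update(set(document.get(field, [])))
    return counts


def refine(documents, field, value):
    """Keep only the documents that carry the chosen refiner value."""
    return [d for d in documents if value in d.get(field, [])]


if __name__ == "__main__":
    print(refiner_counts(results, "tags"))
    print([d["title"] for d in refine(results, "tags", "swine flu")])
```

Showing the counts before the user picks a refiner is what lets them see how a filter would change the result set before applying it.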

I look forward to our roll-out of SharePoint at our hospital and to exploring more deeply how we might develop the platform so our clinicians can more quickly find what they need to deliver better care.