When I started working with hospital data, it was clear that HL7 solved a number of data conveyance issues by specifying a lingua franca for health data. In this post, I will provide a brief introduction to how HL7 is used to convey these data, discuss the two primary dialects, and introduce some of the challenges we have had in working with HL7 data. In a future post I plan to introduce a useful tool I wrote (and will release as freeware) that allows us to parse HL7 data and store it in an Entity-Attribute-Value database for data validation.
Our friends over at Wikipedia have a pretty good introduction to HL7. A frequent use of HL7 messages is to communicate information from one Hospital Information System to another. For example, the Admission-Discharge-Transfer (“ADT”) system might be in Epic and the Clinical Information System might be in Cerner. When a patient is admitted, the Cerner system needs to know about it, so Epic would have an outbound HL7 interface that sends HL7 messages to the inbound interface on the Cerner system.
Generally, US hospitals use the earlier 2.x versions of HL7, which specify “pipes, hats, and tildes” (|, ^, and ~) as separators. Most other countries use the newer XML-based dialect. We have used and tested both, and I have found several reasons why the earlier version is preferable:
- Generally, the two most limited IT resources in hospitals are storage and network. XML taxes both. Specifically, an HL7 version 3.x message with XML element and attribute names is substantially larger than an HL7 version 2.x message with single-character separators, which increases the load on both the network and the storage subsystem.
- Although it is easy to find and use an XML parser, it takes much longer to parse XML data than to parse a simpler delimited format such as HL7 version 2.x.
However, as IT data consumers we are rarely given a choice, so we must work with whatever format the source systems send us.
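To make the “pipes, hats, and tildes” concrete, here is a minimal sketch of splitting a version 2.x message into segments, fields, repetitions, and components. The sample message, its identifiers, and the `parse_message` helper are invented for illustration; this is not a full HL7 parser (it ignores escape sequences and sub-components, for instance).

```python
# Hypothetical sample message; facility names and IDs are made up.
SAMPLE = "\r".join([
    "MSH|^~\\&|EPIC|HOSP|CERNER|HOSP|202401011200||ADT^A01|MSG0001|P|2.3",
    "PID|1||12345^^^HOSP^MR~67890^^^HOSP^PI||DOE^JANE",
])


def parse_message(raw):
    """Split a 2.x message into {segment name: [parsed segments]}.

    Non-MSH segments become fields -> repetitions -> components.
    MSH fields are kept as raw strings because MSH-2 holds the
    encoding characters themselves and must not be split on them.
    """
    segments = [s for s in raw.split("\r") if s]
    msh = segments[0]
    field_sep = msh[3]   # the "pipe": the character right after "MSH"
    comp_sep = msh[4]    # the "hat": first encoding character
    rep_sep = msh[5]     # the "tilde": second encoding character

    parsed = {}
    for seg in segments:
        fields = seg.split(field_sep)
        name = fields[0]
        if name != "MSH":
            fields = [
                [rep.split(comp_sep) for rep in field.split(rep_sep)]
                for field in fields
            ]
        parsed.setdefault(name, []).append(fields)
    return parsed


message = parse_message(SAMPLE)
pid = message["PID"][0]
patient_ids = pid[3]      # PID-3: two repetitions separated by "~"
patient_name = pid[5][0]  # PID-5: components separated by "^"
```

Because the separators are single characters declared at the start of the message, the whole job is a handful of string splits, which is a large part of why the delimited format is cheap to parse.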
It must be pointed out that HL7 is a guideline and not a hard-and-fast specification. As a result, the source system may place a patient’s MRN in one element of the message while the receiving system expects to find it in another. To fix this impedance mismatch, an interface layer is often inserted into the conversation to transform the HL7 from the form sent by the source into the form expected by the receiver.
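As a rough sketch of what such an interface layer does, the function below moves a field from one position to another within a named segment. The scenario is an assumption made for illustration: a source that sends the MRN in PID-2 and a receiver that expects it in PID-3. The `move_field` name is hypothetical, and real interface engines do far more than this.

```python
def move_field(message, segment_id, src, dst, field_sep="|"):
    """Move field `src` to position `dst` in every `segment_id` segment.

    Assumes a 2.x message with carriage-return segment terminators.
    Field numbers follow the usual convention for non-MSH segments,
    where index 0 is the segment name.
    """
    out = []
    for seg in message.split("\r"):
        fields = seg.split(field_sep)
        if fields[0] == segment_id:
            # Pad short segments so both positions exist.
            while len(fields) <= max(src, dst):
                fields.append("")
            fields[dst] = fields[src]
            fields[src] = ""
        out.append(field_sep.join(fields))
    return "\r".join(out)


# Hypothetical source message with the MRN in PID-2.
source = "MSH|^~\\&|SRC|HOSP\rPID|1|12345|||DOE^JANE"
transformed = move_field(source, "PID", src=2, dst=3)
```

Only the PID segment changes; every other segment passes through untouched.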
One of my favorite features of HL7 is its definition of types and sub-types. In the object-oriented programming world these types would likely be called entities. For example, we might have a person entity with all the attributes commonly associated with a person. In our HL7 message, a guarantor segment would then include a person entity as a sub-type within its own definition.
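In object-oriented terms, the idea looks something like the sketch below: a reusable person type that a guarantor segment embeds as one of its fields. The class names and the reduced set of attributes are my own simplification, and I am assuming the 2.x layout in which the guarantor (GT1) segment carries the guarantor’s name in its third field as family and given name components.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Simplified, hypothetical person type: just family and given name."""
    family: str
    given: str = ""

    @classmethod
    def from_hl7(cls, value, comp_sep="^"):
        # Pad so a value with no given name still unpacks cleanly.
        parts = value.split(comp_sep) + [""]
        return cls(family=parts[0], given=parts[1])


@dataclass
class Guarantor:
    """Simplified guarantor segment that embeds a Person as a sub-type."""
    set_id: str
    name: Person

    @classmethod
    def from_segment(cls, segment, field_sep="|"):
        fields = segment.split(field_sep)
        return cls(set_id=fields[1], name=Person.from_hl7(fields[3]))


# Hypothetical guarantor segment.
guarantor = Guarantor.from_segment("GT1|1||DOE^JOHN")
```

The same `Person` type could be reused wherever a person’s name appears in a message, which is the appeal of defining the type once.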
HL7 has been designed to be an efficient machine-to-machine communication language and is not easy for humans to read. Even with XML, which is supposed to be “self-describing”, it remains a challenge to find a particular data element in an XML HL7 message. A very useful (indispensable?) tool for reading HL7 messages is Interface Explorer, which comes packed with features essential for any organization using HL7.
However, Interface Explorer only helps us with one or a few messages at a time. Our requirement in working with Microsoft Amalga was to validate the Amalga parsers, so we needed a way to extract and store data from a large number of HL7 messages. In my next post, I’ll introduce a tool we use to break apart a large number of HL7 messages and store them in an efficient database format so we can easily find individual messages to help us validate the Amalga parsers.