HL7 Entity-Attribute-Value Database for Data Validation

In my day job, I work with Microsoft Amalga which, for our implementation, consumes (parses) a number of HL7 feeds.  As with most HL7 consumers, Amalga parses the text elements it expects out of the the HL7 feed message for storage in the Amalga SQL Server tables.  However, there are application and implementation specific business rules that are applied and require us to validate that the parsers have acted correctly.  In this post, I’m going to discuss a tool I authored to help us validate the Amalga parsers.  Our tool is fairly simple and has no business rules associated with it.  It simply reads the HL7 message and parses out the various segments whose values are then placed in a “flat” Entity-Attribute-Value (“EAV”) database.

Let’s start with the database.

As it turns out, the database has proved to be useful for both validation and for ad-hoc reporting.  I enabled SQL Server’s Full-Text-Search engine on the Value column of data which holds every data element across all of our feeds.  Even with several billion rows of data, our searches return in under a few seconds.  Therefore, this eliminates the need to try and figure out where in the Amalga tables a particular data item might be stored. For example, we wanted to get an idea of the number of patients who had come in with symptoms of the H1N1 flu before the diagnosis code was widely used. Using a handful of symptoms, we ran a query in the EAV and in a matter of seconds, we received a count of unique MRNs spread across the dates of visit which could easily be displayed in a chart.

As another example, Amalga recently hit an error and stopped parsing messages when it encountered a new event type that was unanticipated. A quick search of the EAV found the message which was only the second message out of many millions of that type. The first had processed without error (which was an error in itself) but the second had stopped the system. Using the EAV, we found both messages and after working with our Interface team, we identified the problem and quickly corrected it and resumed parsing.

OK, enough background, let’s talk about the design of the database.

There are some great articles (many by IBM – so filter your Google with IBM hosted info) about how EAV works. The thing that confused me most when I started working with EAV was thinking of it as Name/Value or a one-to-one match type architecture. This is far from correct. Let me illustrate with our design.

For our database, our Entity is the HL7 message such as an ADT message. So the Entity is the unique message ID we assign when we first process the message; that is, the Primary Key in the Message Table which is an identity field.

The Attributes of our Message are also held in the Message table along with the key. For example, we store: message Type (ADT), message Event (A05), MRN (derived after the message parse), Encounter Id (derived after the message parse), and for our Amalga validation, the Amalga Message Id store in the Amalga database.

The Value(s) are stored in the EAV table with a Value field, which has specific HL7 attributes associated with the value and is linked to the Message with its attributes via the Message Id.

To illustrate with a single value, let’s say we receive an ADT message for a patient who recently arrived at our urgent care facility in Alaska (all of this is contrived and is created for illustration). The “point-of-care” value in our HL7 message might be “UCAK” and so we would have one row in our EAV with the “Value” segment holding “UCAK”. Therefore, “UCAK” is the value for an HL7 ADT Entity (message) with a number of attributes including: Patient Identifier, Visit Identifier, Message Type, etc.

Now that you understand the logic of our database, let’s talk about the design.

The database is normalized so that we use lookup tables. For example, we have an Event Type table with an HL7 type of A14 with the corresponding value of “pending admit”.

lkupAttribute Table

The key lookup table is the lkupAttribute table holding the HL7 segments (which in a very early version of this was the attribute for our EAV). Let’s take one row to describe what this table does:

  • AttributeId – is a database assigned unique identifier for this row. Note that this can never change since it is heavily used in the other tables. Our example, this value is 12.
  • HL7 – the HL7 segment type, for our example this is PID.5 In our parser code, we expect to have this text in this format so do not change it. For those astute in HL7, the Set and Field Ids are handled later so for this implementation, it contains the segment and position only.
  • AttributeName – in our case “Patient Name”. This is the typical “friendly name” for the value and the text we will use later when we pull out the values for display. Note that this is the HL7 suggested descriptor and the implementation may or may not abide by the suggestion.
  • AttributeTypeId – Each HL7 segment has a type. In our example, the type is “Name” and this type can have its own list of segments such as “first name” and “last name”. For our purposes, this type id is customized for our application and this Id value is a foreign key for the AttributeType table. For our current example, this value is of type 12.

AttributeType table

The AttributeType table is the lookup table used by the application to parse out the EAV values. Continuing our example:

  • Id – is the database assigned value for the type. Note that the values in this table should never change either since they are used in the code to determine how to parse. Generally, the first 10 items are native types and the rest are custom types. For example, id #1 is a “string” type; whereas id #12 is an “xpn” type which is a name type.
  • HL7Type – the type defined by the HL7 spec, such as “xpn”
  • Description – this is not used by the application but is copied from the spec to help in programming the parsers. For our example, it looks like this:

<Family Name (FN)> ^ <Given Name (ST)> ^ <Second and Further Given Names or Initials Thereof (ST)> ^ <Suffix (e.g., JR or III) (ST)> ^ <Prefix (e.g., DR) (ST)> ^ <Degree (e.g., MD) (IS)> ^ <Name Type Code (ID)> ^ <Name Representation Code (ID)> ^ <Name Context (CE)> ^ <Name Validity Range (DR)> ^ <Name Assembly Order (ID)> ^ <Effective Date (TS)> ^ <Expiration Date (TS)> ^ <Professional Suffix (ST)>

Message Table

The Message table has the following columns:

  • MessageId – This is the database assigned unique row identifier.
  • RawMessageId – Used by Amalga
  • MRN – This is the MRN for the patient for which this message is reporting. This value is calculated by the parser code and the value is pulled from the appropriate place in the various message types. For example, this may be in one location in the ADT messages but another in the Lab Results feed.
  • VisitID – or Encounter Id, or sometimes called the Account, is the identifier that the hospital systems assign for a patient visit or encounter.
  • MessageDateTime – when the message was sent or received
  • EventType – such as A31 or A05. This makes it easy to find and sort the HL7 messages by message type.
  • HL7FeedType– the name of the HL7 feed such as: ADT, or Lab Orders.
  • MessageControlId – this optional value is sent by the interface gateway such as Egate or SeeBeyond.


And the last table is the EAV table with the following columns:

  • EAVId – the row identifier, assigned by the database
  • MessageId – the foreign key from the Message table, allows us to join rows in the EAV table to the Message table.
  • PrimaryAttributeId – the foreign key from the lkupAttribute table. This tells us what HL7 segment the value belongs to.
  • SubAttributeId – the foreign key from the lkupAttribute table for the sub-attributes. For example, the PrimaryAttributeId might be the number for “Patient Name”, and the SubAttributeId value might be “Last Name”. This allows us to customize how we want to search or re-assemble values later and makes our “where” clauses extremely accurate.
  • FieldId – some HL7 segments are lists of entries and can have an almost unlimited number of entries. For example, a Patient Identifier might have several entries, one from each hospital system. This will allow us to store the data in a very granular fashion; plus providing future flexibility as we add new systems.
  • SetId – some HL7 segments have sets of data. For example, the Insurance segment may have a different set for each policy the patient has. This field identifies the SetId from the HL7 message.

Wow, this was much longer than I expected. Next time I’ll talk about the code and how it uses this database and post up the code and db scripts.

Leave a Reply

Your email address will not be published. Required fields are marked *

6 − 4 =