The Art of Medicine and Mining Semi-Structured Data

For this post, I want to introduce one of the challenges we face in organizing and presenting unstructured or semi-structured data in medicine.  The best examples are progress notes and other free-text narratives, typically dictated, that are so pervasive in medicine.

I began thinking about this problem four years ago, when I had the opportunity to attend my daughter and son-in-law’s medical school commencement ceremony.  The doctor who delivered the commencement speech said something along the lines of:  “You have spent the past four years learning the Science part of medicine, and now you must devote yourselves to learning and practicing the Art of medicine.”

I suppose that’s why doctors “practice” medicine.  The point here is that health data analytics systems must give providers a way to mine the gold nuggets of information buried in these notes.

Fortunately, I see a convergence between this problem and the maturity of technical solutions to help solve it.  In future posts, I’ll share a few specific technical solutions; but at this point, let me summarize what I think are the requirements:

  1. Data consumers must be able to quickly find the topics they want buried in the narrative.  The key word is “quickly”.  In most other fields, “time is money”, but in health care, “time is life”:  a delay can be the difference between life and death.
  2. The narrative must be associated with other medical data.  This could be positional; that is, displayed with the other clinical data.  Or, it could be associated with custom meta-data tags.
  3. It would be nice if the topic could be highlighted in the results.  That is, if we are looking for the term “prognoses”, the program would highlight that term in the narrative.
  4. It would also be nice if a search for a topic could automatically include its synonyms.  For example, I might want to find “shortness of breath”, which should also match “SOB”, an abbreviation that in medicine has a different meaning than in most other fields 🙂
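
To make these requirements a little more concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions, not a production design:  the `Note` class, the `SYNONYMS` table, the `search` function and the `>>` / `<<` highlight markers are all hypothetical names invented for this example, and a real system would draw its synonyms from a curated medical vocabulary rather than a hand-written dictionary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical synonym table, invented for this example.
SYNONYMS = {
    "shortness of breath": ["SOB", "dyspnea"],
}


@dataclass
class Note:
    """A free-text narrative plus the meta-data tags it is associated with."""
    text: str
    tags: dict = field(default_factory=dict)


def build_pattern(term):
    """Compile a case-insensitive pattern for the term and its synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    # Try longer phrases first so they win over shorter overlapping ones.
    variants.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def search(notes, term, **required_tags):
    """Return the matching notes' text with every hit wrapped in >> <<.

    Notes whose tags do not equal all of required_tags are skipped.
    """
    pattern = build_pattern(term)
    results = []
    for note in notes:
        if any(note.tags.get(k) != v for k, v in required_tags.items()):
            continue
        if pattern.search(note.text):
            highlighted = pattern.sub(lambda m: ">>" + m.group(0) + "<<", note.text)
            results.append(highlighted)
    return results


notes = [
    Note("Patient reports SOB on exertion.", {"dept": "cardiology"}),
    Note("No shortness of breath at rest.", {"dept": "pulmonology"}),
    Note("Prognosis is good.", {"dept": "cardiology"}),
]

# Synonym expansion, highlighting, and a meta-data filter in one call.
for line in search(notes, "shortness of breath", dept="cardiology"):
    print(line)
```

Because the match is case-insensitive, this sketch would also highlight the ordinary English word “sob”; telling the abbreviation apart from the word is exactly the kind of problem that needs medical context rather than plain pattern matching.
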

What technologies might help us meet these requirements?

“Search” is an obvious technical solution, and one that Google has helped revolutionize.  In addition to simple word search, it would be nice to limit search results by meta-data tags or by patient populations.  It may also be useful to federate the search across other media types.  A personal anecdote might help illustrate.

A couple of years ago, I was diagnosed with a very rare but extremely serious acute medical condition.  The attack only lasted a few hours, but it left some residual medical issues for which I was referred to what the referring doc called “one of the smartest docs around”.  He did his history and exam thing and then called me over to his computer, where he had a Google search up, and he navigated to an article he wanted to share with me.  In this case, the doc took my specific health information, the name of my rare illness, and used Google Search to aid him in his treatment plan.  This medical expert was qualified to sift through the good and bad information in the Internet Sea to find the specific information he needed to understand and treat my acute illness.  Surely we in the healthcare IT arena can make this easier for him and others like him.

Other technologies we might use:  speech to text, phonetic searches, synonym searches, and optical character recognition (taking an image of words and converting it to text so the words can be indexed and searched).
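
As a small illustration of what a phonetic search might look like, here is a sketch of Soundex in Python.  Soundex is just one well-known phonetic algorithm that I am picking for the example; it was designed for English surnames, so treat it as a stand-in rather than a recommendation for medical text.  The function names `soundex` and `phonetic_matches` are my own.

```python
def soundex(word):
    """Return the four-character American Soundex code for a word."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to letter + three digits


def phonetic_matches(query, words):
    """Return the words whose Soundex code equals that of the query."""
    target = soundex(query)
    return [w for w in words if soundex(w) == target]


# A misspelled or mis-transcribed term still finds the intended word.
print(phonetic_matches("dispnea", ["dyspnea", "dysphagia", "diagnosis"]))
```

Dictated notes that pass through speech to text are a natural place for this kind of matching, since the transcription may spell a term the way it sounded rather than the way it is written.
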

So, in the following posts, I hope to drill down into some specific technologies and illustrate how they may help us make mining semi-structured data more efficient, both by reducing the time it takes health providers to get the information they need and by giving them more targeted results.
