Organizing messy data, a Google developer's view

By Laura Lovett
As the big data revolution continues to rock the healthcare space, there are still questions about how to get a handle on that massive amount of data and make it valuable. It’s a question that is plaguing every group working in the healthcare space, including the developers. 

At HL7 FHIR DevDays in Boston, Eyal Oren, a project manager at Google, discussed some of the struggles of making that data work for developers in a FHIR system, along with possible solutions.

“There is a shared vision emerging and many of you have had that vision for a long time; health data is available, it will be in FHIR, and will come from a lot of different sources and patients will own their own data,” Oren said at the conference. 

While there may be a vision for making that data as useable as possible, there are some hurdles that the industry faces. A major topic of interest right now is clean data versus available data. 

“This brings to bear something interesting. There is a paradigm that the data needs to be clean. …[W]hen I order [something] from a lab, I need to know which lab it came from, the same with a drug, I need to know which drug it is,” Oren said. “But there are a lot of cases where that just isn’t going to happen. …[T]he data is not clear and you will need to do your best with whatever you’ve got. And oftentimes that means you will drop it all on the floor because you don’t know what to do with it. The interesting thing is I think as this ecosystem is opening up and as we are getting more [data] you will see both of these cases.”

Healthcare data is messy and complicated, said Oren, and nothing is going to change that. But he said we can look at the web for examples of how to make sense of the data. 

“In Translate, for example, if you need to translate English to Chinese we don't go around and ask you to write your English in such a way that is easier for us to translate into Chinese,” Oren said. “We just take whatever is there and try to make our best out of it.”

With so many data points out there for every patient, it can be a struggle for clinicians to capture a full view of any one patient. Patient data is mostly tracked through EHR systems that often are not interoperable, so connecting the dots can be difficult. In fact, health history is often left up to the patient to describe. There has been a focus on giving patients more control of their EHRs, making the records easier to transport between providers. For example, Apple Health Records, which uses HL7’s FHIR, will aggregate existing patient-generated data in a user's Health app with data from their EHR, provided the user is a patient at a participating hospital. But for a clinician trying to sift through the data, it's tough to find a place to start.

In the future, a good starting place could be to give a single sentence overview, said Oren. But even that — finding the important bits amidst all the other health information — poses a concern. 

One of the issues: misspellings. There are a lot of ways that doctors and clinicians can misspell words, which makes records difficult to search. This issue, on top of the fact that synonyms are plentiful in the healthcare world, can make it tough to search for a specific condition.

But it’s possible to learn from these struggles, Oren said. He said that it will take a multifaceted approach to finding specific conditions or medications. Solutions include things as simple as applying spell check or setting up a synonym checker. 
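To make the spell check and synonym checker idea concrete, here is a minimal sketch in Python. It is not Oren's or Google's method, just one simple way to combine the two steps: fuzzy matching from the standard library's difflib corrects near-miss spellings, and a small lookup table maps synonyms to one canonical name. The synonym table, the vocabulary and the function name normalize_term are all hypothetical and chosen for illustration only.

```python
from difflib import get_close_matches

# Hypothetical synonym table (illustration only): variant term -> canonical name.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
}

# Every term the sketch recognizes: variants, canonical names, and one extra
# condition that has no synonyms in the table.
VOCABULARY = sorted(set(SYNONYMS) | set(SYNONYMS.values()) | {"diabetes"})


def normalize_term(raw, cutoff=0.8):
    """Return a canonical condition name for a free-text term, or None.

    Step 1 (spell check): if the cleaned term is not in the vocabulary,
    replace it with the closest vocabulary entry above the similarity cutoff.
    Step 2 (synonym check): map the result to its canonical name.
    """
    term = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if term not in VOCABULARY:
        matches = get_close_matches(term, VOCABULARY, n=1, cutoff=cutoff)
        if not matches:
            return None  # nothing close enough; leave the term unresolved
        term = matches[0]
    return SYNONYMS.get(term, term)


if __name__ == "__main__":
    for text in ["hypertensoin", "Heart  Atack", "HTN", "diabetis", "xyz"]:
        print(repr(text), "->", normalize_term(text))
```

A search over clinical notes could run each query and each extracted term through a function like this, so that a misspelled "heart atack" and the abbreviation "MI" land on the same condition. The cutoff is a judgment call: a lower value catches more misspellings but also risks matching the wrong term, which matters more in a clinical setting than in web search.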

However, as messy as the data is, it can give clinicians insights that they wouldn’t have been privy to before, according to Oren.

“Some of the data needs to be carefully structured and controlled, and some of it is what can we squeeze out of it,” Oren said.