Much clinical information, such as age and gender, is available in structured formats. Often, however, the information that tells doctors the most about a patient's condition is unstructured. This includes patients’ free-text clinical notes (e.g., nursing notes, lab reports, and radiology reports), which are generally difficult and time-consuming to analyze. A critical part of personalized medicine is the ability to use this unstructured data efficiently to derive clinical insights.
Intel® Distribution for Apache Hadoop* software, GraphBuilder*, and GraphLab* are distributed computation frameworks that make it possible to analyze free-text clinical notes in a scalable, efficient way. Intel Distribution for Apache Hadoop software and GraphBuilder both rely on the independence of patients’ records to provide a data-parallel solution for preprocessing, formatting, and normalization. GraphLab then exploits the dependencies among the post-processed records to derive meaningful insights in parallel. Together, these technologies can help alleviate the bottleneck in creating topic models that is documented in the research article “Topic Models for Mortality Modeling in Intensive Care Units.”
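Because no patient's record depends on any other, the preprocessing stage is a plain parallel map: each record can be normalized on its own by any worker. The sketch below illustrates only that idea on a single machine. It is not the Hadoop or GraphBuilder API; the sample notes, the `normalize_note` tokenizer, and the `preprocess_all` helper are hypothetical, and a thread pool stands in for cluster workers.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: each patient's free-text note is independent of the others.
NOTES = {
    "patient-001": "Pt  alert, BP 120/80. No distress.",
    "patient-002": "Chest X-ray: NO acute findings.",
}


def normalize_note(text: str) -> list[str]:
    """Hypothetical normalizer: lowercase the note and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(item: tuple[str, str]) -> tuple[str, list[str]]:
    """Normalize one (patient_id, note) pair; needs no data from other records."""
    patient_id, text = item
    return patient_id, normalize_note(text)


def preprocess_all(notes: dict[str, str]) -> dict[str, list[str]]:
    # Records are independent, so they can be mapped concurrently in any order.
    # A thread pool is a single-machine stand-in for distributed workers.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(preprocess, notes.items()))


if __name__ == "__main__":
    for pid, tokens in preprocess_all(NOTES).items():
        print(pid, tokens)
```

Swapping the thread pool for a process pool or a cluster framework would not change the structure, because the per-record function never reads another record. Modeling the dependencies between records, as GraphLab does, is a separate stage that this sketch does not cover.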
Read the full Distributed Systems for Clinical Data Analysis White Paper.