Healthcare payers employ large numbers of clinicians to manually review the unstructured data (such as clinician notes, discharge summaries, and progress notes) in electronic health records (EHRs) to build a more holistic view of each patient's health. This process is resource-intensive and infrequent, which can delay disease diagnosis and care. Payers review EHRs for clues that can indicate future health changes, and they rely on accurate clinical coding to assign appropriate risk categories to each patient. The difference between incorrect and correct hierarchical condition category (HCC) codes during risk adjustment can amount to millions of dollars for a health payer.†
Payers are beginning to use natural language processing (NLP) to interpret nuanced language in text in near real time, with the goal of improving risk adjustment, reducing costs, and enhancing patient care. For example, nuanced language can distinguish a patient who has stopped smoking from one who is trying to quit. Healthcare providers have used NLP for some time, but interoperability mandates are now bringing it to the forefront.†
The Disease Prediction reference kit helps healthcare payers use NLP to uncover insights hidden in the unstructured data of patient health records. These insights may help detect disease progression early, identify gaps in a patient's care, and improve the risk adjustment process.
What Is Included
In collaboration with Accenture*, Intel developed an AI reference kit to predict disease probabilities from symptoms (unstructured data). Each reference kit includes:
- Training data
- An open-source, trained ClinicalBERT model
- User guide
- oneAPI components
At a Glance
- Industry: Healthcare providers
- Task: Multiclass classification to predict the prognosis probabilities from the patient symptom description
- Dataset: 4,962 paragraphs describing patient state and a final prognosis, in .csv format
  - Synthetically generated sentences using data from the linked source generator; the data generator is available for customization
  - 80:20 split (training:inference)
- Type of Learning: Supervised
- Models: ClinicalBERT with classification
- Output: Probability of a specific diagnosis from a set of 42 diseases, such as fungal infection, impetigo, and vertigo
- Intel® AI Portfolio:
  - Intel® Optimization for PyTorch*
  - Intel® Neural Compressor
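The task and output listed above can be illustrated with a minimal, framework-free sketch. It assumes each record is a (symptom text, prognosis label) pair. It splits the records 80:20 with a seeded shuffle and then converts a classifier's raw scores (logits) for 42 disease classes into probabilities with a softmax. The function names, placeholder records, and random logits are hypothetical. The kit itself uses a trained ClinicalBERT model, which is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical record shape: (symptom description, prognosis label).
Record = Tuple[str, str]


def split_80_20(records: Sequence[Record], seed: int = 42) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the records and cut it 80:20 (training:holdout)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable diagnoses, highest first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder records sized like the kit's dataset (4,962 paragraphs).
    records = [(f"symptom text {i}", f"disease_{i % 42}") for i in range(4962)]
    train, holdout = split_80_20(records)
    print(len(train), len(holdout))  # 3969 993

    # Random logits stand in for a trained classifier's scores over 42 classes.
    rng = random.Random(0)
    labels = [f"disease_{j}" for j in range(42)]
    logits = [rng.gauss(0.0, 1.0) for _ in labels]
    print(top_k(softmax(logits), labels))
```

A plain shuffled split is the simplest option. With 42 classes, a stratified split that preserves each class's share in both partitions may be preferable.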
Optimized with Intel oneAPI for Better Performance
The kit was optimized and its performance tested on a Microsoft Azure* Standard_D4_V5 instance with 3rd generation Intel® Xeon® processors.
To build disease prediction models at scale, data scientists need to train them on substantial datasets and run inference frequently. Faster training lets them retrain more often and iterate toward better model accuracy. Faster inference makes real-time prediction practical.
With Intel® oneAPI toolkits, little to no code change is required to attain the performance boost.
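One way to quantify a speedup like this is to time the same prediction function before and after optimization. The sketch below is a generic, stdlib-only latency harness. `predict_fn` is a hypothetical stand-in for a model's inference call, and the warm-up and run counts are illustrative assumptions, not the settings used in Intel's tests.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(predict_fn: Callable[[object], object], inputs: Sequence[object],
              warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time predict_fn on single inputs; report median latency and throughput."""
    # Warm-up calls let caches and any lazy initialization settle before timing.
    for i in range(warmup):
        predict_fn(inputs[i % len(inputs)])

    latencies = []
    for i in range(runs):
        start = time.perf_counter()
        predict_fn(inputs[i % len(inputs)])
        latencies.append(time.perf_counter() - start)

    median_s = statistics.median(latencies)  # median resists outlier runs
    return {
        "median_latency_ms": median_s * 1000.0,
        "throughput_per_s": 1.0 / median_s if median_s > 0 else float("inf"),
    }


def speedup(baseline: Dict[str, float], optimized: Dict[str, float]) -> float:
    """Baseline median latency divided by optimized median latency (>1 is faster)."""
    if optimized["median_latency_ms"] == 0:
        return float("inf")
    return baseline["median_latency_ms"] / optimized["median_latency_ms"]


if __name__ == "__main__":
    texts = ["fever and chills", "itchy skin rash", "dizziness when standing"]
    # Two toy functions stand in for an unoptimized and an optimized model.
    baseline = benchmark(lambda t: sum(ord(c) for c in t * 500), texts)
    optimized = benchmark(lambda t: len(t), texts)
    print(f"speedup: {speedup(baseline, optimized):.1f}x")
```

Reporting the median rather than the mean keeps a few slow runs (for example, from background load) from skewing the comparison.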
Data scientists often run multiple models in parallel on the same compute resources to assess patient risk factors beyond disease prediction. Compressing these models significantly while maintaining accuracy on CPUs can lower the total cost of ownership of such healthcare solutions.
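The kit's portfolio lists Intel® Neural Compressor for model compression, and its API is not reproduced here. As a rough, library-free illustration of why quantization shrinks a model, the sketch below applies symmetric per-tensor INT8 post-training quantization to a few hypothetical float32 weights. Each weight is stored as a signed byte plus one shared scale. This cuts storage roughly 4x (ignoring the single scale value) at the cost of a rounding error bounded by half the scale. Production toolkits add calibration and accuracy-aware tuning on top of this basic idea.

```python
from array import array
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[array, float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q: array, scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [scale * v for v in q]


if __name__ == "__main__":
    weights = array("f", [0.82, -1.27, 0.003, 0.5, -0.31, 1.1])  # hypothetical values
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per value
    int8_bytes = len(q) * q.itemsize              # 1 byte per value (scale excluded)
    print(f"compression: {fp32_bytes / int8_bytes:.0f}x, "
          f"max error: {max_err:.4f} (bound {scale / 2:.4f})")
```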
For healthcare payers, using NLP to predict diseases from unstructured data can help monitor disease progression. Healthcare providers can then proactively manage care for at-risk groups to improve patient outcomes. NLP-based disease prediction also has the potential to save healthcare insurers money, because later-stage treatments can be considerably more complex and expensive than treatments administered earlier.
† Yacoubian, C. (2022). "How Payers Are Using AI to Address Big Data Challenges." https://www.hmpgloballearningnetwork.com/site/frmc/commentary/how-payers-are-using-ai-address-big-data-challenges