Disease Prediction Using Natural Language Processing

Business Results

Up to 69% faster inference in real time

Up to 68% faster inference in batch

For workloads and configurations, see GitHub*. Results may vary.

Background

Healthcare payers employ large numbers of clinicians to manually review the unstructured data (such as clinician notes, discharge summaries, and progress notes) in electronic health records (EHR) to create a more holistic view of a patient's health. This process is resource intensive and infrequent, which delays disease diagnosis and care. Healthcare payers review EHRs for clues that can indicate future health changes, and they rely on accurate clinical coding to assign an appropriate risk category to each patient. The difference between incorrect and correct hierarchical condition codes during risk adjustment can cost a health payer millions of dollars.^†

Payers are beginning to use natural language processing (NLP) to understand nuanced language within a body of text in near real time to improve risk adjustment, reduce costs, and enhance patient care. For example, nuanced language may differentiate between a patient who has stopped smoking and a patient who is trying to quit smoking. Healthcare providers have used NLP as a tool for some time, but mandates around interoperability are bringing it to the forefront.^†

The Disease Prediction reference kit benefits healthcare payers by using NLP to uncover insights hidden in the unstructured data in patient health records. These insights may help detect disease progression early, identify gaps in a patient’s care, and improve the risk adjustment process.

What Is Included

In collaboration with Accenture*, Intel developed an AI reference kit to predict disease probabilities from unstructured symptom descriptions. The reference kit includes:

- Training data
- An open source, trained ClinicalBERT model
- Libraries
- User guide
- oneAPI components

At a Glance

Industry: Healthcare providers
Task: Multiclass classification that predicts prognosis probabilities from a patient's symptom description
Dataset:
- 4,962 paragraphs describing patient state and a final prognosis in .csv format
- Synthetically generated sentences using data from the linked source generator; the data generator is available for customization
- 80:20 split (training:inference)
Type of Learning: Supervised
Models: ClinicalBERT with classification
Output: Probability of a specific diagnosis from a set of 42 diseases, such as fungal infection, impetigo, and vertigo.
Intel® AI Portfolio:
- Intel® Optimization for PyTorch*
- Intel® Neural Compressor
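
As an illustration of the output described above, the following is a minimal sketch of how a multiclass classifier's raw scores (logits) are commonly turned into per-disease probabilities with a softmax. It is not the kit's code: the function names `softmax` and `top_k`, the logit values, and the three-label list are assumptions made for the example (the kit itself covers 42 classes).

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical example: three of the disease names mentioned in the text,
# paired with made-up logits standing in for a model's output.
labels = ["fungal infection", "impetigo", "vertigo"]
logits = [2.0, 0.5, -1.0]

for label, prob in top_k(logits, labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Reporting the top few classes with their probabilities, rather than a single label, lets a reviewer see how confident the model is before acting on a prediction.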

Testing Environment

Optimized with Intel oneAPI for Better Performance

Performance was tested on a Microsoft Azure* Standard_D4_V5 instance with 3rd generation Intel® Xeon® processors.

Benefits

To build disease prediction models at scale, data scientists need to train the models on substantial datasets and run inference frequently. Accelerating training allows them to retrain more often and work toward better model accuracy. Faster inference allows them to run predictions in real-time scenarios.

With Intel® oneAPI toolkits, little to no code change is required to attain the performance boost.

Data scientists often run multiple models in parallel (using the same compute resources) to determine other patient risk factors beyond just the disease prediction. Being able to significantly compress the models on CPUs while maintaining model accuracy can lower the total cost of ownership of these healthcare solutions.
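
To make the idea of model compression concrete, here is a minimal sketch of one common technique, symmetric 8-bit quantization, applied to a handful of made-up weights. This page does not describe which compression method the kit uses, and the code below does not call Intel® Neural Compressor; the helper names `quantize_int8` and `dequantize` and the weight values are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # float value of one integer step
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


# Made-up weights standing in for one small slice of a model.
weights = [0.82, -0.41, 0.05, -1.27, 0.33]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", quantized)
print("scale:", scale)
print("max error:", max_error)
```

Each weight stored as an 8-bit integer takes one byte instead of the four bytes of a 32-bit float, which is where the size reduction comes from. Whether accuracy holds up after quantization has to be checked on held-out data for the specific model.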

For healthcare payers, being able to access and use unstructured data to predict diseases using NLP can help monitor disease progression. Healthcare providers can proactively manage patient care for at-risk groups for better patient outcomes. Taking advantage of NLP-based models for disease prediction has the potential to provide cost savings to healthcare insurers because medical treatments in later stages can be considerably more complex and expensive than treatments administered earlier.

References

† Yacoubian, C. (2022). "How Payers Are Using AI to Address Big Data Challenges." https://www.hmpgloballearningnetwork.com/site/frmc/commentary/how-payers-are-using-ai-address-big-data-challenges
