Artificial Intelligence and Healthcare Data

Published: 03/05/2018  

Last Updated: 03/05/2018


Health professionals and researchers have access to plenty of healthcare data. However, the implementation of artificial intelligence (AI) technology in healthcare is still very limited, primarily due to a lack of awareness about AI, which remains unfamiliar to most healthcare professionals. The purpose of this article is to introduce healthcare professionals to AI and its application to different types of healthcare data.

IT (information technology) professionals such as data scientists, AI developers, and data engineers are also facing challenges in the healthcare domain; for example, finding the right problem,1 lack of data availability for training of AI models, and various issues with the validation of AI models. This article highlights the various potential areas of healthcare where IT professionals can collaborate with healthcare experts to build teams of doctors, scientists, and developers, and translate ideas into healthcare products and services.

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers. Based on the dataset type, we highlight a few use cases in the healthcare domain where AI was applied using various medical datasets.

Artificial Intelligence

AI is a set of techniques that enable computers to mimic human behavior. AI in healthcare uses algorithms and software to analyze complex medical data and find relationships between patient outcomes and prevention or treatment techniques.2 Machine learning (ML) is a subset of AI. It uses various statistical methods and algorithms, and enables a machine to improve with experience. Deep learning (DL) is a subset of ML.3 It takes machine learning to the next level with a multilayer neural network architecture. It identifies patterns and performs other complex tasks in a way loosely modeled on the human brain. DL has been applied in many fields such as computer vision, speech recognition, natural language processing (NLP), object detection, and audio recognition.4 Deep neural networks (DNNs) and recurrent neural networks (RNNs), examples of deep learning architectures, are used to improve drug discovery and disease diagnosis.5

Figure 1. Relationship of artificial intelligence, machine learning, and deep learning
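
To make the phrase "multilayer neural network architecture" concrete, the following minimal sketch passes three input values through one hidden layer and one output layer. All weights and biases are hypothetical, hand-picked numbers chosen only for illustration; a real network learns them from data, and real deep learning models have many more layers and neurons.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical parameters (assumptions for illustration, not learned values).
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden neurons, 3 inputs each
hidden_biases = [0.0, 0.1]
output_weights = [[1.2, -0.7]]                          # 1 output neuron, 2 inputs
output_biases = [0.05]

def forward(features):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)[0]

score = forward([0.6, 0.1, 0.9])  # three normalized input measurements
print(f"output score: {score:.3f}")
```

Stacking more `dense` layers between the input and the output is what makes such a network "deep".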

AI Health Market

According to Frost & Sullivan (a growth partnership company), the AI market in healthcare may reach USD 6.6 billion by 2021, a 40 percent growth rate. AI has the potential to reduce the cost of treatment by up to 50 percent.6 According to an Accenture analysis, AI applications in healthcare may generate USD 150 billion in annual savings by 2026. AI-based workforce, culture, and solutions are continually evolving to support the healthcare industry in multiple ways, such as:7

  • Alleviating the burden on clinicians and giving medical professionals the tools to do their jobs more effectively.
  • Filling in gaps during the rising labor shortage in healthcare.
  • Enhancing efficiency, quality, and outcomes for patients.
  • Magnifying the reach of care by integrating health data across platforms.
  • Delivering benefits of greater efficiency, transparency, and interoperability.
  • Maintaining information security.

Healthcare Data

Hospitals, clinics, and medical and research institutes generate a large volume of data on a daily basis, including lab reports, imaging data, pathology reports, diagnostic reports, and drug information. Such data is expected to increase greatly in the next few years as people expand their use of smartphones, tablets, the Internet of Things (IoT), and fitness gadgets to generate information.8 Digital data is expected to reach 44 zettabytes by 2020, doubling every two years.9 The rapid expansion of healthcare data is one of the greatest challenges for clinicians and physicians. Current literature suggests that a big data ecosystem and AI are solutions for processing this massive data explosion while meeting the social, financial, and technological demands of healthcare. Analyzing such large and complicated data is often difficult and requires a high level of data analysis skill. Moreover, the most challenging part is interpreting the results and making recommendations based on the outcome, which requires many years of medical experience, knowledge, and specialized skill sets.

In healthcare, data are generated, collected, and stored in multiple formats, including numerical data, text, images, scans, audio, and video. To apply AI to a dataset, we first need to understand the nature of the data and the questions we want to answer from it. The data type helps us choose the neural network, algorithm, and architecture for AI modeling. Here, we introduce a few AI-based cases as examples of how AI is applied in healthcare in general. An approach can typically be customized to the project and area of interest (for example, oncology, cardiology, pharmacology, internal medicine, primary care, urgent care, emergency, and radiology). Below is a list of AI applications, organized by dataset format, that are gaining momentum in the real world.

Healthcare Dataset: Pictures, Scans, Drawings

One of the most popular ways to generate data in healthcare is with images, such as scans (PET scan image credit: Susan Landau and William Jagust at UC Berkeley),10 tissue sections,11 drawings,12 and organ images13 (Figure 2A). In this scenario, specialists look for particular features in an image. A pathologist, for example, collects such images under the microscope from tissue sections (fat, muscle, bone, brain, liver biopsy, and so on). Recently, Kaggle organized the Intel and MobileODT Cervical Cancer Screening Competition to improve the precision and accuracy of cervical cancer screening using a large image dataset (training, testing, and additional datasets).14 The participants used different deep learning models, such as the faster region-based convolutional neural network (Faster R-CNN) detection framework with VGG16,15 supervised semantics-preserving deep hashing (SSDH) (Figure 2B), and U-Net convolutional networks.16 Dr. Silva achieved 81 percent accuracy on the validation test using GoogLeNet* with Caffe*.16

Similarly, Xu et al. investigated datasets of over 7,000 images of single red blood cells (RBCs) from eight patients with sickle cell disease. They selected a DNN classifier to classify the different RBC types.17 Gulshan et al. applied a deep convolutional neural network (DCNN) to more than 10,000 retinal images collected from 874 patients to detect referable diabetic retinopathy (moderate or worse) with about 90 percent sensitivity and specificity.18
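
Sensitivity and specificity, the two metrics quoted above, are simple ratios taken from a classifier's confusion counts. The sketch below shows how they are computed; the counts are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = share of diseased cases the model flags correctly
    specificity = share of healthy cases the model clears correctly
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical screening result: 100 diseased and 200 healthy images.
sens, spec = sensitivity_specificity(
    true_pos=90, false_neg=10, true_neg=180, false_pos=20
)
print(f"sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

A screening tool usually has to trade one metric against the other, which is why studies report both rather than a single accuracy figure.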

Figure 2. A) Various types of healthcare image data. B) Supervised semantics-preserving deep hashing (SSDH), a deep learning model, used in the Intel and MobileODT Cervical Cancer Screening Competition for image classification. Source: 10-13,16.

Positron emission tomography (PET), computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound images (Figure 2A) are another source of healthcare data, in which images of internal tissue (such as the brain or tumors) are collected noninvasively. Deep learning models can be used to measure tumor growth over time in cancer patients on medication. Jaeger et al. applied a convolutional neural network (CNN) architecture to diffusion-weighted MRI. Based on an estimation of the properties of the tumor tissue, this architecture reduced false-positive findings, and thereby decreased the number of unnecessary invasive biopsies. The researchers noticed that deep learning reduced the motion and vision error, and thus provided more stable results in comparison to manual segmentation.19 A study conducted in China showed that deep learning helped to achieve 93 percent accuracy in distinguishing malignant from benign tumors on ultrasound shear-wave elastography elastograms from 200 patients.20,21

Healthcare Dataset: Numerical

Figure 3. Example of numerical data

Healthcare industries collect a lot of patient- and research-related information such as age, height, weight, blood profile, lipid profile, sugar, blood pressure, and heart rate. Similarly, gene expression data (for example, fold change) and metabolic information (for example, level of metabolites) are also expressed as numbers.
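
As an illustration of how a numerical value such as fold change is derived, the sketch below compares a gene's expression level in a treated sample with its level in a control sample. Reporting the ratio on a log2 scale is a common convention and is an assumption here, as are the gene names and expression values, which are hypothetical.

```python
import math

def log2_fold_change(treated, control):
    """log2 of the expression ratio: +1 means doubled, -1 means halved."""
    return math.log2(treated / control)

# Hypothetical (treated, control) expression levels per gene.
expression = {
    "GENE_A": (800.0, 100.0),  # strongly up-regulated
    "GENE_B": (50.0, 200.0),   # down-regulated
    "GENE_C": (120.0, 120.0),  # unchanged
}

changes = {
    gene: log2_fold_change(treated, control)
    for gene, (treated, control) in expression.items()
}

for gene, change in changes.items():
    print(f"{gene}: log2 fold change = {change:+.1f}")
```

A table of such values, one row per sample and one column per gene, is the kind of numerical input a neural network can consume directly.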

The literature reports several cases where neural networks were successfully applied to such data. For instance, Danaee and Ghaeini from Oregon State University (2017) used a deep architecture, a stacked denoising autoencoder (SDAE) model, to extract meaningful features from gene expression data of 1,097 breast cancer and 113 healthy samples. This model enables the classification of breast cancer cells and the identification of genes useful for cancer prediction (as biomarkers) or as potential therapeutic targets.22 Kaggle shared the breast cancer dataset from the University of Wisconsin, containing information on the radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, and fractal dimension of the cancer cell nucleus. In the Kaggle competition, participants successfully built a DNN classifier to predict breast cancer type (malignant or benign).23
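
The idea of classifying a tumor from numerical nucleus measurements can be sketched with a single sigmoid neuron (logistic regression), which is the simplest building block of a DNN classifier. This is a deliberately simplified stand-in, not the model used by the competition participants, and the radius and texture values below are hypothetical rather than taken from the Wisconsin dataset.

```python
import math

# Hypothetical (radius, texture) measurements; NOT values from the real dataset.
samples = [
    (11.0, 14.0), (12.0, 16.0), (12.5, 15.0), (11.5, 17.0),  # benign
    (18.0, 22.0), (19.5, 24.0), (20.0, 21.0), (17.5, 25.0),  # malignant
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = malignant

def column_stats(rows):
    """Mean and (population) standard deviation of each feature column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds

def standardize(row, means, stds):
    """Rescale features so they are comparable in size."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, targets, epochs=500, lr=0.5):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, targets):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

means, stds = column_stats(samples)
weights, bias = train([standardize(s, means, stds) for s in samples], labels)

def predict(radius, texture):
    """Return 1 (malignant) or 0 (benign) for one new measurement."""
    x = standardize((radius, texture), means, stds)
    return int(sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5)

print(predict(10.0, 13.0), predict(21.0, 26.0))
```

A DNN extends this by stacking many such neurons in several layers, which lets it learn non-linear boundaries between classes that a single neuron cannot represent.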

Healthcare Dataset: Textual

Figure 4. Example of textual data

Plenty of medical information is recorded as text; for instance, clinical data (cough, vomiting, drowsiness, and diagnosis), social, economic, and behavioral data (such as poor, rich, depressed, happy), social media reviews (Twitter*, Facebook*, Telegram*, and so on), and drug history. Natural language processing (NLP), a branch of AI that often relies on neural networks, translates free text into standardized data. It enhances the completeness and accuracy of electronic health records (EHRs). NLP algorithms extract risk factors from notes available in the EHR. For example, NLP was applied to 21 million medical records and identified 8,500 patients who were at risk of developing congestive heart failure with 85 percent accuracy.24 The Department of Veterans Affairs used NLP techniques to review more than two billion EHR documents for indications of post-traumatic stress disorder (PTSD), depression, and potential self-harm in veteran patients.25 Similarly, NLP was used to identify psychosis in schizophrenic patients with 100 percent accuracy, based on speech patterns.26 IBM Watson* analyzed 140,000 academic articles, far more than a human could read, understand, or remember, and recommended courses of therapy for cancer patients.24
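
The step of turning free text into standardized data can be illustrated with a very small rule-based extractor. This is only a toy sketch: production clinical NLP systems, including those cited above, use far richer language models, and the risk-factor vocabulary, negation words, and sample note here are all hypothetical.

```python
import re

# Hypothetical vocabulary mapping a standardized label to the phrases that signal it.
RISK_PATTERNS = {
    "hypertension": r"\b(hypertension|high blood pressure)\b",
    "diabetes": r"\b(diabetes|diabetic)\b",
    "smoking": r"\b(smok(?:er|es|ing))\b",
}

# A mention is ignored when a negation word precedes it in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*$", re.IGNORECASE)

def extract_risk_factors(note):
    """Return the sorted list of risk-factor labels asserted in a free-text note."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for factor, pattern in RISK_PATTERNS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match and not NEGATION.search(sentence[:match.start()]):
                found.add(factor)
    return sorted(found)

note = ("Patient has a history of high blood pressure. "
        "Denies smoking. Diabetic since 2010.")
factors = extract_risk_factors(note)
print(factors)
```

The output is a list of standardized labels that can be stored as structured fields in an EHR, which is what makes the text usable for downstream risk models.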


Healthcare Dataset: Electrogram

Figure 5. Examples of electrogram data. Source: 27,31

Figure 6. Architecture of deep learning with convolutional neural network model useful in classification of EEG data. Source: 28-29

Electrocardiogram (ECG),27 electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and sleep test recordings are some examples of graphical healthcare data. An electrogram records the electrical activity of a target organ (such as the heart, brain, or muscle) over a period of time using electrodes placed on the skin.

Schirrmeister et al. from the University of Freiburg designed and trained a deep ConvNets (deep learning with convolutional networks) model to decode raw EEG data, which is useful for EEG-based brain mapping.28,29 Pourbabaee et al. from Concordia University, Canada, used a large volume of raw ECG time-series data to build a DCNN model. Interestingly, this model learned key features of paroxysmal atrial fibrillation (PAF), a life-threatening heart disease, and was thereby useful in PAF patient screening. This method can be a good alternative to traditional ad hoc, time-consuming handcrafted features.30 Sleep stage classification is an important preliminary exam for sleep disorders. Using 61 polysomnography (PSG) time-series recordings, Chambon et al. built a deep learning model for sleep stage classification. The model performed better than traditional methods, with little run time and computational cost.31
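
The core operation that lets a convolutional network learn features from a raw time series, rather than relying on handcrafted ones, is a filter slid along the signal. The sketch below applies one such filter to a short ECG-like sequence. The signal values and the peak-detecting kernel are hypothetical and hand-picked; in a trained DCNN the kernel weights are learned, and many filters are stacked in layers.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, the operation CNN layers compute."""
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + width]))
        for i in range(len(signal) - width + 1)
    ]

def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]

# Hypothetical ECG-like samples containing one sharp peak.
ecg = [0.0, 0.1, 0.0, 1.0, 0.1, 0.0, 0.1, 0.0]

# Hand-picked kernel that responds strongly when the middle sample
# is higher than both of its neighbors.
peak_kernel = [-1.0, 2.0, -1.0]

feature_map = relu(conv1d(ecg, peak_kernel))

# +1 converts a window start position into the index of the window's center.
peak_index = feature_map.index(max(feature_map)) + 1
print(feature_map, peak_index)
```

The resulting feature map is largest where the signal matches the filter's shape, which is how a network localizes patterns such as a heartbeat spike without being told where to look.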

Healthcare Dataset: Audio and Video

Figure 7. Example of audio data.

Sound event detection (SED) detects the onset and offset times of each sound event in an audio recording and associates a textual descriptor with it. SED has recently drawn great interest in the healthcare domain for health monitoring. Cakir et al. combined CNNs and RNNs in a convolutional recurrent neural network (CRNN) and applied it to a polyphonic sound event detection task. They observed a considerable improvement with the CRNN model.32
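
To show what onset and offset detection means in practice, the sketch below converts per-frame activity probabilities for one sound class into timed events by thresholding. It assumes a model has already produced the probabilities; the values, the frame length, the 0.5 threshold, and the "cough" label are all hypothetical, and the neural network itself is not shown.

```python
def detect_events(frame_probs, frame_seconds, threshold=0.5):
    """Return (onset, offset) times in seconds for each run of active frames."""
    events = []
    start = None
    for index, prob in enumerate(frame_probs):
        active = prob >= threshold
        if active and start is None:
            start = index  # the event begins at this frame
        elif not active and start is not None:
            events.append((start * frame_seconds, index * frame_seconds))
            start = None   # the event ended just before this frame
    if start is not None:  # the event is still active at the end of the clip
        events.append((start * frame_seconds, len(frame_probs) * frame_seconds))
    return events

# Hypothetical per-frame probabilities for one sound class, e.g. "cough".
cough_probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9, 0.2]

events = detect_events(cough_probs, frame_seconds=0.5)
for onset, offset in events:
    print(f"cough from {onset:.1f} s to {offset:.1f} s")
```

In a polyphonic setting the same step runs once per sound class, so overlapping events can be reported independently.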

Videos are sequences of images; in some cases they can be treated as time series, and in very particular cases as dynamical systems. Deep learning techniques help researchers in both the computer vision and multimedia communities to significantly boost the performance of video analysis and to open new research directions for analyzing video content. Microsoft started a research project called InnerEye* that uses machine learning technology to build innovative tools for the automatic, quantitative analysis of three-dimensional radiological images. Project InnerEye employs algorithms such as deep decision forests as well as CNNs for the automatic, voxel-wise analysis of medical images.33 Khorrami et al. built a model on videos from the Audio/Visual Emotion Challenge (AVEC 2015) using both RNNs and CNNs, and performed emotion recognition on video data.34

Healthcare Dataset: Molecular Structure

Figure 8. Molecular structure of 4CDG (Source:

Figure 8 shows a typical example of the molecular structure of a drug molecule. Generally, the design of a new molecule draws on historical datasets of existing molecules. In quantitative structure-activity relationship (QSAR) analysis, scientists try to find known and novel patterns between structure and activity. At the Merck Research Laboratory, Ma et al. used a dataset of thousands of compounds (about 5,000) and built a model based on a DNN (deep neural net) architecture.35 In another QSAR study, Dahl et al. built neural network models on 19 datasets of 2,000‒14,000 compounds to predict the activity of new compounds.36 Aliper and colleagues built a deep neural network–support vector machine (DNN–SVM) model that was trained on a large transcriptional response dataset and classified various drugs into therapeutic categories.37 Tavanaei et al. developed a convolutional neural network model to classify tumor suppressor genes and proto-oncogenes with 82.57 percent accuracy. This model was trained on tertiary structures of proteins obtained from the Protein Data Bank.38 AtomNet* is presented as the first structure-based DCNN. It incorporates structural target information and uses it to predict the bioactivity of small molecules. This application successfully predicted new, active molecules for targets with no previously known modulators.39

AI: Solving Healthcare Problems

Here are a few practical examples where AI developers, startups, and institutes are building and testing AI models:

  • Emotional intelligence indicators that detect subtle cues in speech, inflection, or gesture to assess a person’s mood and feelings
  • Tuberculosis detection
  • Treatment of PTSD
  • AI chatbots (Florence*, SafedrugBot*, Babylon Health*, SimSensei*)
  • Virtual assistants that help patients and clinicians
  • Insurance verification
  • Smart robots that explain lab reports
  • Aging-based AI centers
  • Improved clinical documentation
  • Personalized medicine

Data Science and Health Professionals: A Combined Approach

Deep learning has great potential to help medical and paramedical practitioners by:

  • Reducing the human error rate40 and workload
  • Helping in diagnosis and the prognosis of disease
  • Analyzing complex data and building a report

The examination of thousands of images is complex, time consuming, and labor intensive. How can AI help?

A team from Harvard Medical School’s Beth Israel Deaconess Medical Center reported a 2.9 percent error rate for the AI model alone and a 3.5 percent error rate for pathologists alone in breast cancer diagnosis. Interestingly, pairing deep learning with a pathologist yielded a 0.5 percent error rate, a drop of about 85 percent.40 Litjens et al. suggest that deep learning holds great promise in improving the efficacy of prostate cancer diagnosis and breast cancer staging.41,42
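
The drop of about 85 percent follows directly from the two error rates quoted above, taking the pathologist-only error rate as the baseline. A short check of the arithmetic (the function name is our own):

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline

pathologist_error = 3.5  # percent, pathologists alone
combined_error = 0.5     # percent, deep learning paired with a pathologist

drop = relative_reduction(pathologist_error, combined_error)
print(f"relative drop in error rate: {drop:.1%}")
```

The result is roughly 85.7 percent, consistent with the reported figure. The same function shows that the AI model alone (2.9 percent) improves on the pathologist baseline by only about 17 percent, which is why the pairing is the notable result.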

Intel® AI Developer Program

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers, and makes free AI training and tools available through the Intel® AI Developer Program.

Intel recently published a series of hands-on AI tutorials that walk through the process of AI project development step by step. The tutorials cover:

  • Ideation and planning
  • Technology and infrastructure
  • How to build an AI model (data and modeling)
  • How to build and deploy an app (app development and deployment)

Intel is committed to providing a solution for your healthcare project. Please read the article on the Intel AI Developer Program to learn more about solutions using Intel® architecture (Intel® Processors for Deep Learning Training). In the next article, we explore examples of healthcare datasets where you will learn how to apply deep learning. Intel is committed to helping you achieve your project goals.


  1. Faggella, D. Machine Learning Healthcare Applications – 2018 and Beyond. Techemergence.
  2. Artificial intelligence in healthcare - Wikipedia. (Accessed: 12th February 2018)
  3. Intel® Math Kernel Library (Intel® MKL) for Deep Learning Networks: Part 1–Overview and Installation | Intel® Software. (Accessed: 14th February 2018)
  4. Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
  5. Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics 13, 1445–1454 (2016).
  6. From $600 M to $6 Billion, Artificial Intelligence Systems Poised for Dramatic Market Expansion in Healthcare. (Accessed: 12th February 2018)
  7. Accenture. Artificial Intelligence in Healthcare | Accenture.
  8. Marr, B. How AI And Deep Learning Are Now Used To Diagnose Cancer. Forbes.
  9. Executive Summary: Data Growth, Business Opportunities, and the IT Imperatives | The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. (Accessed: 12th February 2018)
  10. Lifelong brain-stimulating habits linked to lower Alzheimer’s protein levels | Berkeley News. (Accessed: 21st February 2018)
  11. Emphysema H and E.jpg - Wikimedia Commons. (Accessed: 23rd February 2018)
  12. Superficie_ustioni.jpg (696×780). (Accessed: 23rd February 2018)
  13. Heart_frontally_PDA.jpg (1351×1593). (Accessed: 27th February 2018)
  14. Kaggle competition-Intel and MobileODT Cervical Cancer Screening. Intel and MobileODT Cervical Cancer Screening. Which cancer treatment will be most effective? (2017).
  15. Intel and MobileODT* Competition on Kaggle*. Faster Convolutional Neural Network Models Improve the Screening of Cervical Cancer. December 22 (2017).
  16. Intel and MobileODT* Competition on Kaggle*. Deep Learning Improves Cervical Cancer Accuracy by 81%, using Intel Technology. December 22 (2017).
  17. Xu, M. et al. A deep convolutional neural network for classification of red blood cells in sickle cell anemia. PLoS Comput. Biol. 13, 1–27 (2017).
  18. Gulshan, V. et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402 (2016).
  19. Jäger, P. F. et al. Revealing hidden potentials of the q-space signal in breast cancer. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 10433 LNCS, 664–671 (2017).
  20. Ali, A.-R. Deep Learning in Oncology – Applications in Fighting Cancer. September 14 (2017).
  21. Zhang, Q. et al. Sonoelastomics for Breast Tumor Classification: A Radiomics Approach with Clustering-Based Feature Selection on Sonoelastography. Ultrasound Med. Biol. 43, 1058–1069 (2017).
  22. Danaee, P., Ghaeini, R. & Hendrix, D. A. A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput. 22, 219–229 (2017).
  23. Kaggle: Breast Cancer Diagnosis Wisconsin. Breast Cancer Wisconsin (Diagnostic) Data Set: Predict whether the cancer is benign or malignant.
  24. What is the Role of Natural Language Processing in Healthcare? (Accessed: 1st February 2018)
  25. VA uses EHRs, natural language processing to spot suicide risks. (Accessed: 1st February 2018)
  26. Predictive Analytics, NLP Flag Psychosis with 100% Accuracy. (Accessed: 1st February 2018)
  27. Heart_block.png (450×651). (Accessed: 23rd February 2018)
  28. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG. (2017).
  29. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 38, 5391–5420 (2017).
  30. Pourbabaee, B., Roshtkhari, M. J. & Khorasani, K. Deep Convolutional Neural Networks and Learning ECG Features for Screening Paroxysmal Atrial Fibrillation Patients. IEEE Trans. Syst. Man, Cybern. Syst. 1–10 (2017). doi:10.1109/TSMC.2017.2705582
  31. Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G. & Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. arXiv:1707.0332v2 (2017).
  32. Cakir, E., Parascandolo, G., Heittola, T., Huttunen, H. & Virtanen, T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Trans. Audio, Speech, Lang. Process. 25, 1291–1303 (2017).
  33. Project InnerEye – Medical Imaging AI to Empower Clinicians. Microsoft
  35. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
  36. Dahl, G. E., Jaitly, N. & Salakhutdinov, R. Multi-task Neural Networks for QSAR Predictions. (University of Toronto, Canada, 2014).
  37. Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm. 13, 2524–2530 (2016).
  38. Tavanaei, A., Anandanadarajah, N., Maida, A. & Loganantharaj, R. A Deep Learning Model for Predicting Tumor Suppressor Genes and Oncogenes from PDB Structure. bioRxiv  October 22, 1–10 (2017).
  39. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. 1–11 (2015). doi:10.1007/s10618-010-0175-9
  40. Kontzer, T. Deep Learning Drops Error Rate for Breast Cancer Diagnoses by 85%. September 19 (2016).
  41. Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, (2016).
  42. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
