Detecting Diabetic Retinopathy Using Deep Learning on Intel® Architecture

Published: 03/19/2018  

Last Updated: 03/19/2018

Abstract

Diabetic retinopathy (DR) is one of the leading causes of preventable blindness and affects people across the globe. Detecting it is a time-consuming, manual process. This experiment aims to automate preliminary DR detection from a retinal image of a patient's eye. The TensorFlow*-based implementation uses a convolutional neural network that takes a retinal image, analyzes it, and learns the characteristics of an eye that shows signs of diabetic retinopathy. A simple transfer learning approach was used: an Inception* v3 model pre-trained on the ImageNet* dataset was retrained and tested on a retina dataset. The experiments were run on Intel® Xeon® Gold processor powered systems. Training accuracy was about 83 percent, and test accuracy was approximately 77 percent (refer to Configurations).

Introduction

Diabetic retinopathy (DR) is one of the leading causes of preventable blindness. It affects up to 40 percent of diabetic patients, with nearly 100 million cases worldwide as of 2010. Currently, detecting DR is a time-consuming, manual process that requires a trained clinician to examine and evaluate digital color fundus photographs of the retina. Human readers often submit their reviews a day or two later, and the delay leads to lost follow-up, miscommunication, and delayed treatment. The objective of this experiment is to develop an automated method for DR screening. Referring eyes that show DR to an ophthalmologist for further evaluation and treatment enables timely and accurate diagnoses, which would help reduce the rate of vision loss.

Continued research in deep learning has produced many frameworks for solving the complex problems of image classification, detection, and segmentation. These frameworks have been optimized for the specific hardware on which they run, for better accuracy, reduced loss, and increased speed. Intel has optimized the TensorFlow* library for better performance on its Intel® Xeon® Gold processors. This paper discusses training and inference for a DR detection model built using the Inception* v3 architecture with the TensorFlow framework on Intel® processor powered clusters. A transfer learning approach was used: the weights of an Inception v3 model trained on the ImageNet* dataset were taken as the starting point to train, validate, and test on a retina dataset.

Document Content

This section describes in detail the end-to-end steps, from choosing the environment, to running the tests on the trained DR detection model.

Choosing the Environment

Hardware

The experiments were performed on an Intel Xeon Gold processor powered system with the configuration listed in the following table:

Component             Details
Architecture          x86_64
CPU op-mode(s)        32-bit, 64-bit
Byte order            Little-endian
CPU(s)                24
Core(s) per socket    6
Socket(s)             2
CPU family            6
Model                 85
Model name            Intel® Xeon® Gold 6128 processor @ 3.40 GHz

Table 1. Intel® Xeon® Gold processor configuration.

Software

The software configuration consisted of the Intel® optimized TensorFlow framework and the Intel® Distribution for Python*.

Software/Library    Version
TensorFlow*         1.4.0 (Intel® optimized)
Python*             3.6 (Intel® optimized)

Table 2. Software configuration on Intel® Xeon® Gold processor.

The listed software configurations were available on the chosen hardware environment, so no source build of TensorFlow was required.

Dataset

The dataset is a small, curated subset of images created from the training set of Kaggle's Diabetic Retinopathy Detection challenge. The challenge data contains a large set of high-resolution retina images taken under a variety of imaging conditions. A left and a right field are provided for every subject. Images are labeled with a subject ID and either left or right (for example, 1_left.jpeg is the left eye of patient ID 1). Because the images come from different cameras, they vary in quality in terms of exposure and focus sharpness. Some of the images are also inverted, and both the images and the labels contain noise.
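
The naming convention described above can be parsed mechanically. The following is a minimal sketch, not part of the original experiment; it assumes every file follows the `<subject ID>_<left|right>.jpeg` pattern, and the names `EyeImage` and `parse_image_name` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class EyeImage(NamedTuple):
    subject_id: int
    side: str  # "left" or "right"


# Assumed pattern: <subject ID>_<left|right>.jpeg, for example "1_left.jpeg".
_NAME_PATTERN = re.compile(r"^(\d+)_(left|right)\.jpeg$")


def parse_image_name(filename: str) -> EyeImage:
    """Split an image file name into its subject ID and eye side."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected image name: {filename!r}")
    return EyeImage(subject_id=int(match.group(1)), side=match.group(2))


print(parse_image_name("1_left.jpeg"))
```

Rejecting names that do not match, rather than guessing, makes noisy entries visible early.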

The presence of disease in each image is labeled with a binary value, as follows:

        0: No Disease

        1: Disease

For this experiment, the dataset is split into a training set (90 percent of the files) and a test set (10 percent of the files).
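
A 90/10 split of this kind can be sketched as follows. The article does not say how files were assigned to each set, so the seeded random shuffle here is an assumption, and `split_train_test` is a hypothetical helper. The file names are placeholders; only the total of 2063 images comes from the article.

```python
import random


def split_train_test(filenames, train_fraction=0.9, seed=0):
    """Shuffle file names reproducibly and split them into train and test lists."""
    files = sorted(filenames)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(files)   # assumption: random assignment
    n_train = round(len(files) * train_fraction)
    return files[:n_train], files[n_train:]


# Placeholder names; 2063 is the image count reported in this article.
names = [f"image_{i}.jpeg" for i in range(2063)]
train_files, test_files = split_train_test(names)
print(len(train_files), len(test_files))
```

With 2063 files, rounding 90 percent gives 1857 training images and 206 test images, which matches the counts reported under Preparing Input.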

Inception* v3 Architecture

The Inception v3 architecture was designed to improve the utilization of computing resources inside a deep neural network. The main idea is to approximate a sparse structure with spatially repeated dense components and to use dimension reduction, as in a network-in-network architecture, to keep the computational complexity in bounds, but only where required. The computational cost of Inception v3 is also much lower than that of other topologies such as AlexNet, VGGNet*, and ResNet*. More information on Inception v3 is given in Rethinking the Inception Architecture for Computer Vision [3]. The Inception v3 architecture is shown in the following figure:

Figure 1. Inception* v3 model [3].
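
To illustrate why dimension reduction keeps the computational complexity in bounds, the sketch below counts the multiplications in a 5x5 convolution with and without a preceding 1x1 reduction. The layer sizes (a 28x28x192 input, 32 output filters, and a reduction to 16 channels) are illustrative assumptions, not values taken from this experiment or from the Inception v3 definition.

```python
def conv_multiplies(height, width, in_channels, out_channels, kernel):
    """Multiplications for a stride-1 convolution that preserves spatial size."""
    return height * width * out_channels * kernel * kernel * in_channels


# Illustrative sizes only (assumed, not from the article).
H, W, C_IN = 28, 28, 192

# A 5x5 convolution applied directly to all 192 input channels.
direct = conv_multiplies(H, W, C_IN, 32, kernel=5)

# A 1x1 convolution first reduces 192 channels to 16, then the 5x5 runs on 16.
reduced = (conv_multiplies(H, W, C_IN, 16, kernel=1)
           + conv_multiplies(H, W, 16, 32, kernel=5))

print(direct)                      # 120422400
print(reduced)                     # 12443648
print(round(direct / reduced, 1))  # 9.7
```

Under these assumed sizes, the reduced path needs roughly a tenth of the multiplications while producing an output of the same shape.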

To accelerate training, transfer learning was applied using an Inception v3 model pre-trained on the ImageNet dataset. The pre-trained model has already learned generic image features, which are stored in the form of weights. These weights are used directly as initial weights and are readjusted when the model is retrained on the retina dataset. The pre-trained model was downloaded from [4].

Execution Steps

This section describes the steps followed in the end-to-end process for training, validating, and testing the retinopathy detection model on Intel® architecture.

These steps include:

  1. Preparing input
  2. Model training
  3. Inference

Preparing Input

Image Directories

The dataset was downloaded from Nomikxyz/retinopathy-dataset [1].

  • The files were extracted and separated into different directories based on the DR types.
  • A total of 2063 images (in diseased and non-diseased folders) were separated from the primary list and placed in a different directory.
  • There were 1857 JPEG images of retinas for training, 206 images for testing, and a .csv file that records the disease level for the training images.
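
Separating the files into per-class directories from the label file could look like the sketch below. The column names `image` and `level`, the `.jpeg` extension, and the directory names `diseased` and `nondiseased` are assumptions made for illustration; the actual layout of the curated dataset may differ.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical mapping from label value to directory name.
CLASS_DIRS = {"0": "nondiseased", "1": "diseased"}


def sort_images_by_label(csv_path, image_dir, output_dir):
    """Copy each labeled image into a per-class directory; return per-class counts."""
    counts = {name: 0 for name in CLASS_DIRS.values()}
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed columns: "image" (name without extension) and "level".
            class_name = CLASS_DIRS[row["level"].strip()]
            source = Path(image_dir) / f"{row['image']}.jpeg"
            if not source.exists():
                continue  # skip labels whose image file is missing
            target_dir = Path(output_dir) / class_name
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target_dir / source.name)
            counts[class_name] += 1
    return counts
```

A one-directory-per-class layout is what image retraining scripts commonly expect, since the directory name can then serve as the class label.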

Processing and Data Transformations

  • Images from the training and test datasets have very different resolutions, aspect ratios, and colors. They are cropped in various ways, and some are of very low quality, out of focus, and so on.
  • To help improve the results during training, the images are augmented through simple distortions like crops, scales, and flips.
  • Images were of varying sizes and were cropped to 299 pixels wide by 299 pixels high.
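
The distortions listed above can be sketched in plain Python on an image stored as a list of pixel rows. This is an illustration of the idea only, not the code used in the experiment: the helper names, the 10 percent crop limit, the 50 percent flip probability, and the nearest-neighbor scaling to 299 by 299 are all assumptions.

```python
import random


def flip_left_right(image):
    """Mirror an image (a list of pixel rows) horizontally."""
    return [row[::-1] for row in image]


def crop(image, top, left, height, width):
    """Cut a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, out_height, out_width):
    """Scale an image to the requested size with nearest-neighbor sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


def augment(image, rng, size=299, max_crop_fraction=0.1):
    """Apply a random crop and an optional flip, then scale to size x size."""
    height, width = len(image), len(image[0])
    crop_h = height - rng.randint(0, int(height * max_crop_fraction))
    crop_w = width - rng.randint(0, int(width * max_crop_fraction))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    out = crop(image, top, left, crop_h, crop_w)
    if rng.random() < 0.5:  # flip about half of the images
        out = flip_left_right(out)
    return resize_nearest(out, size, size)


# Tiny synthetic "image" whose pixels record their own coordinates.
sample = [[(y, x) for x in range(40)] for y in range(30)]
result = augment(sample, random.Random(0))
print(len(result), len(result[0]))  # 299 299
```

Passing in a seeded `random.Random` keeps the distortions reproducible between runs.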

Model Training

Transfer learning is a technique that reduces training time compared with training from scratch: it takes a model fully trained on a set of categories, such as ImageNet, and retrains it from the existing weights for new classes. In this experiment, we retrained the final layer from scratch and left all the other layers untouched. The following command accesses the training images and trains the model to detect diseased images.

The retraining script was run on the retina dataset as follows:

python \
  --bottleneck_dir=bottlenecks \
  --how_many_training_steps=300 \
  --model_dir=inception \
  --output_graph=retrained_graph.pb \
  --output_labels=retrained_labels.txt \

The script loads the pre-trained Inception v3 model, removes the old top layer, and trains a new top layer on the retina images. Although the original ImageNet classes on which the full network was trained contain no retina images, transfer learning still works because the lower layers have already been trained to distinguish generic features (for example, edge detectors or color blob detectors) that can be reused for other recognition tasks without any modification.
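
The idea of training only a new top layer can be sketched with a small softmax classifier fitted on fixed feature vectors. This is a toy illustration under stated assumptions, not the retraining script itself: the feature vectors stand in for the outputs of the frozen lower layers, and the function names, learning rate, and two-dimensional toy data are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities of the top layer for one feature vector."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train_top_layer(features, labels, num_classes=2, steps=300,
                    learning_rate=0.5, seed=0):
    """Fit a softmax layer by gradient descent; the features are never changed."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(num_classes):
                # Gradient of cross entropy with respect to logit k.
                error = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += error
                for j in range(dim):
                    grad_w[k][j] += error * x[j]
        for k in range(num_classes):
            biases[k] -= learning_rate * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= learning_rate * grad_w[k][j] / n
    return weights, biases


# Toy stand-ins for frozen-layer outputs: class 0 and class 1 are separable.
toy_features = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
toy_labels = [0, 0, 1, 1]
top_w, top_b = train_top_layer(toy_features, toy_labels)
print(predict_proba(top_w, top_b, [0.05, 0.95]))
```

Because only the small top layer is updated, each step is cheap compared with updating the whole network.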

Retraining with Bottlenecks

TensorFlow computes all the bottleneck values as the first step in training. In this step, it analyzes all the images on disk and calculates the bottleneck values for each of them. Bottleneck is an informal term for the layer just before the final output layer that actually does the classification. This penultimate layer has been trained to output a set of values that is good enough for the classifier to distinguish between all the classes it has been asked to recognize. Retraining only the final layer can work on new classes because the kind of information needed to distinguish between the 1,000 classes in ImageNet is often also useful for distinguishing between new kinds of objects, such as retinas, traffic signals, accidents, and so on.

The bottleneck values are then stored because they are required in every training iteration. TensorFlow computes them with the existing pre-trained model. Because every image is reused multiple times during training, and calculating each bottleneck takes a significant amount of time, caching these values on disk in the bottleneck directory avoids repeated recalculation and speeds up training.
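
The caching idea can be sketched as follows. This is not the retraining script's own implementation: the JSON file format, the function name `get_or_create_bottleneck`, and the `compute_bottleneck` callable (a stand-in for running an image through the pre-trained network) are assumptions made for illustration.

```python
import json
from pathlib import Path


def get_or_create_bottleneck(image_path, bottleneck_dir, compute_bottleneck):
    """Return cached bottleneck values for an image, computing them only once."""
    cache_file = Path(bottleneck_dir) / (Path(image_path).name + ".json")
    if cache_file.exists():
        # Cache hit: reuse the stored values instead of rerunning the network.
        return json.loads(cache_file.read_text())
    # Cache miss: compute once (the expensive step), then store on disk.
    values = [float(v) for v in compute_bottleneck(image_path)]
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(values))
    return values
```

Every later training step that samples the same image then pays only the cost of reading a small file.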

Training

After the bottlenecks are complete, the actual training of the top layer of the network begins. During the run, the following outputs are generated showing the progress of algorithm training:

  • Training accuracy shows the percentage of the images used in the current training batch that were labeled with the correct class.
  • Validation accuracy is the percentage of correctly labeled images in a randomly selected group of images from a different set.
  • Cross entropy is a loss function that tells us how well the learning process is progressing.
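
As a rough illustration of these metrics, the sketch below computes accuracy and mean cross entropy from predicted class probabilities. The probabilities and labels are made-up values, and the function names are hypothetical; the retraining script reports these quantities itself.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability class is the true label."""
    correct = sum(
        1 for probs, label in zip(probabilities, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(labels)


def cross_entropy(probabilities, labels, epsilon=1e-12):
    """Mean negative log probability assigned to the true class."""
    losses = [-math.log(max(probs[label], epsilon))
              for probs, label in zip(probabilities, labels)]
    return sum(losses) / len(losses)


# Made-up batch: class 0 = no disease, class 1 = disease.
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
batch_labels = [0, 1, 1, 1]

print(accuracy(batch_probs, batch_labels))       # 0.75
print(cross_entropy(batch_probs, batch_labels))
```

Accuracy only counts whether the top class is right, while cross entropy also penalizes low confidence in the correct class, so it can keep falling after accuracy has stopped changing.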

Training was run on nearly 2063 images with a batch size of 100 for 300 steps (iterations), and we observed a training accuracy of 83.0 percent (refer to Configurations).
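
To put the step count in perspective, the short calculation below estimates how many times each image is seen on average. It is a back-of-the-envelope sketch: the helper name is hypothetical, and because batches may be sampled randomly rather than by strict passes over the data, the result is only an approximate number of epochs.

```python
def approximate_epochs(steps, batch_size, num_images):
    """Average number of times each image is sampled during training."""
    return steps * batch_size / num_images


# Values reported in the article: 300 steps, batch size 100.
print(round(approximate_epochs(300, 100, 2063), 1))  # 14.5 for all 2063 images
print(round(approximate_epochs(300, 100, 1857), 1))  # 16.2 for the 1857 training images
```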

Inference

We ran inference with the trained model on the 206 test images using the following script and observed a test accuracy of about 77.2 percent.

python -m scripts.label_image \
    --graph=tf_files/retrained_graph.pb  \

Figure 2. Diseased versus not-diseased probability.

Conclusion

In this paper, we explained how a retinopathy detection model was trained and tested using transfer learning, with the weights of an Inception v3 model trained on the ImageNet dataset as the starting point. These weights were readjusted when the model was retrained in the Intel Xeon Gold processor-powered environment. The experiment can be extended by applying different optimization algorithms, changing learning rates, and varying input sizes to improve the accuracy further.

About the Authors

Lakshmi Bhavani Manda and Ajit Kumar Pookalangara are part of the Intel team working on artificial intelligence (AI) evangelization.

Configurations

For the performance figures referenced in the Abstract and Training sections:

        Hardware: refer to Hardware under Choosing the Environment

        Software: refer to Software under Choosing the Environment

        Test performed: executed on the remaining 10 percent of the images using the trained model

For more information, go to the Product Performance site.

References

1. For curated dataset:

2. TensorFlow for Poets tutorial:

3. Rethinking the Inception Architecture for Computer Vision:

4. Dataset Link:

Related Resources

TensorFlow* Optimizations on Modern Intel® Architecture:

Build and Install TensorFlow* on Intel® Architecture:
