Faster Convolutional Neural Network Models Improve the Screening of Cervical Cancer

Published: 12/22/2017  

Last Updated: 12/22/2017

A Lithuanian Team Tests the Capabilities of AI to Improve Cervical Cancer Screening

Editor's note: This is one in a series of case studies showcasing finalists in the Kaggle* Competition sponsored by Intel and MobileODT*. The goal of this competition was to use artificial intelligence to improve the precision and accuracy of cervical cancer screening.

First-place winners for the Intel and MobileODT* Cervical Cancer Screening Kaggle* Competition: (from left) Jonas Bialopetravičius, Ignas Namajūnas, and Darius Barušauskas of Team TEST.

Abstract

More than 1,000 participants representing 800 data scientist teams developed algorithms to accurately identify a woman’s cervix type based on images as part of the Intel and MobileODT* Competition on Kaggle. Such identification can help prevent ineffectual treatments and allow health care providers to offer proper referrals for cases requiring more advanced treatment.

This case study follows the process used by the first-place-winning team, TEST (Towards Empirically Stable Training), to create an algorithm that would improve this life-saving diagnostic procedure.

Kaggle Competitions: Data Scientists Solve Real-world Machine Learning Problems

The goal of Kaggle competitions is to challenge and incentivize data scientists globally to create machine-learning solutions in a wide range of industries and disciplines. In this particular competition, sponsored by Intel and MobileODT (a developer of mobile diagnostic tools), more than 1,000 participants representing over 800 data scientist teams each developed algorithms to correctly classify cervix types based on cervical images.

In the screening process for cervical cancer, some patients require further testing while others don't; because this decision is so critical, an algorithm-aided determination can improve the quality and efficiency of cervical cancer screening for these patients. The challenge for each team was to develop the most efficient deep learning model for that purpose.

Team TEST Applies AI Expertise to Cervical Cancer Screening

The winning team consists of three Kaggle competitors at the Master and Grandmaster tiers, all from Lithuania:

Ignas Namajūnas

Ignas Namajūnas, Mathematics BS and Computer Science MS, has nearly three years of research and development experience. He served as research lead for nine months on a surveillance project.

 

Darius Barušauskas

Darius Barušauskas, MSc in Econometrics, has worked for more than six years on machine learning and deep learning applications. He has created more than 30 models, including credit scoring models, for companies in the financial, utilities and telco sectors. Barušauskas reached the Grandmaster tier less than a year after joining Kaggle.

 

Jonas Bialopetravičius

Jonas Bialopetravičius, Software Engineering BS, Computer Science MS, has more than six years of professional experience in computer vision and machine learning. He is currently studying astrophysics, where he applies deep learning methods.

The team’s experience in successfully training object detectors gave it a considerable advantage. Bialopetravičius and Namajūnas won a previous deep learning competition that required similar know-how, which they easily transferred to this project. "We saw this challenge as an opportunity to bring our experience to an important task and potentially improve the quality of cancer diagnostics," said Namajūnas. "As we had a good toolset to do so, it seemed very logical to adapt our technological know-how for this specific task."

Determining the Steps to a Most Efficient Solution

Team TEST not only realized the special importance of this competition – literally saving lives – but also saw it could be approached as an object detection challenge, where they already had achieved success.

Team members divided responsibilities: Barušauskas created cervix bounding boxes to obtain a smaller region of interest within each image, set up validation sets for training, and attempted model stacking. Namajūnas examined the data, searched for the right augmentations, and tended to training and testing details. Bialopetravičius worked on the general problem-solving pipeline, trained models, and experimented with data augmentations. The team did not meet face-to-face during the challenge but communicated via Slack* messages.

Members used the Faster R-CNN detection framework with VGG16 as the feature extractor. Their main challenges were setting up a good validation set, finding the right augmentations for the data, and resolving training details to optimize validation scores.

In total, they used six Faster R-CNN models. A separate model was first trained on all available bounding-box-annotated data and was then run on the stg1 testing set to obtain bounding boxes. The resulting boxes, combined with the stg1 test set labels, were used for training the rest of the models. This approach could be generalized to new data if class labels were provided for each image. Although the team believed human-annotated bounding boxes would probably deliver the best result, it concluded that using model-generated bounding boxes would be more efficient overall than having no bounding boxes at all.
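The bootstrapping step described above can be illustrated with a minimal sketch. It assumes the first detector returns a list of scored boxes per image and that only the highest-scoring box is kept and paired with the known class label. The function name, the data layout, and the rule of skipping images without detections are all hypothetical and are not taken from the team's code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, detector confidence)


def build_training_annotations(
    detections: Dict[str, List[Detection]],
    labels: Dict[str, int],
) -> Dict[str, Tuple[Box, int]]:
    """Pair each image's highest-scoring predicted box with its known label.

    Images with no detections or no label are skipped. This is an
    assumption; the source does not say how such cases were handled.
    """
    annotations: Dict[str, Tuple[Box, int]] = {}
    for image_id, candidates in detections.items():
        if not candidates or image_id not in labels:
            continue
        # Keep only the most confident box as the region of interest.
        best_box, _score = max(candidates, key=lambda d: d[1])
        annotations[image_id] = (best_box, labels[image_id])
    return annotations
```

The resulting mapping of image to (box, class) is the kind of annotation a detector such as Faster R-CNN is trained on.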

Of the five models that were trained on all of the data, one was trained for classification in night-vision images. This model was used only when a testing image was identified to be night-vision (which was easily done since the color distribution made it obvious). For the majority of remaining images, four different models were used, each doing inference on an image nine times (varying the scale and rotation); the output was then averaged.
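The article does not say how night-vision images were recognized beyond noting that their color distribution made it obvious. As a rough illustration only, the sketch below assumes such images are strongly green-tinted and flags an image when its mean green value clearly dominates red and blue. The function name, the green-tint assumption, and the `dominance` threshold are all hypothetical.

```python
import numpy as np


def looks_like_night_vision(image_rgb: np.ndarray, dominance: float = 1.5) -> bool:
    """Heuristic check based on per-channel means of an H x W x 3 RGB image.

    Assumption (not from the source): night-vision images are green-tinted,
    so the green mean exceeds the red and blue means by a factor `dominance`.
    """
    channel_means = image_rgb.reshape(-1, 3).astype(np.float64).mean(axis=0)
    red, green, blue = channel_means
    return bool(green > dominance * red and green > dominance * blue)
```

An image that passes this check would be routed to the dedicated night-vision model; all others would go to the four general models.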

In addition, some models were run with a modified non-maximum suppression scheme, yielding a total of 54 inferences over the four models. Team TEST combined the output of different models by taking the mean of individual predictions.
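Taking the mean of individual predictions can be sketched as below. Each inner list holds the class probabilities from one inference, meaning one model applied to one scaled or rotated copy of the image. Pooling all inferences into a single flat average is an assumption; the article does not say whether predictions were first averaged per model. The function name is hypothetical.

```python
from typing import List, Sequence


def average_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Arithmetic mean of class-probability vectors, one vector per inference."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    n_classes = len(predictions[0])
    totals = [0.0] * n_classes
    for probs in predictions:
        if len(probs) != n_classes:
            raise ValueError("all predictions must have the same number of classes")
        for i, value in enumerate(probs):
            totals[i] += value
    return [total / len(predictions) for total in totals]
```

If every input vector sums to 1, the averaged vector also sums to 1, so no renormalization is needed before computing log loss.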

Augmentations – Color Proves a Key Insight

Data augmentations played a crucial role in the team’s competitive performance. While examining data, team members noticed that the most discriminative features were related to how much red blood-like structure was visible. This inspired an important strategy: augmenting the contrast of the red channel (i.e., the color of blood and tissue) was particularly helpful.

The augmentations in order of importance were:

  • Augmenting the contrast of the red channel
  • Randomly rotating the images (in the range of -45 to 45 degrees)
  • Turning night vision images to grayscale
  • Blurring the images
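The most important augmentation in the list above, changing the contrast of the red channel, might look like the following sketch. The exact formula the team used is not given in the article. Here contrast is changed by scaling the red channel's deviation from its mean by a random factor, and the `strength` parameter (defaulting to 0.4) is assumed to be the maximum relative change. Both the formula and the parameter's meaning are assumptions.

```python
from typing import Optional

import numpy as np


def augment_red_contrast(
    image_rgb: np.ndarray,
    strength: float = 0.4,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Randomly stretch or compress red-channel contrast of an H x W x 3 uint8 image.

    The green and blue channels are left untouched.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Contrast factor drawn from [1 - strength, 1 + strength] (assumed range).
    factor = rng.uniform(1.0 - strength, 1.0 + strength)
    out = image_rgb.astype(np.float32)
    red = out[..., 0]
    mean = red.mean()
    # Scale each red value's distance from the channel mean.
    out[..., 0] = (red - mean) * factor + mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Applying a fresh random factor each time an image is drawn during training exposes the model to a range of red-channel contrasts rather than one fixed setting.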

Additional data was sampled so that the proportion of original:additional dataset images would be 1:2; had this not been done, the proportions would be closer to 1:4.
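One simple way to reach a 1:2 ratio is to draw a fixed random subset of the additional images, as sketched below. Whether the team subsampled once or re-sampled during training is not stated, so the one-off subsample, the function name, and the seed handling are all assumptions.

```python
import random
from typing import List, Sequence


def sample_additional(
    original: Sequence[str],
    additional: Sequence[str],
    ratio: float = 2.0,
    seed: int = 0,
) -> List[str]:
    """Pick additional images so that original:additional is about 1:ratio."""
    target = min(len(additional), int(round(ratio * len(original))))
    rng = random.Random(seed)
    # Sample without replacement so that no additional image is repeated.
    return rng.sample(list(additional), target)
```

With 10 original and 40 additional images (a 1:4 ratio), this returns 20 additional images, giving the 1:2 proportion described above.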

Simplified Model

One of the models in the ensemble (red color contrast augmentations 0.4) could be used on its own. Team TEST achieved a log loss of 0.79035 with it, compared with 0.76964 for the winning submission, so this single model would still have placed first on the leaderboard without the rest of the ensemble. This model needs to be trained only once and does a total of nine inferences. The number of inferences could probably be reduced further without a large increase in log loss.

Team TEST used a customized py-R-FCN (which included Faster R-CNN) code starting from this GitHub* repository.

Training and Inference Methods

Training effectiveness relied heavily on generating extra data. The team trained R-CNN-like detectors to find the bounding box of the cervix while simultaneously classifying its type; no models were trained on whole images.

Team members found it beneficial to cast the problem as an object detection exercise (they had their own bounding box annotations) since the region of interest was usually quite small. Each model generated inferences on various scales and rotations of the testing images and the predictions were averaged using a simple arithmetic mean.

Training and Prediction Times

One of the models in the ensemble (red color contrast augmentations 0.4) achieved a log loss of 0.79, against 0.77 for the full ensemble, a score good enough to win the competition. This model trained in eight hours and needs 0.7 seconds to generate predictions for a single image. (The ensemble needs around 50 hours of training and 7 to 10 seconds for inference.)

Dependencies

Results and Key Findings…and a Plan to Keep Saving Lives

In its post-competition write-up, Team TEST noted: "Our log loss of ~0.77 is equivalent to always giving the correct class around 46% confidence. Better accuracy could be achieved with more data and a more precise labeling."
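The 46% figure follows from the definition of log loss: if a model always assigned probability p to the correct class, its log loss would be -ln(p), so a log loss of L corresponds to p = exp(-L). The short sketch below checks this arithmetic; the function names are illustrative.

```python
import math


def equivalent_confidence(log_loss: float) -> float:
    """Constant probability on the correct class that would yield this log loss."""
    return math.exp(-log_loss)


def constant_confidence_log_loss(confidence: float) -> float:
    """Log loss obtained when the correct class always receives `confidence`."""
    return -math.log(confidence)


ensemble = equivalent_confidence(0.76964)      # winning ensemble
single_model = equivalent_confidence(0.79035)  # single-model result
print(f"ensemble: {ensemble:.3f}, single model: {single_model:.3f}")
```

By the same formula, the single model's log loss of 0.79035 corresponds to a slightly lower equivalent confidence than the ensemble's.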

One of their key insights involved the importance of a proper validation scheme. "We noticed that the additional dataset had many similar photos as in the original training set, which itself caused problems if we wanted to use additional data in our models," they wrote. "Therefore, we applied K-means clustering to create a trustworthy validation set. We clustered all the photos into 100 clusters and took 20 random clusters as our validation set. This helped us track if the data augmentations we used in our models were useful or not."
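The cluster-based split can be sketched as follows. The sketch assumes every photo already has a cluster ID, for example from K-means run on some image representation (the article does not say which features were clustered). Whole clusters are then held out so that near-duplicate photos do not end up on both sides of the split. The function name and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_cluster(
    cluster_ids: Sequence[int],
    n_val_clusters: int = 20,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices), holding out whole clusters."""
    clusters = sorted(set(cluster_ids))
    rng = random.Random(seed)
    val_clusters = set(rng.sample(clusters, n_val_clusters))
    train_idx = [i for i, c in enumerate(cluster_ids) if c not in val_clusters]
    val_idx = [i for i, c in enumerate(cluster_ids) if c in val_clusters]
    return train_idx, val_idx
```

Because similar photos tend to fall into the same cluster, holding out entire clusters gives a validation score less inflated by near-duplicates than a random per-image split would.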

Figure 1. Confusion matrix validation set 1, which is sampled from "original" data: loss is 0.62, accuracy is 73%

Figure 2. Confusion matrix validation set 2, which is sampled from "additional" data: loss is 0.76, accuracy is 68%

For their achievement in the Kaggle Competition, Team TEST will share a $50,000 first-place prize. Going forward, the members intend to apply the lessons from their Kaggle experience to other real-life challenges: they and two other associates are founding their own startup to apply their deep learning expertise in other life-saving medical technologies.

"Since the competition we have been focusing on radiological tasks – lungs, brains, liver, cardiovascular, and so forth," said Namajūnas. "Hands-on Deep Learning experience has helped us to make quite a few models. We are also about to deploy our first model to a local hospital."

Learn More About Intel Initiatives in AI

Intel commends the AI developers who contributed their time and talent to help improve diagnosis and treatment for this life-threatening disease. Committed to helping scale AI solutions through the developer community, Intel makes AI training and tools broadly accessible through the Intel® AI Developer Program.

Take part as AI drives the next big wave of computing, delivering solutions that create, use and analyze the massive amount of data that is generated every minute.

Sign up with Intel® AI Developer Program to get the latest updates on competitions and access to tools, optimized frameworks, and training for artificial intelligence, machine learning, and deep learning.
