Testing of Six Different AI-Based Models: A Deep Dive to Improve Cervical Cancer Screening

Published: 01/02/2018  

Last Updated: 01/02/2018

Team GRXJ seeks to make a difference, using AI to improve cervical cancer screening.

This is one in a series of case studies showcasing finalists in the Kaggle* Competition sponsored by Intel and MobileODT*. The goal of this competition was to use artificial intelligence to improve the precision and accuracy of cervical cancer screening.


More than 1,000 participants representing 800 data scientist teams developed algorithms to accurately identify a woman’s cervix type based on images as part of the Intel and MobileODT* Competition on Kaggle*. Such identification can help prevent ineffectual treatments, and it allows health care providers to offer proper referrals for cases requiring more advanced treatment.

This case study follows the process used by the third-place winning team, GRXJ. They pooled their respective skill sets to create an algorithm that would improve this life-saving diagnostic procedure.

Kaggle Competitions: Data Scientists Solve Real-World Problems Using Machine Learning

The goal of Kaggle competitions is to challenge and incentivize data scientists globally to create machine-learning solutions for real-world problems in a wide range of industries and disciplines. In this particular competition – sponsored by Intel and MobileODT, developer of mobile diagnostic tools – more than 1,000 participants representing 800 data scientist teams each developed algorithms to correctly classify cervix types based on cervical images.

In the screening process for cervical cancer, some patients require further testing while others don't; because this decision is so critical, an algorithm-aided determination can significantly improve the quality and efficiency of cervical cancer screening for these patients. The challenge for each team was to develop the most efficient deep learning model for that purpose.

Team Members Inspired by Kaggle Competition’s Meaningful Purpose

The members of Team GRXJ were drawn to participate by the prospect of helping to save lives. They also saw that they could further their knowledge of cervical cancer while acquiring new deep learning techniques. The team is named for the first initials of its four members:

  • Gilberto Titericz Jr., a San Francisco-based data scientist at Airbnb*.
  • Russ Wolfinger, director of scientific discovery and genomics at SAS Institute in Cary, N.C.
  • Xulei Yang, a research scientist at IHPC, A*STAR, Singapore.
  • Joseph Chui of Philadelphia, a robotics engineer and software developer.

At the start of the competition, GRXJ team members entered as individual competitors. After several individual submissions, Wolfinger and Yang – former teammates in a previous Kaggle competition – agreed to collaborate. They then invited Titericz to join, to leverage his considerable machine learning ensembling skills. Finally, they added Chui to increase the diversity of their ensembled models.

“Team-up is a good strategy to win Kaggle competitions,” commented Titericz. “Choosing team members with different solutions helps improve the model performance when ensembling.” The time individual teammates spent on the project ranged from 40 hours over the course of two months to several hours a day near the end of the competition.

Choosing an Approach to Code Optimization

In the early stages, team members each developed and optimized their own single models independently with minimal code dependence:

Russ Wolfinger started by probing the leaderboard and so had the public test set labels to train on a few weeks before the model submission deadline. For most models, U-Net or Faster R-CNN was used for object detection. Crops were created and passed to bagged image classifiers with ResNet-50* and VGG-16 backbones, and hill climbing was then used to ensemble them based on 2-fold cross-validation (CV), with the public train and test sets as the two splits. Wolfinger used additional training data in only one of his models and included one custom model from Chui, plus one more based on U-Net from an additional contributor.

Xulei Yang divided the task into two stages:

  • Stage 1: cervical detection using YOLO models on full images. This entailed generating label txt files, training various YOLO models, cropping cervical regions of interest (ROIs), re-training the YOLO models, post-filtering the ROIs, and generating the Stage 1 output, which consisted of 4,760 ROIs in the additional set, 1,780 ROIs in the training set, and 512 ROIs in the tst_stg1 set. All cropped ROIs are resized to 224x224 for further processing.
  • Stage 2: cervical classification by ResNet-50 on the cropped cervical ROI images. These steps included preprocessing using various data augmentation techniques, selection of CNNs for training, two sets of models, and final results.
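The cropping step in Stage 1 can be illustrated with a small sketch. It assumes the label txt files follow the usual Darknet convention of one line per box, `<class_id> <cx> <cy> <w> <h>`, with coordinates normalized to the image size; the article does not show Yang's actual files or code, and the function names here are hypothetical.

```python
def parse_label_line(line):
    """Parse one Darknet-style label line: '<class_id> <cx> <cy> <w> <h>'."""
    parts = line.split()
    return int(parts[0]), tuple(float(v) for v in parts[1:5])


def yolo_to_crop_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized box (center x/y, width, height in [0, 1]) to
    integer pixel corners (left, top, right, bottom), clamped to the image."""
    left = int(round((cx - w / 2) * img_w))
    top = int(round((cy - h / 2) * img_h))
    right = int(round((cx + w / 2) * img_w))
    bottom = int(round((cy + h / 2) * img_h))
    return (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))


# Made-up label line and image size, for illustration only.
class_id, box = parse_label_line("0 0.5 0.5 0.4 0.6")
crop = yolo_to_crop_box(*box, img_w=1000, img_h=800)
print(class_id, crop)
```

An image library would then crop the image to this box and resize the result to 224x224 before classification.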

Joseph Chui used convolutional neural networks (CNNs) in his training model. Since the competition didn't judge how fast the code ran on a specific platform, little was done to fine-tune time performance. Because it still took hours to train the model, care was taken when setting the values of the hyperparameters. In his model, Chui gained faster training time by reducing the amount of augmented data and increasing the learning rate. Owing to the limited amount of memory on the GPU, the mini-batch size was constrained to a smaller number. Each picture was automatically cropped to a square-like ROI before training or predicting. He fine-tuned the pre-trained models VGG-16, VGG-19, Xception, and Inception-v3. Data augmentation x8 was used for training on data that included stage 1’s training and labeled test data. Keras on TensorFlow* served as the Python* programming interface. The predictions from each fine-tuned model were averaged with equal weights before being ensembled with the other models in the team.

Gilberto Titericz Jr. developed two Keras VGG-16 models by fine-tuning pre-trained weights. He then used a five-fold cross-validation strategy to generate out-of-fold (OOF) predictions (on the training set) and averaged test predictions (on the test set) as csv files, and finally ensembled the csv files using the “Nelder-Mead” optimization approach to find the best weights. His ensemble method was also used to select the final models based on performance and to ensemble the results.
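The out-of-fold scheme can be sketched as follows. This is a minimal illustration, not Titericz's code: the folds are contiguous with no shuffling or stratification, the "model" is a toy stand-in that predicts the mean training label, and predictions are single numbers rather than per-class probabilities. The Nelder-Mead weight search is not shown.

```python
def kfold_indices(n_samples, n_folds=5):
    """Split range(n_samples) into n_folds contiguous, near-equal validation folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit, predict, X, y, X_test, n_folds=5):
    """Return (oof, test_avg): one held-out prediction per training row, and
    the test predictions averaged over the n_folds fitted models."""
    oof = [None] * len(X)
    test_sum = [0.0] * len(X_test)
    for val_idx in kfold_indices(len(X), n_folds):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:  # each training row is predicted by a model that never saw it
            oof[i] = predict(model, X[i])
        for j, x in enumerate(X_test):
            test_sum[j] += predict(model, x)
    return oof, [s / n_folds for s in test_sum]


# Hypothetical toy model: "fitting" stores the mean label, "predicting" returns it.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = list(range(10))
y = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]
oof, test_avg = out_of_fold_predictions(fit_mean, predict_mean, X, y, X_test=[100, 101])
print(oof)
print(test_avg)
```

Because every OOF prediction comes from a model that did not train on that row, the OOF file gives an honest basis for tuning ensemble weights, while the averaged test file is what gets blended for submission.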

Summary: Bringing It All Together as a Team

Their independent approach allowed each of them to improve their models within constrained schedules. Before the final stage, each single model of the solution was trained and predicted on two pre-agreed sets of labeled data. The prediction results of individual models were evaluated using hill climbing algorithms to yield optimal blending weights. These same blending weights were used to average the final predictions.
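The article does not describe the exact hill climbing procedure, so the following is only one plausible sketch. It assumes integer weights, a weighted geometric blend renormalized per sample, and multi-class log loss as the score to minimize. The toy predictions and all function names are made up for illustration.

```python
import math


def geometric_blend(model_preds, weights):
    """Weighted geometric mean of per-model class probabilities for one
    sample, renormalized so the blended probabilities sum to 1."""
    total = sum(weights)
    n_classes = len(model_preds[0])
    blended = [
        math.exp(sum(w * math.log(p[c]) for w, p in zip(weights, model_preds)) / total)
        for c in range(n_classes)
    ]
    norm = sum(blended)
    return [b / norm for b in blended]


def log_loss(preds_by_model, labels, weights):
    """Mean multi-class log loss of the blend over all samples (lower is better)."""
    loss = 0.0
    for i, label in enumerate(labels):
        sample = [model[i] for model in preds_by_model]
        loss -= math.log(geometric_blend(sample, weights)[label])
    return loss / len(labels)


def hill_climb_weights(preds_by_model, labels, max_steps=10):
    """Start from equal weights; at each step add 1 to whichever model's
    weight lowers the loss most; stop when no increment helps or after max_steps."""
    weights = [1] * len(preds_by_model)
    best = log_loss(preds_by_model, labels, weights)
    for _ in range(max_steps):
        candidates = []
        for m in range(len(weights)):
            trial = weights[:]
            trial[m] += 1
            candidates.append((log_loss(preds_by_model, labels, trial), trial))
        cand_loss, cand_weights = min(candidates)
        if cand_loss >= best - 1e-12:
            break
        best, weights = cand_loss, cand_weights
    return weights, best


# Toy predictions: 4 samples, 3 classes, 2 models (model_a is the stronger one).
labels = [0, 1, 2, 1]
model_a = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.3, 0.5, 0.2]]
model_b = [[0.4, 0.3, 0.3], [0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
preds_by_model = [model_a, model_b]

baseline = log_loss(preds_by_model, labels, [1, 1])
weights, best = hill_climb_weights(preds_by_model, labels)
print(weights, baseline, best)
```

A search like this fits the weights to whatever labeled data it is scored on, so it should be run on held-out predictions rather than on data the models were trained on.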

The team’s ultimate solution is based on predictions from six separate models – four from Wolfinger, and one each from Chui and Yang – that were blended to achieve the final predictions. To train the deep learning models, they used the following algorithms: TensorFlow Faster R-CNN, U-Net, YOLO, and Keras fine-tuned VGG-16, VGG-19, Xception, ResNet-50, and Inception-v3.

Six models must be trained from three folders:
  • Models 1-4: r1, r2, r3, and r4. Faster R-CNN for region detection, cropping, and image classification. The most important features are the central regions of the images. Tools used included TensorFlow, MXNet, and Keras. It took about one day to train all models.
  • Model 5: x1. YOLO v2 models are used for cervical object detection. The detected ROI images are then cropped and resized, and finally used for image classification based on fine-tuned ResNet-50 models. Tools include Darknet and Keras.
  • Model 6: j1. ROI extraction using OpenCV, then combining the prediction results of fine-tuned Keras pre-trained models (VGG-16, VGG-19, Inception-v3, and Xception).

The blending strategy chosen to mix the predictions is a weighted geometric average, according to this formula:

P = exp( ( log(r1) + 3*log(r2) + log(r3) + log(r4) + 3*log(x1) + log(j1) ) / 10 )

The weights were chosen by a hill climbing algorithm, and the divisor 10 is the sum of the weights.
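As a quick check of the formula, here is a minimal sketch that applies it to one probability per model. The weights come from the formula above; the example probabilities are made up, and the function name is hypothetical.

```python
import math

# Weights from the article's formula; they sum to 10, the divisor in the formula.
WEIGHTS = {"r1": 1, "r2": 3, "r3": 1, "r4": 1, "x1": 3, "j1": 1}


def blend(probs, weights=WEIGHTS):
    """Weighted geometric mean of one probability per model.

    probs maps model name -> predicted probability (must be > 0) for a
    single image and a single class.
    """
    total = sum(weights.values())
    log_sum = sum(weights[name] * math.log(probs[name]) for name in weights)
    return math.exp(log_sum / total)


# Made-up probabilities for one image and one class, for illustration only.
example = {"r1": 0.60, "r2": 0.70, "r3": 0.55, "r4": 0.65, "x1": 0.75, "j1": 0.50}
print(round(blend(example), 4))
```

The blended value always lies between the smallest and largest input. Applied class by class, the blended probabilities for an image generally sum to slightly less than 1, so a renormalization step may be needed depending on how the predictions are scored.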

Data Augmentation

According to Chui, data augmentation x8 was used to help train the models on data that included stage 1’s training and labeled test data. No data in the “additional data” 7z files was used.
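The article does not say which eight transforms made up the x8 augmentation. One common way to get an eightfold increase is to use the eight symmetries of a square: four rotations, each with and without a mirror flip. The sketch below assumes that interpretation purely for illustration and uses a small nested list in place of an image.

```python
def rotate90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in img]


def eightfold(img):
    """Return 8 variants of a grid: 4 rotations, each with and without a mirror flip."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# A tiny 2x2 "image" with distinct pixel values, so all 8 variants differ.
variants = eightfold([[1, 2], [3, 4]])
for v in variants:
    print(v)
```

Each variant keeps the same label as the original image, so the labeled training set grows eightfold without any new annotation.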

Training and Prediction Time

Convolutional neural networks were used in Chui’s training model. The pre-trained models VGG-16, VGG-19, Xception, and Inception-v3 were fine-tuned. The prediction results from each model were averaged with equal weights before being ensembled with the rest of the team’s models.

Simple Features and Methods

The team found that it’s feasible to include or exclude pre-trained models. Instead of using four pre-trained models, one can cut the training and prediction time in half by excluding two of the models. It is also possible to improve performance by including other fine-tuned pre-trained models.

Results and Key Findings

Upon further investigation of their separate models, as well as their combinations, team members found that:

  • Single models such as j1 or x1 can achieve a Private leaderboard (LB) score of around 0.82 (lower is better), only slightly worse than the final ensemble’s Private LB score of around 0.81.
  • A blend of r1^0.15 * j1^0.45 * x1^0.40 can achieve around 0.78 in Private LB, next to the best score (#1) of 0.76 in Private LB.

This indicates that a more advanced stacking method could have achieved even better results in the competition.

Titericz noted that a simple geometric average with uniform weights for all models would have performed better in Private LB (0.80417 vs. 0.80830) than the weights chosen by the hill climbing algorithm. “It makes sense since a simple geometric average is more robust against overfitting,” he said.

GRXJ made little use of additional data but did use bounding box annotations that were provided on the forum. They probed the leaderboard early in the competition and so had the test set labels. This freed the team to explore different cross-validation schemes and not be in a hurry to re-train all models during the final week. “We were able to avoid chasing leaderboard rank the whole time and focus on good fitting models,” said Wolfinger.

Learn More About Intel Initiatives in AI

Intel commends the AI developers who contributed their time and talent to help improve diagnosis and treatment for this life-threatening disease. Committed to helping scale AI solutions through the developer community, Intel makes AI training and tools broadly accessible through the Intel® AI Developer Program.

Take part as AI drives the next big wave of computing, delivering solutions that create, use and analyze the massive amounts of data that are generated every minute.

Sign up with Intel® AI Developer Program to get the latest tools, optimized frameworks, and training for artificial intelligence, machine learning, and deep learning.

Meet Team GRXJ

The four members of team GRXJ, named for each of their first initials, began this Kaggle challenge as individual competitors before pooling their respective strengths. They are:

Gilberto Titericz Jr., a San Francisco-based data scientist at Airbnb, holds a bachelor’s degree in Electronics Engineering and an MSc in Electric Engineering.



Russ Wolfinger, who has a PhD in Statistics, works as director of scientific discovery and genomics at SAS Institute in Cary, N.C.




Xulei Yang is a research scientist in IHPC, A*STAR, Singapore. He holds a PhD in Electrical and Electronics Engineering, and is an IEEE senior member. His current research focus is on deep learning for biomedical image analysis.



Joseph Chui of Philadelphia, a robotics engineer and software developer for 15 years, is focused on developing applications using GPUs and 3D graphics.




