1.47x Speed-Up for Popular Machine-Learning Library

Published: 12/09/2020  

Last Updated: 12/09/2020

Yandex Optimizes the Performance of CatBoost with Intel® VTune™ Profiler Hotspot Analysis




Yandex [1] is the number one internet and cloud company in Russia and a strong contributor to machine learning and artificial intelligence worldwide. CatBoost [2], its popular open-source library, provides high-performance gradient boosting on decision trees.

When Yandex needed to identify performance bottlenecks in CatBoost, it collaborated with Intel's software development team. Together they used Intel® VTune™ Profiler [3] and key debugging tools from the Intel® oneAPI Base Toolkit [4] for hotspot analysis of the CatBoost framework on several datasets. By identifying these bottlenecks, Yandex was able to speed up CatBoost by 1.47x on Intel® platforms.


Efficient Machine-Learning Models

Yandex researchers developed CatBoost for training machine-learning models and running predictions with them. Yandex and other prominent organizations, including CERN and Cloudflare*, rely on CatBoost's features. Developers can cut the time they spend on parameter tuning by using CatBoost's default parameters. To improve training results, CatBoost accepts non-numeric factors directly, so users do not have to preprocess the data or spend time and effort converting it to numbers. Users can train their models on a fast implementation of a gradient-boosting algorithm. A model applier lets users apply a trained model quickly and efficiently, even in latency-critical tasks.

To maximize the value of CatBoost, Yandex needed to ensure optimal performance on CPUs, whether on bare metal or in the cloud. To that end, it used Intel VTune Profiler, one of the Intel® Software Development Tools.


Maximize the Performance of CatBoost

Yandex evaluated CatBoost's performance on several open-source datasets for Intel® CPU platforms, including Intel® Xeon® processors [5] and Intel® Xeon® Scalable processors [6] (Figures 1 and 2).

Intel VTune Profiler analyzes the code, collects key profiling data, and presents its findings through an interface that simplifies interpretation and helps developers to focus on the most effective software optimizations, from computation and threading to memory and storage.

Yandex measured the training time on the datasets listed on the left in Figures 1 and 2 and demonstrated the speed-up these models achieved with the optimizations suggested by Intel VTune Profiler.

The hotspot analysis in Intel VTune Profiler revealed false sharing and unnecessary use of atomic operations, both of which were compromising memory access efficiency. By identifying these bottlenecks, Yandex was able to speed up CatBoost by 1.47x.
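To make the two issues concrete, the following is a minimal C++ sketch of the general techniques commonly used against false sharing and excessive atomic operations. It is not CatBoost's actual code or the fix Yandex applied. The names `PaddedSum`, `ParallelSum`, and `kCacheLine` are hypothetical, and the 64-byte cache-line size is an assumption that holds on typical x86 processors. The sketch assumes C++17, which guarantees that `std::vector` honors the over-aligned element type.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is typical for x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each per-thread slot is aligned (and therefore padded) to a full cache
// line, so a write by one thread does not invalidate the line that holds
// another thread's slot. Without alignas, neighboring slots would share a
// line and the hot loop below would suffer from false sharing.
struct alignas(kCacheLine) PaddedSum {
    std::uint64_t value = 0;
};

// Hypothetical workload: sum `data` with `num_threads` worker threads.
std::uint64_t ParallelSum(const std::vector<std::uint32_t>& data,
                          std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<PaddedSum> partial(num_threads);
    std::atomic<std::uint64_t> total{0};
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            // Hot loop: only this thread's private, padded slot is written.
            for (std::size_t i = begin; i < end; ++i) {
                partial[t].value += data[i];
            }
            // One atomic update per thread instead of one per element.
            total.fetch_add(partial[t].value, std::memory_order_relaxed);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return total.load();
}
```

Calling `ParallelSum(data, 4)` on a vector of 1,000 elements that each hold the value 2 returns 2000. The design point is that shared state is touched once per thread rather than once per element, and per-thread state never shares a cache line.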


Identify Bottlenecks and Boost Performance

This joint effort by the Intel and Yandex teams is helping data scientists train more complex models on their datasets faster on Intel platforms, and it is raising the popularity of the CatBoost machine-learning library in the developer community. CatBoost's performance gains will help data scientists around the world use their compute resources more efficiently and save on cloud costs.

Intel® Software Development Tools proved effective for Yandex software developers and helped deliver value to data scientists worldwide.


Figure 1. Intel Xeon processor 6230 used for training, 40 physical cores with 1 thread per physical core


Figure 2. Intel Xeon processor E5-2660 v4 with 2 sockets, 14 cores per socket, 2 HT per core, 1 thread per physical core



  1. Yandex
  2. CatBoost
  3. Intel® VTune™ Profiler
  4. Intel® oneAPI Base Toolkit
  5. Intel® Xeon® Processor
  6. Intel® Xeon® Scalable Processor


Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.