An Easy Introduction to XGBoost: A Comprehensive Guide to the Library and Intel Optimizations

What is XGBoost?

XGBoost (eXtreme Gradient Boosting) is an open-source machine learning library that provides an efficient, distributed implementation of gradient-boosted tree algorithms.

The library’s scalability, flexibility, and portability have made it a widely used ML framework, with support for Python, C++, Java, R, Scala, Perl, and Julia on Windows, Linux, and macOS.

The XGBoost algorithm was designed primarily for efficient use of compute time and memory. XGBoost delivers parallel tree boosting and targets machine learning tasks such as regression, classification, and ranking.

Why is Boosting Important?

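Boosting is an ensemble technique in which new models are added to correct the errors made by existing models. Gradient boosting is a form of boosting that uses gradient descent to minimize the loss function: each new model is fit to the gradient of the loss with respect to the current predictions.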
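To make the idea concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, written in plain Python. It is not how XGBoost is implemented; the one-split "stump" learner and the helper names (fit_stump, boost, mean_squared_error) are assumptions chosen for illustration. For squared error, the negative gradient is simply the residual, so each round fits a stump to what the current ensemble still gets wrong.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit an additive ensemble of stumps; returns a predict(x) function."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p is y - p
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

    return predict


def mean_squared_error(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data: y = x^2 on ten points (an illustrative assumption, not a benchmark)
xs = [float(x) for x in range(10)]
ys = [x * x for x in xs]
for rounds in (0, 1, 10, 50):
    model = boost(xs, ys, n_rounds=rounds)
    print(rounds, round(mean_squared_error(model, xs, ys), 3))
```

The training error shrinks as rounds are added, because every new stump corrects part of the remaining residual. The learning rate scales each correction down, which is the same role the eta parameter plays in XGBoost.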

XGBoost further improves the generalization of gradient boosting by adding regularization terms that penalize model complexity. These terms are based on the L1 norm (the sum of the absolute values of a vector) and the L2 norm (the square root of the sum of its squared values) of the trees' leaf weights.
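The two norms, and a penalty built from them, can be sketched in a few lines of Python. The function names are illustrative, not part of XGBoost. The penalty below assumes the commonly documented form in which the L1 term is scaled by XGBoost's alpha parameter and the squared L2 norm is scaled by half of its lambda parameter; the separate per-leaf gamma term is omitted for brevity.

```python
import math


def l1_norm(values):
    """Sum of absolute values."""
    return sum(abs(v) for v in values)


def l2_norm(values):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


def leaf_penalty(leaf_weights, alpha=0.0, lam=1.0):
    """Assumed penalty form: alpha * ||w||_1 + 0.5 * lam * ||w||_2^2."""
    return alpha * l1_norm(leaf_weights) + 0.5 * lam * l2_norm(leaf_weights) ** 2


small_tree = [0.1, -0.2]   # hypothetical leaf weights
large_tree = [3.0, -4.0]
print(leaf_penalty(small_tree, alpha=1.0, lam=1.0))
print(leaf_penalty(large_tree, alpha=1.0, lam=1.0))
```

Larger leaf weights incur a larger penalty, so the optimizer prefers trees that make smaller, more conservative corrections unless the data clearly supports bigger ones.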

Intel has contributed several optimizations to XGBoost that accelerate the training and inference of gradient boosting models, including XGBoost Optimized for Intel® Architecture.

Installing and using XGBoost optimizations

For XGBoost versions later than 0.81, Intel has upstreamed training optimizations, which are enabled through the ‘hist’ tree method. Inference optimizations were upstreamed after version 1.3.1, so any version later than 1.3.1 includes Intel optimizations for both training and inference. To get an additional inference boost on top of the upstreamed optimizations, or if you are using a version of XGBoost older than 0.81, you can convert the XGBoost model to daal4py (an API to the Intel® oneAPI Data Analytics Library (oneDAL)).
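As a quick way to apply these version thresholds, the sketch below maps a version string to the optimizations the paragraph above says are upstreamed. The helper names are hypothetical, and the only inputs are the 0.81 and 1.3.1 cut-offs stated in this article. Note that the release written as 0.9 in this article reports itself as 0.90, which is what makes the comparison against 0.81 work.

```python
import re


def parse_version(version):
    """Turn a string such as '1.4.2' or '0.90' into a comparable 3-tuple."""
    numbers = re.findall(r"\d+", version.split("+")[0])[:3]
    parts = [int(n) for n in numbers]
    return tuple(parts + [0] * (3 - len(parts)))


def upstreamed_optimizations(version):
    """Which Intel optimizations this article says a given version includes."""
    v = parse_version(version)
    return {
        "training": v > (0, 81, 0),   # later than 0.81
        "inference": v > (1, 3, 1),   # later than 1.3.1
    }


for candidate in ("0.81", "0.90", "1.3.1", "1.4.2"):
    print(candidate, upstreamed_optimizations(candidate))
```

In practice you would pass in the installed version, which XGBoost exposes as xgboost.__version__.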

Intel-optimized XGBoost can be installed in the following ways:

  • As a part of Intel® AI Analytics Toolkit
  • From PyPI repository, using pip package manager: pip install xgboost
  • From Anaconda package manager:
      -  Using Intel channel: conda install xgboost -c intel
      -  Using conda-forge channel: conda install xgboost -c conda-forge
  • As a Docker container (provided you have a DockerHub account)

ML training using XGBoost: Comparative performance analysis with and without Intel optimizations

We have created a code sample that compares the performance of XGBoost without (version 0.81) and with (version 1.4.2) Intel optimizations. The sample uses the popular Higgs dataset, which contains particle features and functions of those features, to distinguish between a signal process that produces Higgs bosons and a background process that does not. The Higgs boson is an elementary particle in the Standard Model, produced by the quantum excitation of the Higgs field. In the sample, an XGBoost model is trained, and the results with and without Intel’s optimizations are compared.

Important note: To leverage the training optimizations of XGBoost for Intel Architecture, set the ‘tree_method’ parameter to ‘hist’, as shown in the code sample.

You can learn more about the XGBoost parameters mentioned in the code sample and other available parameters here.

Training performance with Intel optimizations (versions 0.9, 1.0, and 1.1) and without them (version 0.81) is compared on the following real-life datasets:

Comparing the execution times for training shows up to 16X improvement with the Intel optimizations.

Figure 1: Release-to-release acceleration of XGBoost training (See configuration details below [1])

ML inference using XGBoost: Comparative performance analysis using up-streamed Intel optimizations and with oneDAL

For Python 3.6 and later, the inference speedup of XGBoost optimized by Intel can be increased further using the daal4py API of the Intel® oneDAL library. Install daal4py, import it, and add a single line of code between your model training and prediction phases to accelerate the inference stage. Yes, it is that simple!

Here is how to speed up inference with XGBoost optimized by Intel.

Install the daal4py API:

!pip install daal4py

For more ways to install daal4py and detailed information on the API, check out this link.

Import daal4py as follows:

import daal4py as dp

The performance of stock XGBoost is compared with daal4py-accelerated prediction on the following datasets:

  • Mortgage (45 features, ~9M observations)
  • Airline (691 features, one-hot encoding, ~1M observations)
  • Higgs (28 features, 1M observations)
  • MSRank (136 features, 3M observations)

The result shows up to 36X improvement using daal4py/the oneDAL library.

Figure 2: Performance comparison: stock XGBoost prediction vs. daal4py prediction (See configuration details below [2])

Additionally, here is a simple example demonstrating the use of daal4py. In the example, an XGBoost model is trained, and results are predicted using the daal4py prediction method. The code sample also compares the performance of XGBoost prediction and daal4py prediction at the same accuracy.

What’s next?

If XGBoost is your library of choice for AI/ML workflows, you get the performance optimizations driven by Intel automatically by updating to the latest library version, which can make your gradient-boosted tree algorithms run several times faster. If you want an even greater performance boost for inference workloads, you can also use the daal4py API of the Intel oneAPI Data Analytics Library. We also encourage you to check out Intel’s other AI/ML framework optimizations and end-to-end portfolio of tools and incorporate them into your AI workflow. If you are interested, you can also learn about the unified, open, standards-based oneAPI programming model that forms the foundation of these tools and optimizations.

Acknowledgment:

We would like to thank Vadim Sherman, Rachel Oberman, Preethi Venkatesh, Praveen Kundurthy, John Kinsky, Jimmy Wei, Louie Tsai, Tom Lenth, Jeff Reilly, Monique Torres, Keenan Connolly, and John Somoza for their review and approval help.

Product and Performance Information

[1] Figure 1: Image source: Medium. Hardware configuration: c5.metal AWS EC2 instance: Intel® Xeon® 8275CL processor, two sockets with 24 cores per socket, 192 GB RAM (12 slots/32 GB/2933 MHz), HT: on. OS: Ubuntu* 18.04.2 LTS. Testing date: 11/10/2020. Software configuration: XGBoost releases 0.81, 0.9, 1.0, 1.1, built from source. Other software: Python 3.6, NumPy 1.16.4, pandas 0.25, scikit-learn 0.21.2.
[2] Figure 2: Image source: Intel. Hardware configuration: Intel Xeon Platinum 8275CL (2nd generation Intel Xeon Scalable processors): 2 sockets, 24 cores per socket, HT: on, Turbo: on. OS: Ubuntu 18.04.4 LTS (Bionic Beaver), total memory of 192 GB (12 slots/16 GB/2933 MHz). Testing date: 05/18/2020. Software configuration: XGBoost 1.2.1, daal4py version 2020 update 3, Python 3.7.9, NumPy 1.19.2, pandas 1.1.3, and scikit-learn 0.23.2.