Cookbook

  • 2021.5
  • 12/28/2021
  • Public Content

Kaggle Kernels for Classification Tasks

The following Kaggle kernels show how to patch scikit-learn with Intel® Extension for Scikit-learn* for various classification tasks. These kernels usually include a performance comparison between stock scikit-learn and scikit-learn patched with Intel® Extension for Scikit-learn*.
TPS stands for Tabular Playground Series, which is a series of beginner-friendly Kaggle competitions.
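
As a rough illustration of what "patching" means in these kernels, the sketch below calls `patch_sklearn()` from the `sklearnex` module (installed by the scikit-learn-intelex package) before importing any estimator, then times one fit. The synthetic dataset, the choice of `SVC`, and the fallback to stock scikit-learn when the extension is not installed are assumptions made for this example, not something taken from a specific kernel.

```python
import time

try:
    # Patch first: estimators imported afterwards are replaced by the
    # accelerated versions where the extension supports them.
    from sklearnex import patch_sklearn
    patch_sklearn()
    patched = True
except ImportError:
    # Assumption for this sketch: fall back to stock scikit-learn.
    patched = False

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for a Kaggle dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

start = time.perf_counter()
model = SVC().fit(X, y)
elapsed = time.perf_counter() - start

print(f"patched={patched}, fit time: {elapsed:.3f} s, "
      f"train accuracy: {model.score(X, y):.3f}")
```

Running the same script once with and once without the patch call is the simplest way to reproduce the kind of performance comparison the kernels report.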

Binary Classification

Kernel
Goal
Content
Data: [TPS Nov 2021] Synthetic spam emails data
Identify spam emails via features extracted from the email
  • data preprocessing (normalization)
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Apr 2021] Synthetic data based on Titanic dataset
Predict whether a passenger survives
  • data preprocessing
  • feature construction
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Apr 2021] Synthetic data based on Titanic dataset
Predict whether a passenger survives
  • data preprocessing
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Apr 2021] Synthetic data based on Titanic dataset
Predict whether a passenger survives
  • data preprocessing
  • feature engineering
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
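
The "normalization" and "search for optimal parameters using Optuna" steps listed above might look roughly like the following sketch. It is not taken from any of the kernels: the synthetic data, the logistic regression model, the searched range for `C`, and the helper name `score_for` are all assumptions. If Optuna is not installed, the sketch falls back to a small fixed grid so that it still runs.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the TPS datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def score_for(C):
    """Mean 3-fold CV accuracy of a scaled logistic regression (hypothetical helper)."""
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=500))
    return cross_val_score(model, X, y, cv=3).mean()


def objective(trial):
    # Optuna samples C on a log scale and maximizes the CV accuracy.
    C = trial.suggest_float("C", 1e-3, 1e2, log=True)
    return score_for(C)


try:
    import optuna

    optuna.logging.set_verbosity(optuna.logging.WARNING)
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=10)
    best_C = study.best_params["C"]
except ImportError:
    # Assumption for this sketch: a tiny grid when Optuna is unavailable.
    best_C = max([0.01, 0.1, 1.0, 10.0], key=score_for)

print(f"best C: {best_C:.4g}, CV accuracy: {score_for(best_C):.3f}")
```

In a patched session, calling `patch_sklearn()` before the scikit-learn imports would be the only change; the search code itself stays the same.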

Multiclass Classification

Kernel
Goal
Content
Predict the category of an eCommerce product
  • data preprocessing with Quantile Transformer
  • training and prediction using scikit-learn-intelex
  • search for optimal parameters using Optuna
  • performance comparison to scikit-learn
Data: [TPS May 2021] Synthetic eCommerce data
Predict the category of an eCommerce product
  • data preprocessing
  • training and prediction using scikit-learn-intelex
Predict the category of an eCommerce product
  • data preprocessing: one-hot encoding, dimensionality reduction with PCA, normalization
  • creating a stacking classifier with logistic regression, kNN, and random forest, and a pipeline of Quantile Transformer and another logistic regression as a final estimator
  • searching for optimal parameters for the stacking classifier
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
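
The stacking setup described in the last kernel above (logistic regression, kNN, and random forest as base models, with a pipeline of Quantile Transformer and another logistic regression as the final estimator) could be assembled along these lines. This is a minimal sketch on synthetic four-class data; the hyperparameters shown are placeholders chosen for the example, not the values the kernel arrives at.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

# Synthetic multiclass data standing in for the eCommerce dataset.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

# Base estimators whose class probabilities feed the final estimator.
base_estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Final estimator: quantile-transform the stacked probabilities, then classify.
final_estimator = make_pipeline(
    QuantileTransformer(n_quantiles=50, output_distribution="normal"),
    LogisticRegression(max_iter=1000),
)

stack = StackingClassifier(estimators=base_estimators,
                           final_estimator=final_estimator, cv=3)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacking test accuracy: {accuracy:.3f}")
```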

Classification Tasks in Computer Vision

Kernel
Goal
Content
Recognize hand-written digits
  • data preprocessing
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Recognize hand-written digits
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
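
For a sense of what a digit-recognition kernel does, here is a small sketch that trains a classifier on the 8x8 digits dataset bundled with scikit-learn. The bundled dataset and the `SVC` model are assumptions that keep the example self-contained; the kernels listed above work with their own Kaggle data and model choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1,797 grayscale images of 8x8 pixels, with pixel values in the range 0..16.
digits = load_digits()
X = digits.data / 16.0  # simple preprocessing: scale pixels to [0, 1]
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```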

Classification Tasks in Natural Language Processing

Kernel
Goal
Content
Data: Natural Language Processing with Disaster Tweets
Predict which tweets are about real disasters and which ones are not
  • data preprocessing
  • TF-IDF calculation
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Use recipe ingredients to predict the cuisine
  • feature extraction using TfidfVectorizer
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
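
The TF-IDF step used by both NLP kernels above can be sketched as a two-stage pipeline: `TfidfVectorizer` turns raw text into weighted term features, and a classifier is trained on top. The six toy "recipes", their labels, and the choice of logistic regression are hypothetical and exist only to make the example runnable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: ingredient lists and their cuisine labels.
recipes = [
    "soy sauce ginger rice sesame oil",
    "rice soy sauce scallion ginger",
    "tomato basil olive oil pasta garlic",
    "pasta parmesan olive oil basil",
    "tortilla beans cilantro lime chili",
    "chili lime avocado tortilla cilantro",
]
cuisines = ["asian", "asian", "italian", "italian", "mexican", "mexican"]

# Vectorize the text with TF-IDF, then fit a linear classifier on the features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(recipes, cuisines)

prediction = model.predict(["basil pasta garlic tomato"])[0]
print(f"predicted cuisine: {prediction}")
```

The vectorizer is fitted inside the pipeline, so the same vocabulary and weights are applied at prediction time without any extra bookkeeping.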

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.