Cookbook

  • 2021.5
  • 12/28/2021

Kaggle Kernels for Regression Tasks

The following Kaggle kernels show how to patch scikit-learn with Intel® Extension for Scikit-learn* for various regression tasks. These kernels usually include a performance comparison between stock scikit-learn and scikit-learn patched with Intel® Extension for Scikit-learn*.
TPS stands for Tabular Playground Series, which is a series of beginner-friendly Kaggle competitions.
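
As a minimal sketch of the patching step described above: `patch_sklearn()` from the `sklearnex` module (installed by the `scikit-learn-intelex` package) is called before any scikit-learn estimators are imported. The synthetic data, the choice of `Ridge`, and the fallback to stock scikit-learn when the extension is missing are assumptions made for illustration and are not taken from the kernels.

```python
# Patch scikit-learn BEFORE importing any estimators, so that the imports
# below resolve to the accelerated implementations where they exist.
try:
    from sklearnex import patch_sklearn  # provided by scikit-learn-intelex
    patch_sklearn()
    patched = True
except ImportError:
    # Assumption for this sketch: fall back to stock scikit-learn
    # when the extension is not installed.
    patched = False

import numpy as np
from sklearn.linear_model import Ridge  # estimator chosen only for illustration
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (hypothetical, not a Kaggle dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_coef = rng.normal(size=10)
y = X @ true_coef + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training and prediction use the ordinary scikit-learn API either way.
model = Ridge(alpha=1.0).fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"patched={patched}, RMSE={rmse:.4f}")
```

The rest of the code is unchanged scikit-learn, which is why the kernels can compare the stock and patched runs directly.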

Using a Single Regressor

Kernel
Goal
Content
Data: [TPS Jul 2021] Synthetic pollution data
Predict air pollution measurements over time based on weather and input values from multiple sensors
  • data preprocessing
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
Data: [TPS Aug 2021] Synthetic loan data
Calculate the loss associated with loan defaults
  • data preprocessing
  • feature engineering
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: House Prices dataset
Predict sale prices for a property based on its characteristics
  • data preprocessing
  • exploring outliers
  • feature engineering
  • filling missing values
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Jul 2021] Synthetic pollution data
Predict air pollution measurements over time based on weather and input values from multiple sensors
  • checking correlation between features
  • search for best parameters using GridSearchCV
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Jul 2021] Synthetic pollution data
Predict air pollution measurements over time based on weather and input values from multiple sensors
  • data preprocessing
  • feature engineering
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Data: [TPS Sep 2021] Synthetic insurance data
Predict the probability of a customer making a claim on an insurance policy
  • data preprocessing
  • filling missing values
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
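
Two steps that recur in the list above, a parameter search and a timing measurement, can be sketched together. This example uses `GridSearchCV` (one kernel does; the others use Optuna, which is not shown here). The `SVR` estimator, the parameter grid, the synthetic data, and the helper name `timed_grid_search` are all hypothetical choices for illustration.

```python
import time

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR  # estimator chosen only for illustration


def timed_grid_search(X, y):
    """Run a small grid search; return (best_params, elapsed_seconds)."""
    # Hypothetical grid, kept small so the sketch runs quickly.
    param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid,
        cv=3,
        scoring="neg_root_mean_squared_error",
    )
    start = time.perf_counter()
    search.fit(X, y)
    elapsed = time.perf_counter() - start
    return search.best_params_, elapsed


# Synthetic data (hypothetical, not a Kaggle dataset).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

best_params, elapsed = timed_grid_search(X, y)
print(best_params, f"{elapsed:.2f}s")
```

One way to obtain a comparison is to run the same script twice, once as written and once with `patch_sklearn()` called before the scikit-learn imports, and compare the elapsed times. Any speedup depends on the estimator, data size, and hardware.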

Stacking Regressors

Kernel
Goal
Content
Data: [TPS Jul 2021] Synthetic pollution data
Predict air pollution measurements over time based on weather and input values from multiple sensors
  • feature engineering
  • creating a stacking regressor
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
Predict total sales for every product and store in the next month based on daily sales data
  • data preprocessing
  • creating a stacking regressor
  • search for optimal parameters using Optuna
  • training and prediction using scikit-learn-intelex
  • performance comparison to scikit-learn
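
The "creating a stacking regressor" step can be illustrated with scikit-learn's `StackingRegressor`. This is a minimal sketch under stated assumptions: the base estimators, the final estimator, and the synthetic data are hypothetical and do not reproduce the models used in the kernels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data (hypothetical, not a Kaggle dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base estimators produce out-of-fold predictions (cv=5); the final
# estimator is then trained on those predictions.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)

r2 = stack.score(X_test, y_test)  # coefficient of determination
print(f"R^2 on held-out data: {r2:.3f}")
```

If the extension is patched in before these imports, the estimators it supports are swapped for accelerated versions while the stacking code itself stays the same.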

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.