Intel® Distribution of Modin Getting Started Guide

ID 739197
Updated 1/27/2022
Version Latest
Public

By Rachel Oberman

About Intel® Distribution of Modin

The Intel® Distribution of Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive with the tools they already love: a single-line code change enables the library, which includes exclusive optimizations for Intel hardware. The library is fully compatible with the Pandas API.
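For example, an existing Pandas workload can typically adopt Modin by changing only the import line. A minimal sketch (the CSV file name is hypothetical):

import modin.pandas as pd   # single-line change from "import pandas as pd"
df = pd.read_csv("my_dataset.csv")   # hypothetical input file
print(df.describe())   # the rest of the Pandas-style code is unchanged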

Intel is one of the largest contributors to and maintainers of the open-source Modin project. The Intel® Distribution of Modin is the Intel-owned build of Modin: in addition to the Ray* and Dask* engines, it can be powered by OmniSci* in the backend to provide accelerated analytics on Intel® platforms for even greater performance benefits. It is also the version of Modin available through the Intel® oneAPI AI Analytics Toolkit (AI Kit), and it is validated against the other Intel-optimized packages included in AI Kit to give users the best end-to-end experience.

The open-source Modin project is maintained not only by Intel but also by the rest of the Modin community.

For more information on the purpose and functionality of the Modin package, please refer to the Modin documentation.

Supported Installation Options

Install via Intel® oneAPI AI Analytics Toolkit

The Intel® oneAPI AI Analytics Toolkit includes the Intel® Distribution of Modin with the OmniSci* backend. Installing the Intel® Distribution of Modin through the AI Analytics Toolkit is currently available only via the following command:

conda install -c intel intel-aikit-modin

  • This will install Intel® Distribution of Modin with its OmniSci backend, along with other AI Kit optimizations such as Intel® Distribution for Python, Intel® Extension for Scikit-Learn*, and XGBoost Optimized for Intel® Architecture.

You can find more detailed information about the toolkit here.

Install via Individual Component

There are multiple options to install the Intel® Distribution of Modin from Anaconda.

Linux*, Windows*, and macOS* are supported (x86 architecture only); see the Modin Installation Guide for more details.

Install from Anaconda:

  • Recommended Installation Call:
    conda install -c intel modin-all
    • Installs all available backends (OmniSci* backend install is unavailable for Windows*)
  • Intel channel (Recommended Installation Channel):
    For installation commands from the Intel channel, use the following format, where “package_name” is the desired package name from the table:
    conda install -c intel package_name
    For example:
    conda install -c intel modin-all
Package Name in Intel® Channel                      | Engine(s)             | Supported OSs
modin-all (recommended)                             | Dask*, Ray*, OmniSci* | Linux*
modin-ray (stable backend)                          | Ray*                  | Linux*, Windows*
modin-omnisci (experimental, for best performance)  | OmniSci*              | Linux*
modin                                               | Dask*                 | Linux*, Windows*, macOS*
modin-dask                                          | Dask*                 | Linux*, Windows*, macOS*
  • Defaults channel:
    For installation commands from the Anaconda Defaults channel, use the following format, where “package_name” is the desired package name from the table:
    conda install package_name
    For example:
    conda install modin-all
Package Name in Defaults Channel                    | Engine(s)             | Supported OSs
modin-all (recommended)                             | Dask*, Ray*, OmniSci* | Linux*
modin-ray (stable backend)                          | Ray*                  | Linux*, Windows*
modin-omnisci (experimental, for best performance)  | OmniSci*              | Linux*
modin                                               | Dask*                 | Linux*, Windows*, macOS*
modin-dask                                          | Dask*                 | Linux*, Windows*, macOS*
  • Open-Source Modin

To install the open-source Modin package, to which Intel also heavily contributes and which it helps maintain, visit the relevant open-source Modin installation documentation. This covers the Anaconda conda-forge channel, PyPI, and build-from-source options.

Anaconda Conda-Forge Channel Installation

  • Conda-forge channel:
    For installation commands from the conda-forge channel, use the following format, where “package_name” is the desired package name from the table:
    conda install -c conda-forge package_name
    For example:
    conda install -c conda-forge modin-all
Package Name in conda-forge Channel                 | Engine(s)             | Supported OSs
modin-all (recommended)                             | Dask*, Ray*, OmniSci* | Linux*
modin-ray (stable backend)                          | Ray*                  | Linux*, Windows*
modin-omnisci (experimental, for best performance)  | OmniSci*              | Linux*
modin                                               | Dask*                 | Linux*, Windows*, macOS*
modin-dask                                          | Dask*                 | Linux*, Windows*, macOS*

PyPI Installation

To install Modin from PyPI, view the “PyPI” instructions from the relevant Modin documentation.

Build From Source

To build Modin from source, view the “Build From Source” instructions from the relevant Modin documentation.

Getting Started with Modin: Sanity Check

Once the Intel® Distribution of Modin is installed, run the following command(s) to verify that the installation was successful and that the Intel® Distribution of Modin optimizations are ready to use.
Run the following command(s) on the command line, based on the Modin backend engine(s) that you installed:

Ray Engine

python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Ray'); df = pd.DataFrame([1]);print(df+1)"

OmniSci Engine

For Modin Versions Before 0.12:

python -c "import modin.experimental.pandas as pd, modin.config as cfg; cfg.Engine.put('Native'); cfg.Backend.put('OmniSci'); df = pd.DataFrame([1]);print(df+1)"

For Modin Versions 0.12+:

python -c "import modin.pandas as pd, modin.config as cfg; cfg.StorageFormat.put('OmniSci'); df = pd.DataFrame([1]);print(df+1)"

Dask Engine

python -c "import modin.pandas as pd, modin.config as cfg; cfg.Engine.put('Dask'); df = pd.DataFrame([1]);print(df+1)"

Check Sanity Check Results

For each command, if the Intel® Distribution of Modin is properly installed, the following dataframe will be printed:

   0
0  2
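The same sanity check can also be run as a short script rather than a one-liner; a minimal sketch using the Ray engine (substitute the engine you installed, as in the commands above):

import modin.pandas as pd
import modin.config as cfg

cfg.Engine.put('Ray')   # or 'Dask'; for the OmniSci backend use the commands above
df = pd.DataFrame([1])
print(df + 1)           # should print the dataframe shown above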

Configuring the Compute Engine

Once the Intel® Distribution of Modin is installed, you can use the following command(s) to set the compute engine that the Intel® Distribution of Modin uses to distribute and optimize Pandas API functions in your workload.
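As a minimal illustration (the engine choice, CSV path, and column names below are hypothetical), the engine is configured once and the rest of the workload keeps the familiar Pandas API; the engine sections below show all configuration options:

import modin.config as cfg
cfg.Engine.put('Ray')   # choose the engine you installed; see the sections below

import modin.pandas as pd
df = pd.read_csv("sales.csv")                   # hypothetical input file
print(df.groupby("region")["amount"].sum())     # regular Pandas-style API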

Ray Engine

There are a few ways to enable the Ray backend in Intel® Distribution of Modin:

For Modin Versions Before 0.12:

  • In your Python script with a few lines of code:
    import modin.pandas as pd
    import modin.config as cfg
    cfg.Engine.put('Ray')
  • Setting the following environment variable:
    export MODIN_ENGINE=ray

For Modin Versions 0.12+:

  • In your Python script with a few lines of code:
    import modin.config as cfg
    cfg.Engine.put('Ray')
    import modin.pandas as pd
  • Setting the following environment variable:
    export MODIN_ENGINE=ray

OmniSci Engine

There are a few ways to enable the OmniSci* backend in Intel® Distribution of Modin:

For Modin Versions Before 0.12:

  • In your Python script with a few lines of code:
    import modin.config as cfg
    cfg.Engine.put('native')
    cfg.Backend.put('omnisci')
    cfg.IsExperimental.put(True)
    import modin.experimental.pandas as pd
  • Setting the following environment variables:
    export MODIN_ENGINE=native
    export MODIN_BACKEND=omnisci
    export MODIN_EXPERIMENTAL=true

For Modin Versions 0.12+:

  • In your Python script with a few lines of code:
    import modin.config as cfg
    cfg.StorageFormat.put('omnisci')
    import modin.pandas as pd
  • Setting the following environment variable:
    export MODIN_STORAGE_FORMAT=omnisci

You can find more information on the OmniSci* backend in the relevant Modin documentation.
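For example, a minimal sketch of an OmniSci-backed workload for Modin versions 0.12+ (the CSV path is hypothetical):

import modin.config as cfg
cfg.StorageFormat.put('omnisci')    # set before importing modin.pandas, as shown above

import modin.pandas as pd
df = pd.read_csv("measurements.csv")   # hypothetical input file
print(df.head())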

Dask Engine

There are a few ways to enable the Dask backend in Intel® Distribution of Modin:

For Modin Versions Before 0.12:

  • In your Python script with a few lines of code:
    import modin.pandas as pd
    import modin.config as cfg
    cfg.Engine.put('Dask')
  • Setting the following environment variable:
    export MODIN_ENGINE=dask

For Modin Versions 0.12+:

  • In your Python script with a few lines of code:
    import modin.config as cfg
    cfg.Engine.put('Dask')
    import modin.pandas as pd
  • Setting the following environment variable:
    export MODIN_ENGINE=dask

If you have installed only a single compute engine with the Intel® Distribution of Modin, Modin will use it as the default engine, and you can skip this step.
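To confirm which engine Modin has selected, you can query the same configuration objects; a minimal sketch (StorageFormat assumes Modin versions 0.12+):

import modin.config as cfg

print(cfg.Engine.get())          # e.g. 'Ray', 'Dask', or 'Native'
print(cfg.StorageFormat.get())   # e.g. 'Pandas' or 'Omnisci' (Modin 0.12+)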

Support

If you have further questions or need support with your workload optimization, submit your queries to the Intel® AI Analytics Toolkit Forum or to the Issues page of the Modin GitHub repository, depending on the type of support required.

Useful Resources

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at Performance Index.