Speed Up pandas Data Processing with Modin*

Overview

Preparing and manipulating large AI datasets can be time-consuming when using the popular pandas library. This is because pandas can only run on one CPU core at a time. Modin* is an open source, drop-in replacement for pandas that uses all your available cores to parallelize operations. Simply change your import statement and Modin automatically distributes your workloads, whether on your laptop or in the cloud.

This video walks through an example that shows how to get started with Modin. It also covers when not to use Modin, and how to apply Modin and pandas selectively to get a faster overall turnaround time.

 

Highlights

00:05 Because pandas is limited to using one core at a time, compute-intensive operations can become bottlenecks.

00:16 Modin is an alternative to pandas. Modin is open source and automatically parallelizes DataFrame operations across processor cores.

00:30 Learn how to install and set up Modin.

00:48 Find where to download the dataset and script from this video.

01:05 Watch a demonstration of Modin running the groupby call on all cores, and see where its setup overhead slows down shorter operations.

01:38 Watch a demonstration of how pandas handles the same process.

02:04 Get more information on when Modin is appropriate for your work.

 

Featured Software

Download Modin as part of AI Tools.

 

Additional Resources

  • How to Parallelize Compute-Intensive pandas Operations with Modin
  • AI and Machine Learning

 

Transcript

Are compute-intensive pandas operations causing bottlenecks in your AI data preparation and manipulation steps? It's probably because pandas is limited to using one core at a time.
 

Modin is a drop-in replacement for pandas that automatically parallelizes DataFrame operations across all available processor cores. It's open source and offers a choice of back-end compute engines.
 

To get started, install the package with pip or conda. You can specify which back end to install, or install them all. Then change one line of code: import modin.pandas instead of pandas. All your existing pandas calls will then use Modin's parallel processing.
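A minimal sketch of the one-line change described above. The install commands in the comments use the extras that Modin publishes (for example, modin[ray] and modin[all]). The toy data, the helper name category_totals, and the fallback to plain pandas when Modin is not installed are assumptions added for illustration; they are not part of the video's script.

```python
# Install one back end, or all of them (run in a shell, not in Python):
#   pip install "modin[ray]"                 # Ray engine only
#   pip install "modin[all]"                 # every supported engine
#   conda install -c conda-forge modin-ray   # conda equivalent for Ray

try:
    import modin.pandas as pd   # the one-line change: was `import pandas as pd`
    USING_MODIN = True
except ImportError:
    import pandas as pd         # fallback so the sketch runs without Modin (assumption)
    USING_MODIN = False


def category_totals(frame):
    """Sum `value` per `category`; the code is identical for pandas and Modin."""
    return frame.groupby("category")["value"].sum()


# Hypothetical toy data; the video uses a much larger dataset.
df = pd.DataFrame({"category": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = category_totals(df)
```

Because only the import line differs, the rest of an existing pandas script can usually stay as it is.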

 

The dataset and script for this example are from this article, which you can access with this QR code or the link in the description. Note that you will also need to install NLTK and Intel® Extension for scikit-learn*, which speeds up the prediction operation.
 

I'm running on my Intel® Core™ i7 laptop with eight physical cores, and Modin is running the groupby call using the Ray execution engine. All the cores are utilized automatically. Now, jumping ahead to the runtime results.
 

The groupby call is about 6x faster overall, but some of the shorter operations actually slowed down. This is because parallelizing an operation requires upfront preparation, and for those short operations the preparation cost outweighs the benefit of parallelization.
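The trade-off described here can be illustrated with a toy cost model: parallel time is a fixed setup cost plus the serial time divided evenly across the cores. The formula, the function names, and every number below are assumptions chosen only to show the shape of the trade-off; they are not measurements from the video.

```python
def parallel_time(serial_seconds: float, cores: int, overhead_seconds: float) -> float:
    """Idealized estimate: fixed setup overhead plus perfectly divided work."""
    return overhead_seconds + serial_seconds / cores


def speedup(serial_seconds: float, cores: int, overhead_seconds: float) -> float:
    """Ratio of serial time to estimated parallel time (below 1 means slower)."""
    return serial_seconds / parallel_time(serial_seconds, cores, overhead_seconds)


def break_even_seconds(cores: int, overhead_seconds: float) -> float:
    """Serial runtime at which parallel and serial execution take equally long."""
    # Solve s = overhead + s / cores for s.
    return overhead_seconds * cores / (cores - 1)


CORES = 8        # matches the eight-core laptop in the transcript
OVERHEAD = 0.5   # hypothetical setup cost in seconds

long_op = speedup(60.0, CORES, OVERHEAD)    # a long-running operation
short_op = speedup(0.2, CORES, OVERHEAD)    # a very short operation
threshold = break_even_seconds(CORES, OVERHEAD)
```

With these made-up numbers, the 60-second operation speeds up 7.5x, while the 0.2-second operation ends up slower than the serial version because the setup cost dominates.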
 

If you want to optimize every step, you can take finer-grained control and mix Modin processing with pandas. Here's the code in mixed mode: it uses pandas by default, converts from pandas to Modin for the groupby operation, and then converts back to pandas for the rest. As expected, the runtime results show that running in mixed mode gives you the best of both worlds.
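A rough sketch of what mixed mode can look like, assuming a small hypothetical dataset rather than the one from the article. The helper name mean_by_key is invented for this example, and the fallback to plain pandas when Modin is missing is an addition so the sketch still runs. The conversion back uses _to_pandas(), which is a private Modin method; the video's own script may convert differently.

```python
import pandas  # plain pandas is the default in mixed mode

try:
    import modin.pandas as mpd
except ImportError:
    mpd = None  # fallback so the sketch still runs without Modin (assumption)


def mean_by_key(frame: pandas.DataFrame, key: str, value: str) -> pandas.Series:
    """Run only the groupby in Modin, then hand a pandas object back."""
    if mpd is None:
        return frame.groupby(key)[value].mean()
    modin_frame = mpd.DataFrame(frame)               # pandas -> Modin
    result = modin_frame.groupby(key)[value].mean()  # the parallelized step
    # `_to_pandas()` is private; check your Modin version for a public
    # alternative before relying on it.
    return result._to_pandas()                       # Modin -> pandas


# Hypothetical data standing in for the article's dataset.
reviews = pandas.DataFrame({"product": ["x", "y", "x"], "stars": [5, 3, 4]})
short_step = reviews["stars"] + 1                    # short operation stays in pandas
averages = mean_by_key(reviews, "product", "stars")
```

Keeping the conversion inside one helper limits the Modin-specific code to the step that benefits from it, which is the pattern the transcript describes.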
 

So, for more details about how, when, and when not to use Modin, check out the link below to the full article or go to developer.intel.com/modin to learn how to get started.

