Python* is a high-level, general-purpose programming language used extensively in data science and AI/machine learning. Its simplicity and readability have made it the fastest-growing programming language in recent years.
One of its few limitations is being an interpreted language, which can lead to inefficient execution in production environments that demand high performance. The Intel® Distribution for Python* addresses this issue: it eliminates the need to rewrite a model in a more performant language by enabling efficient deployment in Python. This blog introduces you to the Intel distribution and highlights a code sample that uses it.
Intel Distribution for Python is a high-performance binary distribution of commonly used core Python numerical, scientific, and machine learning packages. It helps developers achieve near-native code performance for computationally data-intensive domains including HPC, big data, and data science. The packages have been optimized using the Intel® oneAPI Math Kernel Library (oneMKL) and Intel® oneAPI Data Analytics Library (oneDAL) to take advantage of parallelism through vectorization, multithreading, multiprocessing, and optimized multi-node communication techniques.
Researchers and developers can scale compute-intensive Python code from laptops to powerful servers using optimized NumPy, SciPy, and Numba*, and tune for highest efficiency at scale using advanced tools for multithreading and multiprocessing with OpenMP*.
Install Intel® Distribution for Python*
You can download the tool in multiple ways:
- As part of the Intel® AI Analytics Toolkit (AI Kit)
- As a stand-alone version
- With Anaconda* package manager
- Using the YUM repository
- Using the APT repository
After successfully installing Intel Distribution for Python, it needs to be activated. The following are the steps for the stand-alone version:
Note: The following instructions were evaluated on Windows® 10 with the Visual Studio Code* (VS Code) v1.67.2 IDE.
- In the command prompt, go to the Python directory of the Intel Distribution for Python (IDP) installation. For instance:
C:\Program Files (x86)\Intel\oneAPI\intelpython\python3.9
- Then activate the environment from the scripts folder:
C:\Program Files (x86)\Intel\oneAPI\intelpython\python3.9>scripts\activate
- Open the VS Code IDE from the same directory:
C:\Program Files (x86)\Intel\oneAPI\intelpython\python3.9>code .
You can now verify the installation in the VS Code terminal using the python --version command. The output shows the installed Intel Distribution for Python version: Python 3.9.10 :: Intel Corporation.
Learn how to activate Intel Distribution for Python for other installation options.
Intel Distribution for Python is compatible with the following hardware and software specifications:
- Operating system: Windows 10, Windows* 11, Linux*, macOS*
- Package manager: PIP*, conda*
- Programming language: Python v3.9
- IDE: PyCharm*, Microsoft Visual Studio, VS Code
If you are using Intel Distribution for Python as a part of the Intel® AI Analytics Toolkit (AI Kit), check the AI Kit system requirements.
Accelerate NumPy and Numba* with Intel Distribution for Python
NumPy is a popular Python library for a wide range of computations such as mathematical functions, random number generation, and Fourier transformations, whereas Numba is a just-in-time (JIT) compiler that makes it possible for NumPy code to execute at the speed of native machine code. Intel Distribution for Python has optimized Numba, enabling multicore execution and Single Instruction Multiple Data (SIMD) features to achieve parallelism on Intel hardware architectures.
Install the Software
NumPy and Numba are available in Intel Distribution for Python as part of the AI Kit. Additionally, you can install each as a stand-alone version:
- Install NumPy
- Numba data-parallel extension (numba-dpex) is an Intel-developed extension to the Numba JIT compiler. It adds kernel programming and automatic offload capabilities to the Numba compiler.
Basic Linear Algebra Subprogram (BLAS) and Linear Algebra Package (LAPACK) computations are optimized using oneMKL instead of Automatically Tuned Linear Algebra Software (ATLAS) or OpenBLAS libraries.
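You can inspect which BLAS/LAPACK backend your NumPy build links against; a short sketch (the exact library names reported depend on your installation):

```python
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build was linked against.
# Under Intel Distribution for Python this reports oneMKL; a PIP-installed
# NumPy typically reports OpenBLAS.
np.show_config()

# BLAS level-3 routines back numpy.dot and the @ operator, so large matrix
# products are where the oneMKL-linked build pays off.
a = np.random.rand(512, 512)
b = np.random.rand(512, 512)
c = a @ b
```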
Universal Math Functions
Optimizations include universal math functions available as instances of the numpy.ufunc class, such as numpy.sinh(), numpy.add(), and numpy.bitwise_xor().
These universal functions allow application of the same transformation to each element of the input array.
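For example, calling a ufunc on an array applies one scalar operation to every element at once:

```python
import numpy as np

# Ufuncs apply the same scalar operation elementwise across a whole array;
# these bulk elementwise calls are what the oneMKL-backed build vectorizes.
x = np.array([0.0, 1.0, 2.0])
y = np.sinh(x)  # elementwise hyperbolic sine

# Ufuncs also broadcast: XOR a scalar mask against each array element.
m = np.bitwise_xor(np.array([0b1010, 0b1100]), 0b0110)
```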
Random Number Generation
NumPy installed using Intel Distribution for Python enables faster random number generation through the mkl.random package, powered by oneMKL. The numpy.random_intel module of this package serves as a drop-in replacement for the numpy.random module.
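Because the Intel module is a drop-in replacement, existing numpy.random code should need only an import change. A sketch using the standard module (swapping in the Intel-specific module is the assumed one-line change):

```python
import numpy as np
# Under Intel Distribution for Python, numpy.random_intel mirrors this API,
# so `import numpy.random_intel as rnd` is (by assumption) the only change
# needed to route these calls through oneMKL.
import numpy.random as rnd

rnd.seed(42)
samples = rnd.standard_normal(1_000_000)  # bulk draws benefit most
```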
Fast Fourier Transforms (FFT)
Intel Distribution for Python also enables accelerated FFT computations by exposing a Python interface to the FFT functionality of oneMKL. This interface offers full support to in-place, out-of-place, and multidimensional transformations of NumPy arrays.
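The interface itself is standard NumPy, so an out-of-place, multidimensional round trip looks like this:

```python
import numpy as np

# Forward and inverse 2-D FFT on a NumPy array. Under Intel Distribution
# for Python, these numpy.fft calls dispatch to oneMKL's FFT implementation.
signal = np.random.rand(64, 64)
spectrum = np.fft.fft2(signal)           # out-of-place, multidimensional
recovered = np.fft.ifft2(spectrum).real  # round-trip back to the original
```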
We have created a k-nearest neighbors (KNN) code sample that demonstrates the KNN algorithm and shows how to achieve the same accuracy with three different implementations built on Intel Distribution for Python packages: NumPy, Numba, and numba-dpex.
We used the wine dataset, which is available in the scikit-learn* library. Here, the DataFrame is limited to the target and two features (malic_acid and alcohol) using the following code.
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
# Convert the loaded dataset to a DataFrame
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target'] = pd.Series(data.target)
# Limit features to the two selected for this problem
df = df[['target', 'alcohol', 'malic_acid']]
The next step is to prepare the dataset for training and testing: the wine dataset is split into a training set (90% of the data) and a test set (10% of the data). Each is then divided into X (features) and y (target), as shown below.
import numpy as np

# We are using 10% of the data for testing
train_sample_idx = np.random.choice(df.index, size=int(df.shape[0] * 0.9), replace=False)
train_data, test_data = df.iloc[train_sample_idx], df.drop(train_sample_idx)
# Get features and label from train/test data
X_train, y_train = train_data.drop('target', axis=1), train_data['target']
X_test, y_test = test_data.drop('target', axis=1), test_data['target']
The last step is to implement the KNN algorithm using NumPy, Numba, and numba-dpex. The Numba implementation uses the numba.jit() decorator, while the numba-dpex implementation uses the numba_dpex.kernel() decorator.
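As a sketch of the NumPy variant (function and variable names are illustrative, not taken from the sample), the classifier predicts by majority vote over the k nearest training points:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain-NumPy KNN: majority vote over the k nearest training points."""
    preds = np.empty(X_test.shape[0], dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        # Euclidean distance from this test point to every training point
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]
    return preds
```

Decorating the same distance computation with numba.jit(), or rewriting it as a numba_dpex.kernel(), yields the other two variants while keeping the algorithm identical.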
Run the NumPy vs. numba-dpex code sample yourself in a Linux environment or on Intel® Developer Cloud to see how seamlessly you can leverage high-performance Python packages.
Although Intel Distribution for Python significantly accelerates Python libraries for complex operations, as highlighted above, there are two key limitations:
- It uses Intel® oneAPI DPC++/C++ Compiler, whereas the community versions of NumPy and SciPy that are installed using PIP package manager are compiled using GNU Compiler Collection (GCC)*. The difference in compilers used can sometimes adversely affect the performance of the libraries.
- For simple operations such as universal math functions applied on a small-sized array, the libraries installed by Intel Distribution for Python may not run faster than those installed using package managers like PIP.
The next step is to incorporate Intel Distribution for Python into your AI workflow and get enhanced performance from state-of-the-art Python libraries. Feel free to contribute to the Intel Distribution for Python project on GitHub. We also encourage you to learn about and incorporate Intel's other AI/machine learning framework optimizations and end-to-end portfolio of tools into your AI workflow. Visit the AI and machine learning page to learn about Intel's AI software development resources for preparing, building, deploying, and scaling AI solutions.