Gain Expert Insights into Python* Parallelism Techniques
Overview
Learn how to unlock top performance on Intel® CPUs and GPUs with Python*, including how to write reductions in parallel Python and offload them to a SYCL* device.
A reduction combines multiple values into a single value. Because the values are combined in parallel and in an unspecified order, the operator must be both associative and commutative (for example, addition or maximum).
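To make the ordering point concrete, here is a minimal sketch in plain Python (standard library only, no device offload). The helper name tree_reduce is hypothetical and is not part of numba-dpex or any other package; it only imitates the pairwise combining order a parallel runtime might choose.

```python
import operator
import random
from functools import reduce


def tree_reduce(values, op):
    """Combine values pairwise, level by level (hypothetical helper)."""
    items = list(values)
    if not items:
        raise ValueError("tree_reduce needs at least one value")
    while len(items) > 1:
        # The pairs within one level are independent of each other,
        # so a parallel runtime could combine all of them at once.
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # an unpaired element carries over
        items = paired
    return items[0]


data = list(range(1, 101))
shuffled = data[:]
random.Random(0).shuffle(shuffled)

# Addition and max are associative and commutative on integers:
# grouping and ordering do not change the result.
sequential_sum = reduce(operator.add, data)
tree_sum = tree_reduce(data, operator.add)
shuffled_sum = tree_reduce(shuffled, operator.add)
tree_max = tree_reduce(shuffled, max)

# Subtraction is neither, so the two combining orders disagree.
sub_sequential = reduce(operator.sub, [1, 2, 3, 4])  # ((1 - 2) - 3) - 4
sub_tree = tree_reduce([1, 2, 3, 4], operator.sub)   # (1 - 2) - (3 - 4)
```

With integer addition, all three sums agree regardless of grouping or order, while the subtraction results differ. This is why an operator such as subtraction is not a valid reduction operator.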
The expert-level hands-on agenda includes these topics:
- Introduction to numba-dpex (ND), with examples of how to write parallel code and offload it automatically using the @numba.jit decorator and the kernel decorator.
- Introduction to ND-range kernels, workgroups, and work items.
- How to write data parallel Python code using shared local memory, private memory, barriers, and atomics.
- How to write a data parallel Python program that performs reductions:
  - In a single kernel
  - Using shared local memory and barriers
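The last agenda item, a reduction that uses shared local memory and barriers, can be sketched as a plain Python simulation. This is an illustration of the general pattern under stated assumptions, not numba-dpex code and not necessarily the approach shown in the session: the function name workgroup_reduce_sum, the group_size parameter, and the sequential loops standing in for work items are all hypothetical.

```python
def workgroup_reduce_sum(data, group_size=8):
    """Simulate a two-stage sum reduction over work-groups (hypothetical helper)."""
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")

    partials = []
    for start in range(0, len(data), group_size):
        # Stage 1: each work item copies one element into the group's
        # "shared local memory". A short final group is padded with 0,
        # the identity for addition.
        chunk = data[start:start + group_size]
        local = list(chunk) + [0] * (group_size - len(chunk))

        stride = group_size // 2
        while stride > 0:
            # Work items with a local id below the stride add in their partner.
            # Reads touch indices >= stride and writes touch indices < stride,
            # so the items in one step do not conflict.
            for lid in range(stride):
                local[lid] += local[lid + stride]
            # On a real device a work-group barrier would sit here, so that
            # every item finishes this step before the next one starts.
            stride //= 2

        partials.append(local[0])  # one partial result per work-group

    # Stage 2: combine the per-group partials. On a device this could be an
    # atomic add into a single location or a second, smaller kernel.
    total = 0
    for partial in partials:
        total += partial
    return total, partials


total, partials = workgroup_reduce_sum(list(range(1, 21)), group_size=8)
```

Here the 20 inputs are split into three groups of up to 8 elements, each group halves its active range until one value remains, and the per-group partial sums are then combined into the final total.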
Achieve near-native code performance with this set of essential packages optimized for high-performance numerical and scientific computing.