Application Development
This document is a guide to help developers quickly develop and run applications on Intel® Data Center GPU Max Series. It addresses four common development approaches, named SYCL*, CUDA*, C++/Fortran, and Python AI:
- SYCL: a user developing a SYCL application from scratch for the GPU
- CUDA: a user with an existing CUDA application who wants to migrate it to SYCL code that offloads the computation to the GPU
- C++/Fortran: a user with a C++/Fortran application running on the CPU who wants to extend it to offload OpenMP regions to the GPU
- Python AI: a user with a Python AI framework workload running on the CPU who wants to extend it to offload the training/inference part to the GPU
Depending on the development approach, this guide refers users to the appropriate documents for quickly developing the application for Intel Data Center GPU Max Series. The steps below show the general approach used in this guide.
1. Install the required software toolkits.
2. Get familiar with the toolkits and verify that Intel Data Center GPU Max Series works correctly by downloading, compiling, and running a code sample.
3. Build and run the application according to its type (SYCL, CUDA, C++/Fortran, Python AI).
4. Use Intel tools to identify opportunities to improve application performance. If any are found, go back to Step 3 to implement the code changes.
Step 1: Install the Software
The first step is to install the Intel oneAPI toolkits. For the SYCL, CUDA, and C++/Fortran application types, you need to download the Intel® oneAPI Base Toolkit and the Intel® oneAPI HPC Toolkit. The Intel oneAPI Base Toolkit is designed for general computing applications, and the Intel oneAPI HPC Toolkit is designed for high-performance computing. Together, these toolkits provide the tools developers need to build, analyze, optimize, and scale HPC applications.
If you work with a Python AI workload, you need to download the Intel oneAPI Base Toolkit and the Intel® oneAPI AI Analytics Toolkit. The Intel oneAPI AI Analytics Toolkit provides developers with Python* tools and frameworks to accelerate end-to-end data science and analytics pipelines on Intel® architecture.
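For example, on Ubuntu* with Intel's oneAPI APT repository already configured, the toolkits can be installed as follows (the package names below come from that repository; offline and online installers are also available):
$ sudo apt install intel-basekit intel-hpckit    # SYCL, CUDA, C++/Fortran
$ sudo apt install intel-basekit intel-aikit     # Python AI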
| Development Approach | Step 1: Install the Software |
|---|---|
| SYCL, CUDA, C++/Fortran | • Install Intel oneAPI Base Toolkit • Install Intel oneAPI HPC Toolkit |
| Python AI | • Install Intel oneAPI Base Toolkit • Install Intel oneAPI AI Analytics Toolkit |
Resources:
- oneAPI: A New Era of Heterogeneous Computing
- SYCL Reference
Step 2: Download and Run Code Sample
In this step, you should familiarize yourself with the Intel Data Center GPU Max Series by downloading and running a code sample. If the code sample runs correctly at the end of this step, your Intel Data Center GPU Max Series is most likely working properly. You can download the Intel oneAPI code samples from GitHub:
$ git clone https://github.com/oneapi-src/oneAPI-samples.git
To compile and run the code sample, you must activate the oneAPI environment variables:
$ source /opt/intel/oneapi/setvars.sh
Running a SYCL code sample:
$ cd oneAPI-samples/DirectProgramming/C++SYCL/DenseLinearAlgebra/matrix_mul
$ make all
$ make run
./matrix_mul_dpc
Device: Intel® Graphics [0x0bd5]
Problem size: c(150,600) = a(150,300) * b(300,600)
Result of matrix multiplication using SYCL: Success - The results are correct!
Running a CUDA code sample:
$ cd oneAPI-samples/DirectProgramming/C++SYCL/guided_concurrentKernels_SYCLMigration
$ mkdir build
$ cd build
$ cmake ..
$ make
Running a C++/Fortran code sample:
$ cd oneAPI-samples/DirectProgramming/C++/CompilerInfrastructure/OpenMP_Offload_Features
$ mkdir build
$ cd build
$ cmake ..
$ make
$ src/prog4
SYCL: Running on Intel® Graphics [0x0bd5]
SYCL and OMP memory: passed
OMP and OMP memory: passed
OMP and SYCL memory: passed
SYCL and SYCL memory: passed
Running a Python AI code sample:
For example, to run a PyTorch workload on Intel Data Center GPU Max Series, first activate the pytorch-gpu conda environment available in the Intel oneAPI AI Analytics Toolkit.
$ conda activate pytorch-gpu
Then run a code sample from https://intel.github.io/intel-extension-for-pytorch/xpu/1.13.10+xpu/tutorials/examples.html
| Development Approach | Step 2: Download/Run Code Sample |
|---|---|
| SYCL, CUDA, C++/Fortran | • Download oneAPI code sample • Set oneAPI environment variables • Compile and run the code sample accordingly |
| Python AI | • Download code sample from https://intel.github.io/intel-extension-for-pytorch/xpu/1.13.10+xpu/tutorials/examples.html • Set oneAPI environment variables • Activate the corresponding conda environment • Run the Python code sample |
Step 3A: Build Application
The Intel oneAPI samples repository is a good resource for writing SYCL code that offloads computation to the Intel GPU. In addition, refer to the SYCL documentation.
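To give a feel for what such code looks like, here is a minimal SYCL sketch (not taken from the samples repository; the problem size and values are illustrative) that offloads a vector addition to the GPU:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  constexpr size_t N = 1024;
  sycl::queue q{sycl::gpu_selector_v};  // target the GPU device
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  // Unified shared memory is visible to both host and device
  float *a = sycl::malloc_shared<float>(N, q);
  float *b = sycl::malloc_shared<float>(N, q);
  float *c = sycl::malloc_shared<float>(N, q);
  for (size_t i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

  // Offload the element-wise addition to the GPU
  q.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
    c[i] = a[i] + b[i];
  }).wait();

  std::cout << "c[0] = " << c[0] << "\n";  // expected: 3
  sycl::free(a, q);
  sycl::free(b, q);
  sycl::free(c, q);
  return 0;
}
```

Compile it with icpx -fsycl, as described in Step 3B.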
For the CUDA development approach, migrate your CUDA code with the Intel® DPC++ Compatibility Tool, which is part of the Intel oneAPI Base Toolkit, and then finish writing the SYCL code by hand. For details, refer to the Data Parallel C++ book listed in the resources below.
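As a rough illustration of what a migration produces (the Compatibility Tool's actual output relies on its dpct helper headers and will differ in detail), here is a simple CUDA kernel alongside a hand-simplified SYCL equivalent:

```cpp
// A migration is typically started with the Compatibility Tool, e.g.:
//   dpct --in-root=. src/scale.cu
// Below is a hand-simplified SYCL equivalent of this original CUDA kernel:
//
//   __global__ void scale(float *x, float s, int n) {
//     int i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) x[i] *= s;
//   }
//   scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

#include <sycl/sycl.hpp>

void scale(sycl::queue &q, float *x, float s, int n) {
  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  // The nd_range mirrors the CUDA grid/block decomposition
  q.parallel_for(sycl::nd_range<1>(sycl::range<1>(blocks * threads),
                                   sycl::range<1>(threads)),
                 [=](sycl::nd_item<1> item) {
                   int i = static_cast<int>(item.get_global_id(0));
                   if (i < n) x[i] *= s;
                 }).wait();
}
```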
For the C++/Fortran development approach, OpenMP* offload constructs are a set of directives for C++ and Fortran that allow you to offload data and execution to target accelerators such as GPUs. The constructs are supported in the Intel oneAPI HPC Toolkit with the Intel® C++ Compiler and the Intel® Fortran Compiler.
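Here is a minimal, illustrative C++ sketch of an OpenMP offload region (the same directives are available in Fortran); compile it with icpx -fiopenmp -fopenmp-targets=spir64, as shown in Step 3B:

```cpp
#include <cstdio>

int main() {
  const int n = 1024;
  float x[1024], y[1024];
  for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

  // Offload the loop to the default target device (the GPU);
  // the map clauses control host<->device data movement.
  #pragma omp target teams distribute parallel for map(to: x) map(tofrom: y)
  for (int i = 0; i < n; ++i)
    y[i] += 2.0f * x[i];

  printf("y[0] = %f\n", y[0]);  // expected: 4.000000
  return 0;
}
```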
Refer to the Intel oneAPI GPU Optimization Guide for GPU offload tips and instructions.
You may use the Intel Advisor tool to analyze your code and identify the best opportunities for GPU offload. The Offload Modeling feature provides performance speedup projections, estimates offload overhead and pinpoints performance bottlenecks. Offload Modeling enables you to improve ROI by modeling different hardware solutions to maximize performance.
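For example, an Offload Modeling collection can be launched from the command line roughly as follows, where ./myapp is a placeholder for your application binary:
$ advisor --collect=offload --project-dir=./advi_results -- ./myapp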
For the Python AI development approach, implement GPU offload using Intel® Extension for PyTorch* for GPU, or stock TensorFlow* 2.10 or above together with Intel® Extension for TensorFlow*. Set any required environment variables, and refer to the Intel® Extension for PyTorch* or Intel® Extension for TensorFlow* documentation for details on offloading computation to the Intel GPU.
| Development Approach | Step 3A: Build Application |
|---|---|
| SYCL | • Implement SYCL code • Run Offload Modeling analysis in Intel Advisor to analyze your code |
| CUDA | • Use the Intel DPC++ Compatibility Tool to migrate most of your code • Inline comments help you finish writing and tuning your code • Run Offload Modeling analysis in Intel Advisor to analyze your code |
| C++/Fortran | • Implement OpenMP offload onto the GPU • Run Offload Modeling analysis in Intel Advisor to analyze your code |
| Python AI | • For the PyTorch framework, implement GPU offload by using Intel Extension for PyTorch* GPU • For the TensorFlow framework, install stock TensorFlow 2.10 or above, and Intel Extension for TensorFlow* |
Resources:
- Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
- Data Parallel C++, by James Reinders et al.
- Intel oneAPI GPU Optimization Guide
- See the Implement GPU Offload section here.
- OpenMP Offloading Tuning Guide
- Training on Migrate from CUDA* to C++ with SYCL*
- Intel Extension for PyTorch*
- Intel Extension for PyTorch* Documentation
- Intel Extension for TensorFlow* - Quick Get Started
- Intel Extension for TensorFlow*
Step 3B: Run Application
If you did not already set the environment variables in Step 2, set up the oneAPI development environment:
$ source /opt/<path to oneAPI installation>/setvars.sh
For the SYCL, CUDA, and C++/Fortran development approaches, compile your application with icpx/ifx (with the -fsycl option for SYCL and CUDA). Refer to the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information. For a CUDA application, first migrate your code with the Intel DPC++ Compatibility Tool. You can also debug your application using Intel® Distribution for GDB*.
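For example, with placeholder file names, typical compile lines look like this (the -fiopenmp -fopenmp-targets=spir64 options enable OpenMP offload to the GPU):
$ icpx -fsycl vector_add.cpp -o vector_add
$ icpx -fiopenmp -fopenmp-targets=spir64 omp_app.cpp -o omp_app
$ ifx -fiopenmp -fopenmp-targets=spir64 omp_app.f90 -o omp_app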
For the Python AI development approach, you can run your application using Intel® Distribution for Python*, which is part of the Intel® oneAPI AI Analytics Toolkit.
| Development Approach | Step 3B: Run Application |
|---|---|
| SYCL, CUDA, C++/Fortran | • Set oneAPI environment variables • For a CUDA application, migrate code with the Intel DPC++ Compatibility Tool and finish writing SYCL code • Compile the application with icpx/ifx (option -fsycl for SYCL and CUDA) • Optionally, debug your code with Intel® Distribution for GDB* |
| Python AI | • Set oneAPI environment variables • Run your code with Intel® Distribution for Python* |
Resources:
- Intel oneAPI Programming Guide
- Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference
- A debug-oriented user study includes several samples illustrating how to use oneAPI Debug tools to find real-world bugs.
- Intel® Distribution for GDB*
Step 4: Review Application Performance
Analyze your application by running Roofline Analysis in Intel® Advisor and the GPU Offload analysis in Intel® VTune™ Profiler. Refer to the oneAPI GPU Optimization Guide for ways to improve performance, then go back to Step 3 to implement the changes.
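For example, these analyses can be launched from the command line roughly as follows, where ./myapp is a placeholder for your application binary:
$ advisor --collect=roofline --profile-gpu --project-dir=./advi_results -- ./myapp
$ vtune -collect gpu-offload -- ./myapp
$ vtune -collect gpu-hotspots -- ./myapp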
| Development Approach | Step 4: Review Application Performance |
|---|---|
| SYCL, CUDA, C++/Fortran, Python AI | • Run Roofline Analysis in Intel Advisor • Run the GPU Offload and GPU Compute/Media Hotspots analyses in Intel® VTune™ Profiler |