A quick getting started guide to profiling DPC++ sample code with the VTune Profiler GUI in an Intel oneAPI Toolkits container
This tutorial shows how to use the oneAPI IoT Toolkit container image on a Linux system to profile a DPC++ program with Intel(R) VTune(TM) Profiler. VTune Profiler is a component of the oneAPI Base Toolkit, which is required by domain-specific toolkits such as the IoT Toolkit. The guide includes detailed steps and tips for using VTune Profiler that are not covered by the current online documentation.
Install Docker, then download and run the oneAPI IoT Toolkit Docker image
We use Ubuntu as an example to show the detailed steps.
Step 1. Install Docker and run some housekeeping commands (run them with root privileges)
# Update the software repository
apt-get update
# Uninstall the previous Docker version
apt-get remove docker docker-engine docker.io
# Install Docker
apt install docker.io
# Start Docker as a service
systemctl start docker
# Enable the Docker service
systemctl enable docker
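After the installation, a quick sanity check can confirm that the tools used in the following steps are on the PATH. This is a minimal sketch; `have_cmd` is a hypothetical helper name introduced here for illustration, not part of Docker or oneAPI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker): succeeds if the command $1 is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the tools that Step 1 installs or relies on.
for tool in docker systemctl; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

If `docker` is reported as missing, repeating the installation commands of Step 1 is the first thing to try.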
Step 2. Download and run the oneAPI IoT Toolkit Docker image
# Download the oneAPI IoT Toolkit Docker image. Docker Hub also provides images for other oneAPI toolkits.
# Note: the Docker image is ~5 GB and can take ~15 minutes to download. It requires 25 GB of disk space.
image=intel/oneapi-iotkit
docker pull "$image"
# Add local connection access to the X server
xhost local:root
# Run the oneAPI IoT Toolkit Docker image
# (Warning: this command is INSECURE for the host if you plan to expose this Docker environment to other end users)
docker run --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE --net=host -e DISPLAY --device=/dev/dri -it "$image"
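Since the image needs about 25 GB of disk space, it may be worth checking the free space before running `docker pull`. The sketch below is one way to do that on the host. The helper names `free_gb` and `has_free_gb` are hypothetical, and the script assumes that Docker stores images under /var/lib/docker (its usual default on Linux), falling back to / if that directory does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the free space, in whole GB, of the filesystem holding $1.
free_gb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeeds if at least $2 GB are free on the filesystem holding $1.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Assumption: images live under /var/lib/docker; fall back to / otherwise.
docker_root=/var/lib/docker
[ -d "$docker_root" ] || docker_root=/

if has_free_gb "$docker_root" 25; then
    echo "enough free space under $docker_root for the image"
else
    echo "less than 25 GB free under $docker_root; free some space before pulling"
fi
```

If Docker has been configured with a different data directory, set `docker_root` to that path instead.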
Compile DPC++ sample code as a profiling workload
Step 3. Download the oneAPI DPC++ sample code
# Run oneapi-cli to get the DPC++ sample code
/opt/intel/oneapi/dev-utilities/latest/bin/oneapi-cli
Step 4. Choose "create a project -> cpp -> Toolkit -> Intel oneAPI DPC++ -> C++ compiler -> CPU, GPU, FPGA -> Vector Add"
Step 5. Enter the newly created "vector-add" sample folder and compile the code
# Compile the vector-add DPC++ code
root@be0945baa1de:/opt/intel/oneapi/dev-utilities/latest/bin/vector-add> make
If the vector-add sample is compiled and executed successfully, the console output looks like this:
root@be0945baa1de:/opt/intel/oneapi/dev-utilities/latest/bin/vector-add# ./vector-add-buffers
Running on device: Intel(R) Gen9
Vector size: 10000
: 0 + 0 = 0
: 1 + 1 = 2
: 2 + 2 = 4
...
: 9999 + 9999 = 19998
Vector add successfully completed on device.
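Before profiling, it can help to confirm that the workload actually ran on the intended device. The sketch below checks for the two relevant lines of the console output shown above. `vector_add_ok` and `device_name` are hypothetical helper names, and the demo runs on text copied from that output rather than on a live run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking vector-add output; not part of oneAPI.

# Succeeds if the text on stdin contains the sample's success line.
vector_add_ok() {
    grep -qF "Vector add successfully completed on device."
}

# Prints the device name taken from the "Running on device:" line on stdin.
device_name() {
    sed -n 's/^Running on device: //p'
}

# Demo on text copied from the console output above (not a live run).
sample_output='Running on device: Intel(R) Gen9
Vector size: 10000
Vector add successfully completed on device.'

if printf '%s\n' "$sample_output" | vector_add_ok; then
    echo "workload OK on: $(printf '%s\n' "$sample_output" | device_name)"
fi
```

Inside the container, the same helpers could be fed from the real program, for example `./vector-add-buffers | vector_add_ok && echo "ready to profile"`.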
Launch the VTune Profiler GUI to directly profile the DPC++ GPU workload
Step 6. Run the VTune GUI directly
# Launch the VTune Profiler graphical user interface
vtune-gui
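The GUI is displayed through the host's X server, which is why the container was started with `-e DISPLAY` after `xhost local:root`. If `vtune-gui` fails to open a window, a check like the following sketch may help narrow down the cause. `display_ready` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the DISPLAY variable is set and non-empty in this shell.
display_ready() {
    [ -n "${DISPLAY:-}" ]
}

if display_ready; then
    echo "DISPLAY=$DISPLAY is set; vtune-gui can try to reach the X server"
else
    echo "DISPLAY is not set; check the -e DISPLAY option of the docker run command"
fi
```

A set DISPLAY variable only shows that the setting was forwarded to the container; the X server on the host must still accept the connection.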
Configure the GPU Offload analysis and uncheck the memory bandwidth option. (This requires the host's debugfs to be mounted in the Docker container. Note that this is also an insecure option for the host if you plan to expose the container to end users. We may update this article with more details later.)
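Whether debugfs is visible from a given shell can be checked by looking for a debugfs entry in the mounts table. The following is a minimal sketch, assuming a Linux system where /proc/mounts lists the mounted filesystems. `fs_type_mounted` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a filesystem of type $1 is listed in a mounts table.
# $2 is the mounts file to read; it defaults to /proc/mounts (Linux).
fs_type_mounted() {
    awk -v t="$1" '$3 == t { found = 1 } END { exit !found }' "${2:-/proc/mounts}" 2>/dev/null
}

if fs_type_mounted debugfs; then
    echo "debugfs is mounted"
else
    echo "debugfs is not mounted (or /proc/mounts is unavailable)"
fi
```

The third field of each /proc/mounts line is the filesystem type, which is what the `awk` expression compares against.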
VTune can reveal the oneAPI SYCL hotspot kernel function (VectorAdd in this case) as well as the oneAPI Level Zero call flow in the VTune timeline.
As with the OpenCL kernel profiling capabilities that VTune Profiler provides, VTune Profiler can also reveal SYCL kernel implementation details at the source and assembly level.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.