How to Use the Intercept Layer for OpenCL™ Applications
The Intercept Layer for OpenCL Applications is a tool that can
intercept and modify OpenCL calls for debugging and performance
analysis. Using the Intercept Layer for OpenCL Applications requires
no application or driver modifications.
To operate, the Intercept Layer for OpenCL Applications masquerades as
the OpenCL ICD loader (usually) or as an OpenCL implementation
(rarely) and is loaded when the application intends to load the real
OpenCL ICD loader. As part of its initialization, the Intercept Layer
for OpenCL Applications loads the real OpenCL ICD loader and
gets function pointers to the real OpenCL entry points. Then, whenever
the application makes an OpenCL call, the call is intercepted and can
be passed through to the real OpenCL implementation with or without changes.
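The interception pattern described above can be illustrated with a small toy model. This is only a Python sketch of the idea, not the tool's actual implementation: the class `InterceptLayer` and the stand-in function `real_clCreateBuffer` are hypothetical names invented for this example, and the real tool works with native function pointers obtained from the real ICD loader.

```python
import time
from collections import defaultdict


def real_clCreateBuffer(size):
    """Hypothetical stand-in for a real OpenCL entry point."""
    return bytearray(size)


class InterceptLayer:
    """Toy model: keep references to the real functions and wrap every call."""

    def __init__(self, real_functions):
        self._real = dict(real_functions)   # name -> real entry point
        self.calls = defaultdict(int)       # name -> number of intercepted calls
        self.total_ns = defaultdict(int)    # name -> accumulated host time in ns

    def __getattr__(self, name):
        # Only reached for names that are not ordinary attributes,
        # i.e. the intercepted entry points.
        real = self.__dict__.get("_real", {}).get(name)
        if real is None:
            raise AttributeError(name)

        def intercepted(*args, **kwargs):
            start = time.perf_counter_ns()
            try:
                # Pass the call through unchanged; a real layer could
                # also modify arguments or results at this point.
                return real(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.total_ns[name] += time.perf_counter_ns() - start

        return intercepted


if __name__ == "__main__":
    layer = InterceptLayer({"clCreateBuffer": real_clCreateBuffer})
    buf = layer.clCreateBuffer(16)          # the application calls the layer
    print(layer.calls["clCreateBuffer"])    # number of intercepted calls
```

The application only ever talks to the layer, which forwards each call to the real function and records the call count and elapsed host time along the way.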
To get the source, clone the Intercept Layer for OpenCL Applications repository:
git clone https://github.com/intel/opencl-intercept-layer
All controls are documented here: https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md
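To run an application under the Intercept Layer, point CLI_OpenCLFileName at the real OpenCL ICD loader, add the Intercept Layer build directory to LD_LIBRARY_PATH, select the OpenCL backend, and pass the reporting controls on the command line: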
export CLI_OpenCLFileName=/opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1
export LD_LIBRARY_PATH=/home/opencl-intercept-layer/build/intercept:$LD_LIBRARY_PATH
export SYCL_BE=PI_OPENCL
CLI_ReportToStderr=0 CLI_ReportToFile=1 CLI_HostPerformanceTiming=1 CLI_DevicePerformanceTiming=1 CLI_DumpDir=. ./matrix.dpcpp
This will generate a file called cliintercept_report.txt. The file will include the following data and tables:
- Total Enqueues: 2
- Total Time (ns): 1604325652

Host performance timing (per OpenCL API call):

Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
---|---|---|---|---|---|---|
clBuildProgram | 1 | 337069812 | 21.01% | 337069812 | 337069812 | 337069812 |
clCreateBuffer | 3 | 3393909 | 0.21% | 1131303 | 140325 | 2036170 |
clCreateCommandQueueWithProperties | 1 | 5221 | 0.00% | 5221 | 5221 | 5221 |
clCreateContext | 1 | 33639 | 0.00% | 33639 | 33639 | 33639 |
clCreateKernel | 1 | 11713 | 0.00% | 11713 | 11713 | 11713 |
clCreateProgramWithIL | 1 | 153337 | 0.01% | 153337 | 153337 | 153337 |
clEnqueueNDRangeKernel ( _ZTS9Matrix1_2IfE ) | 1 | 3102488 | 0.19% | 3102488 | 3102488 | 3102488 |
clEnqueueReadBufferRect | 1 | 1099684 | 0.07% | 1099684 | 1099684 | 1099684 |
clGetContextInfo | 8 | 4720 | 0.00% | 590 | 160 | 1997 |
clGetDeviceIDs | 12 | 53004 | 0.00% | 4417 | 504 | 14853 |
clGetDeviceInfo | 30 | 85695 | 0.01% | 2856 | 133 | 19920 |
clGetExtensionFunctionAddressForPlatform | 3 | 6446 | 0.00% | 2148 | 1317 | 3687 |
clGetKernelInfo | 2 | 716 | 0.00% | 358 | 169 | 547 |
clGetPlatformIDs | 2 | 1198290216 | 74.69% | 599145108 | 715 | 1198289501 |
clGetPlatformInfo | 12 | 22538 | 0.00% | 1878 | 404 | 7326 |
clReleaseCommandQueue | 1 | 1744 | 0.00% | 1744 | 1744 | 1744 |
clReleaseContext | 1 | 331 | 0.00% | 331 | 331 | 331 |
clReleaseDevice | 6 | 6365 | 0.00% | 1060 | 491 | 1352 |
clReleaseEvent | 2 | 2398 | 0.00% | 1199 | 992 | 1406 |
clReleaseKernel | 1 | 2733 | 0.00% | 2733 | 2733 | 2733 |
clReleaseMemObject | 3 | 45464 | 0.00% | 15154 | 10828 | 22428 |
clReleaseProgram | 1 | 51380 | 0.00% | 51380 | 51380 | 51380 |
clRetainDevice | 6 | 8680 | 0.00% | 1446 | 832 | 2131 |
clSetKernelArg | 20 | 6976 | 0.00% | 348 | 180 | 1484 |
clSetKernelExecInfo | 3 | 1588 | 0.00% | 529 | 183 | 1149 |
clWaitForEvents | 6 | 60864855 | 3.79% | 10144142 | 928 | 60855555 |

Device performance timing (per enqueued command):

Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
---|---|---|---|---|---|---|
_ZTS9Matrix1_2IfE | 1 | 58691515 | 99.98% | 58691515 | 58691515 | 58691515 |
clEnqueueReadBufferRect | 1 | 13390 | 0.02% | 13390 | 13390 | 13390 |
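The report includes detailed timing data for both the host and the device: the first table lists the host time spent in each OpenCL API call, and the second lists the device execution time of each enqueued command.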
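To find where the time goes, the rows can be parsed and ranked. The following is a minimal sketch that assumes the pipe-separated row layout reproduced in this document; the file written to disk may be laid out differently, and the names `Row`, `parse_row`, `hotspots`, and `SAMPLE` are hypothetical helpers introduced for this example. The sample rows are copied from the host table above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    calls: int
    time_ns: int
    percent: float
    average_ns: int
    min_ns: int
    max_ns: int


def parse_row(line):
    """Parse one 'name | calls | time | pct% | avg | min | max |' row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, calls, time_ns, percent, avg, lo, hi = cells
    return Row(name, int(calls), int(time_ns), float(percent.rstrip("%")),
               int(avg), int(lo), int(hi))


def hotspots(rows, n=3):
    """Return the n rows with the largest total time."""
    return sorted(rows, key=lambda r: r.time_ns, reverse=True)[:n]


# A few rows copied from the host timing table in this document.
SAMPLE = """
clBuildProgram | 1 | 337069812 | 21.01% | 337069812 | 337069812 | 337069812 |
clCreateBuffer | 3 | 3393909 | 0.21% | 1131303 | 140325 | 2036170 |
clGetPlatformIDs | 2 | 1198290216 | 74.69% | 599145108 | 715 | 1198289501 |
clWaitForEvents | 6 | 60864855 | 3.79% | 10144142 | 928 | 60855555 |
"""

if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE.strip().splitlines()]
    for row in hotspots(rows, 2):
        # Average (ns) is the total time divided by the call count, rounded down.
        assert row.time_ns // row.calls == row.average_ns
        print(f"{row.name}: {row.percent}% of host time, max call {row.max_ns} ns")
```

In this report, clGetPlatformIDs accounts for about three quarters of the host time, and its Max column shows that nearly all of that was spent in a single call, with clBuildProgram the next largest contributor.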