Developer Guide

Call Description Line

Call Description Line for CPU

In Intel® oneAPI Math Kernel Library Verbose mode, each verbose-enabled function called from your application prints a call description line. The line begins with the MKL_VERBOSE character string and uses spaces as delimiters. The format of the rest of the line is subject to change in a future release.
The following table lists information contained in a call description line for Verbose with CPU applications and provides available links for more information:
Information
Description
Related Links
The name of the function.
Although the name printed may differ from the name used in the source code of the application (for example, the cblas_ prefix of CBLAS functions is not printed), you can easily recognize the function by the printed name.
Values of the arguments.
  • The values are listed in the order of the formal argument list. The list directly follows the function name and is parenthesized and comma-separated.
  • Arrays are printed as addresses (to see the alignment of the data).
  • Integer scalar parameters passed by reference are printed by value. Zero values are printed for NULL references.
  • Character values are printed without quotes.
  • For all parameters passed by reference, the values printed are the values returned by the function. For example, the printed value of the info parameter of a LAPACK function is its value after the function execution.
  • For verbose-enabled functions in the ScaLAPACK domain, in addition to the standard input parameters, information about blocking factors, MPI rank, and process grid is also printed.
Time taken by the function.
  • The time is printed in convenient units (seconds, milliseconds, and so on), which are explicitly indicated.
  • The time may fluctuate from run to run.
  • The time printed may occasionally be larger than the time actually taken by the function call, especially for small problem sizes and multi-socket machines.
    To reduce this effect, bind threads that call Intel® oneAPI Math Kernel Library to CPU cores by setting an affinity mask.
See Managing Multi-core Performance for options to set an affinity mask.
Value of the MKL_CBWR environment variable.
The value printed is prefixed with "CNR:".
Value of the MKL_DYNAMIC environment variable.
The value printed is prefixed with "Dyn:".
Status of the Intel® oneAPI Math Kernel Library memory manager.
The value printed is prefixed with "FastMM:".
See Avoiding Memory Leaks in oneMKL for a description of the Intel® oneAPI Math Kernel Library memory manager.
OpenMP* thread number of the calling thread.
The value printed is prefixed with "TID:".
Values of Intel® oneAPI Math Kernel Library environment variables defining the general and domain-specific numbers of threads, separated by a comma.
The first value printed is prefixed with "NThr:".
The following is an example of a call description line (with OpenMP threading):
MKL_VERBOSE DGEMM(n,n,1000,1000,240,0x7ffff708bb30,0x7ff2aea4c000,1000,0x7ff28e92b000,240,0x7ffff708bb38,0x7ff28e08d000,1000) 1.66ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16
The following is an example of a call description line (with TBB threading):
MKL_VERBOSE DGEMM(n,n,1000,1000,240,0x7ffff708bb30,0x7ff2aea4c000,1000,0x7ff28e92b000,240,0x7ffff708bb38,0x7ff28e08d000,1000) 1.66ms CNR:OFF Dyn:1 FastMM:1
For more information about selected threading, refer to Version Information Line.
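The fields described above can be pulled out of a call description line with a short script. The following is a minimal sketch in Python, not part of Intel® oneAPI Math Kernel Library: the helper name parse_call_line and the regular expression are assumptions modeled on the two example lines above, and because the line format may change in a future release, the pattern may need adjusting.

```python
import re

# Hypothetical helper, not a oneMKL API. The pattern mirrors the example
# lines above: MKL_VERBOSE, function name, parenthesized arguments, time,
# then optional space-delimited "Prefix:value" fields.
LINE_RE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*)\)\s+(?P<time>\S+)(?P<rest>.*)$"
)


def parse_call_line(line):
    """Split one call description line into name, args, time, and fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # the line does not look like a call description line

    # Fields such as CNR:OFF or NThr:16 are split at the first colon only,
    # so a value that itself contains colons or commas is kept whole.
    fields = {}
    for token in match.group("rest").split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value

    args_text = match.group("args")
    return {
        "name": match.group("name"),
        "args": args_text.split(",") if args_text else [],
        "time": match.group("time"),  # kept as text, for example "1.66ms"
        "fields": fields,
    }


if __name__ == "__main__":
    example = (
        "MKL_VERBOSE DGEMM(n,n,1000,1000,240,0x7ffff708bb30,0x7ff2aea4c000,"
        "1000,0x7ff28e92b000,240,0x7ffff708bb38,0x7ff28e08d000,1000) "
        "1.66ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16"
    )
    print(parse_call_line(example))
```

With TBB threading, the TID and NThr entries are simply absent from the returned fields, which matches the second example line above.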
The following information is not printed because of limitations of Intel® oneAPI Math Kernel Library Verbose mode:
  • Input values of parameters passed by reference if the values were changed by the function.
    For example, if a LAPACK function is called with a workspace query, that is, the value of the lwork parameter equals -1 on input, the call description line prints the result of the query and not -1.
  • Return values of functions.
    For example, the value returned by the function ilaenv is not printed.
  • Floating-point scalars passed by reference.
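Because arrays are printed as addresses, the alignment of the data can be read directly from the call description line. The following Python sketch shows one way to do that; the helper names alignment_of and is_aligned are hypothetical, and the 64-byte default is only an assumed boundary of interest, not a value stated in this section.

```python
def alignment_of(address_text, max_alignment=4096):
    """Return the largest power of two, capped at max_alignment, that divides the address."""
    address = int(address_text, 16)  # accepts text such as "0x7ff2aea4c000"
    alignment = 1
    while alignment < max_alignment and address % (alignment * 2) == 0:
        alignment *= 2
    return alignment


def is_aligned(address_text, boundary=64):
    """Check whether a printed address falls on the given byte boundary."""
    return int(address_text, 16) % boundary == 0


if __name__ == "__main__":
    # Addresses taken from the DGEMM example line in this section.
    for text in ("0x7ff2aea4c000", "0x7ff28e92b000", "0x7ffff708bb30"):
        print(text, alignment_of(text), is_aligned(text))
```

In the DGEMM example, the seventh, ninth, and twelfth arguments are the matrix arrays, so those are the addresses worth checking. The other two addresses belong to the floating-point scalars alpha and beta, which are passed by reference and therefore appear as addresses rather than values.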

Call Description Line for GPU

In Intel® oneAPI Math Kernel Library Verbose mode, each verbose-enabled function called from your application prints a call description line. The line begins with the MKL_VERBOSE character string and uses spaces as delimiters. The format of the rest of the line may change in a future release.
The following table lists information contained in a call description line for verbose with GPU applications.
Information
Description
The name of the function
Although the name printed may differ from the name used in the source code of the application, you can easily recognize the function by the printed name.
The values of the arguments
  • The values are listed in the order of the formal argument list. The list directly follows the function name, and it is parenthesized and comma-separated.
  • Arrays are printed as addresses (to show the alignment of the data).
  • Integer scalar parameters passed by reference are printed by value. Zero values are printed for NULL references.
  • Character values are printed without quotation marks.
  • For all parameters passed by reference, the values printed are the values returned by the function.
Time taken by the function
  • If Verbose is enabled with timing for GPU applications, kernel executions become synchronous (each kernel blocks later kernels), and the measured time may include data transfers and data copies on the host and on devices.
  • If Verbose is enabled without timing for GPU applications, the time is printed as 0.
  • The time is printed in convenient units (seconds, milliseconds, and so on), which are explicitly indicated.
  • The time may fluctuate from run to run.
  • The time printed may occasionally be larger than the time actually taken by the function call, especially for small problem sizes.
Device index
The index of the GPU device on which the kernel is executed is printed after the character string "GPU" (for example, GPU0, GPU1, or GPU2). Use the index to find the specific device in the GPU information lines.
If the kernel is executed on the host CPU, this field is empty.
The following is an example of a call description line:
MKL_VERBOSE FFT(dcfi64) 224.30us GPU0
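A GPU call description line can be reduced to a function name, a time in seconds, and a device index. The Python sketch below is an illustration only: the helper parse_gpu_call_line, the regular expression, and the table of unit suffixes are assumptions based on the single example line above, and the set of units that Verbose mode actually prints is not listed in this section.

```python
import re

# Assumed unit suffixes. "us" and "ms" appear in the examples in this
# document; "s" and "ns" are guesses at other units that may be printed.
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Hypothetical pattern: name, parenthesized arguments, time with optional
# unit, then an optional GPU<index> field at the end of the line.
GPU_LINE_RE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*)\)\s+"
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>[A-Za-z]*)"
    r"(?:\s+GPU(?P<device>\d+))?\s*$"
)


def parse_gpu_call_line(line):
    """Return name, args, seconds, and device index for one GPU call line."""
    match = GPU_LINE_RE.match(line.strip())
    if match is None:
        return None  # the line does not fit the assumed layout

    value = float(match.group("value"))
    unit = match.group("unit")
    if unit in UNIT_SECONDS:
        seconds = value * UNIT_SECONDS[unit]
    elif value == 0.0:
        seconds = 0.0  # time printed as 0 when timing is disabled
    else:
        seconds = None  # unit not in the assumed table

    device = match.group("device")
    return {
        "name": match.group("name"),
        "args": match.group("args"),
        "seconds": seconds,
        # None means no device index was printed (kernel ran on the host CPU).
        "device": int(device) if device is not None else None,
    }


if __name__ == "__main__":
    print(parse_gpu_call_line("MKL_VERBOSE FFT(dcfi64) 224.30us GPU0"))
```

Converting every time to seconds makes lines with different units comparable, and a device value of None identifies calls that ran on the host CPU.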
For GPU applications, the call description lines may be printed out of order; that is, the order of the lines in the verbose output may differ from the order in which the functions submit the kernels. This can happen in the following two cases:
  • Verbose is enabled without timing and the kernel executions stay asynchronous.
  • The kernel is not executed on one of the GPU devices, but on the host CPU (the device index will not be printed in this case).
