Measuring Performance with Intel® MKL Support Functions
Intel MKL provides support functions for measuring performance. They give you
a way to quantify the performance improvement that results from using the
Intel MKL routines in this tutorial.
Measure Performance of dgemm
Use the dsecnd routine to return the elapsed CPU time in seconds.
The quick execution of the dgemm routine makes it difficult
to measure its speed, even for an operation on a large matrix. For this reason,
the exercises perform the multiplication multiple times. You should set the
value of the LOOP_COUNT constant so that the total execution time
is about one second.
    /* C source code is found in dgemm_with_timing.c */
    printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface to get stable run time measurements \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, k, B, n, beta, C, n);

    printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A, k, B, n, beta, C, n);
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

    printf (" == Matrix multiplication using Intel(R) MKL dgemm completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));
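The advice on choosing LOOP_COUNT can be sketched as a small calculation: time one run, then divide the target total time by that measurement and round up. The following is a minimal sketch, not code from the tutorial sources. The helper names cpu_seconds and estimate_loop_count and the max_count cap are hypothetical, and cpu_seconds uses the standard C clock() function as a stand-in so that the sketch compiles without Intel MKL; with Intel MKL available you would call dsecnd instead.

```c
#include <assert.h>
#include <time.h>

/* Portable stand-in for dsecnd(): CPU time used by the program so far,
   in seconds. Assumption: in the tutorial exercises you would call
   dsecnd() from Intel MKL here instead. */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical helper: number of repetitions needed so that the whole
   timed loop takes roughly target_seconds, given the time measured for
   a single run. The result is clamped to the range [1, max_count]. */
static int estimate_loop_count(double seconds_per_run,
                               double target_seconds,
                               int max_count)
{
    double ratio;
    int count;

    assert(target_seconds > 0.0 && max_count >= 1);

    /* A run shorter than the timer resolution measures as zero. */
    if (seconds_per_run <= 0.0)
        return max_count;

    ratio = target_seconds / seconds_per_run;
    if (ratio >= (double)max_count)
        return max_count;

    count = (int)ratio;              /* truncates toward zero */
    if ((double)count < ratio)
        count++;                     /* round up to a whole run */
    return count < 1 ? 1 : count;
}
```

To use it, record cpu_seconds() (or dsecnd()) before and after a single call to the routine being measured, once the warm-up run is done, and pass the difference as seconds_per_run with a target of 1.0. The returned value is a reasonable starting point for LOOP_COUNT, which you can then adjust by hand.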
Measure Performance Without Using dgemm
To show the improvement resulting from using dgemm, perform the same
measurement, but use a triply-nested loop to multiply the matrices.
    /* C source code is found in matrix_multiplication.c */
    printf (" Making the first run of matrix product using triple nested loop\n"
            " to get stable run time measurements \n\n");
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (l = 0; l < k; l++)
                sum += A[k*i+l] * B[n*l+j];
            C[n*i+j] = sum;
        }
    }

    printf (" Measuring performance of matrix product using triple nested loop \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        for (i = 0; i < m; i++) {
            for (j = 0; j < n; j++) {
                sum = 0.0;
                for (l = 0; l < k; l++)
                    sum += A[k*i+l] * B[n*l+j];
                C[n*i+j] = sum;
            }
        }
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

    printf (" == Matrix multiplication using triple nested loop completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));
Compare the results in the first exercise using dgemm to the results of the
second exercise without using dgemm.
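One way to make this comparison concrete is to turn the two per-run times into a speedup ratio and a throughput figure. The following is a minimal sketch; the helper names speedup and gflops are hypothetical and are not part of Intel MKL or the tutorial sources. The throughput formula assumes the usual operation count for multiplying an m-by-k matrix by a k-by-n matrix: one multiplication and one addition per inner-loop step, or 2*m*n*k floating-point operations in total.

```c
#include <assert.h>

/* Hypothetical helper: how many times faster the dgemm run was than the
   triple nested loop, given the per-run time of each in seconds. */
static double speedup(double seconds_nested_loop, double seconds_dgemm)
{
    assert(seconds_nested_loop >= 0.0 && seconds_dgemm > 0.0);
    return seconds_nested_loop / seconds_dgemm;
}

/* Hypothetical helper: throughput in GFLOP/s for one m-by-k times k-by-n
   product that took the given number of seconds. Assumes 2*m*n*k
   floating-point operations (one multiply and one add per inner step). */
static double gflops(int m, int n, int k, double seconds)
{
    assert(m > 0 && n > 0 && k > 0 && seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1.0e9;
}
```

Pass the s_elapsed value from each exercise (in seconds, before the conversion to milliseconds) to these helpers. The ratio depends on the matrix sizes, the compiler options, and the hardware, so treat the numbers as specific to one machine and one build.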
You can find more information about measuring Intel MKL performance
in the article "A simple example to measure the performance of an Intel MKL
function" in the Intel Math Kernel Library Knowledge Base.
Optimization Notice
Intel's compilers may or may not optimize to the same degree
for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not
manufactured by Intel. Microprocessor-dependent optimizations in this product
are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors.
Please refer to the applicable product User and Reference Guides for more
information regarding the specific instruction sets covered by this notice.
Notice revision #20110804