## Measuring Performance with oneMKL Support Functions

oneMKL provides support functions for measuring performance. In this tutorial, they let you quantify the improvement gained from using oneMKL routines.

### Measure Performance of dgemm

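Use the `dsecnd` support function, which returns the elapsed time in seconds as a double-precision value. The difference between two calls gives the time spent between them.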

Because `dgemm` executes quickly, a single call is difficult to time accurately, even for an operation on a large matrix. For this reason, the exercises repeat the multiplication `LOOP_COUNT` times and report the average time per call. Set the value of the `LOOP_COUNT` constant so that the total execution time is about one second. The code also makes one untimed call before the measurement starts, so that the first run does not distort the timings.

    /* C source code is found in dgemm_with_timing.c */

    printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface to get stable run time measurements \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, k, B, n, beta, C, n);

    printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A, k, B, n, beta, C, n);
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

    printf (" == Matrix multiplication using Intel(R) MKL dgemm completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));
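One way to arrive at a `LOOP_COUNT` that gives roughly one second of total run time is to time a single call first and divide the target time by it. The sketch below illustrates that arithmetic only. The helper `choose_loop_count` and its `max_count` cap are hypothetical names introduced here; they are not part of oneMKL or of the tutorial sources.

```c
#include <assert.h>

/* Hypothetical helper (not a oneMKL function): given the measured time of a
   single call in seconds, return the number of repetitions needed for the
   timed loop to run for about target_seconds. The result is clamped to the
   range [1, max_count]. */
static int choose_loop_count(double single_run_seconds,
                             double target_seconds,
                             int max_count)
{
    assert(target_seconds > 0.0);
    assert(max_count >= 1);

    /* A call too fast for the timer to resolve: use the largest count. */
    if (single_run_seconds <= 0.0) {
        return max_count;
    }

    double ratio = target_seconds / single_run_seconds;
    if (ratio >= (double)max_count) {
        return max_count;
    }

    /* Round up so the total time is at least the target. */
    int count = (int)ratio;
    if ((double)count < ratio) {
        count++;
    }
    return count < 1 ? 1 : count;
}
```

In the tutorial sources `LOOP_COUNT` is a constant, so a value computed this way would be entered by hand. The single-call time could come from bracketing one warm `cblas_dgemm` call with `dsecnd`, as in the timed loop above.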

### Measure Performance Without Using dgemm

To show the improvement that results from using `dgemm`, perform the same measurement, but multiply the matrices with a triple nested loop instead.

    /* C source code is found in matrix_multiplication.c */

    printf (" Making the first run of matrix product using triple nested loop\n"
            " to get stable run time measurements \n\n");
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (l = 0; l < k; l++)
                sum += A[k*i+l] * B[n*l+j];
            C[n*i+j] = sum;
        }
    }

    printf (" Measuring performance of matrix product using triple nested loop \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        for (i = 0; i < m; i++) {
            for (j = 0; j < n; j++) {
                sum = 0.0;
                for (l = 0; l < k; l++)
                    sum += A[k*i+l] * B[n*l+j];
                C[n*i+j] = sum;
            }
        }
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

    printf (" == Matrix multiplication using triple nested loop completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));

Compare the average time per multiplication reported by the first exercise, which uses `dgemm`, with the time reported by the second exercise, which does not.
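To make the comparison concrete, the two average times can be turned into a speedup factor and an approximate floating-point rate. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the operation count uses the common convention of 2·m·n·k operations for an m-by-k times k-by-n product (one multiply and one add per inner-product term), ignoring the lower-order cost of scaling by `alpha` and `beta`.

```c
#include <assert.h>

/* Approximate operation count for an (m x k) by (k x n) matrix product:
   one multiply and one add per inner-product term. Hypothetical helper. */
static double gemm_flop_count(int m, int n, int k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Convert an operation count and an elapsed time in seconds to GFLOP/s. */
static double gflops(double flop_count, double seconds)
{
    assert(seconds > 0.0);
    return flop_count / seconds / 1.0e9;
}

/* How many times faster the optimized run is than the baseline run. */
static double speedup(double baseline_seconds, double optimized_seconds)
{
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

Here `baseline_seconds` would be the `s_elapsed` value from the triple nested loop and `optimized_seconds` the `s_elapsed` value from the `dgemm` run, both in seconds rather than the milliseconds that the exercises print.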

For more information about measuring oneMKL performance, see the article "A simple example to measure the performance of an oneMKL function" in the Intel® oneAPI Math Kernel Library Knowledge Base.

| Product and Performance Information |
|---|
| Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201 |