
# Measuring Effect of Threading on dgemm

By default, Intel MKL uses *n* threads, where *n* is the number of physical cores on the system. By restricting the number of threads and measuring the change in performance of `dgemm`, this exercise shows how threading impacts performance.
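One common way to quantify that impact is speedup (the single-thread time divided by the time on more threads) and parallel efficiency (the speedup divided by the thread count). The sketch below is illustrative only: the helper names `speedup` and `parallel_efficiency` are hypothetical and are not part of Intel MKL, and the timings passed in would come from your own runs.

```c
#include <assert.h>

/* Hypothetical helper (not part of Intel MKL): ratio of the single-thread
   time to the time measured with more threads. Values above 1.0 mean the
   threaded run was faster. */
static double speedup(double t_one_thread, double t_n_threads)
{
    assert(t_one_thread > 0.0 && t_n_threads > 0.0);
    return t_one_thread / t_n_threads;
}

/* Hypothetical helper: speedup per thread. A value of 1.0 would be ideal
   linear scaling; real runs usually fall below that. */
static double parallel_efficiency(double t_one_thread, double t_n_threads,
                                  int n_threads)
{
    assert(n_threads > 0);
    return speedup(t_one_thread, t_n_threads) / (double)n_threads;
}
```

For example, if one thread takes 8.0 ms and four threads take 2.0 ms, `speedup(8.0, 2.0)` is 4.0 and `parallel_efficiency(8.0, 2.0, 4)` is 1.0.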

## Limit the Number of Cores Used for dgemm

This exercise uses the `mkl_set_num_threads` routine to override the default number of threads, and `mkl_get_max_threads` to determine the maximum number of threads.
```c
/* C source code is found in dgemm_threading_effect_example.c */

    printf (" Finding max number of threads Intel(R) MKL can use for parallel runs \n\n");
    /* Query the upper bound on threads Intel MKL will use */
    max_threads = mkl_get_max_threads();

    printf (" Running Intel(R) MKL from 1 to %i threads \n\n", max_threads);
    for (i = 1; i <= max_threads; i++) {
        /* Reset C so every thread count starts from the same state */
        for (j = 0; j < (m*n); j++)
            C[j] = 0.0;

        printf (" Requesting Intel(R) MKL to use %i thread(s) \n\n", i);
        /* Override the default thread count for subsequent MKL calls */
        mkl_set_num_threads(i);

        /* Untimed warm-up call so one-time startup costs do not skew the timing */
        printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"
                " via CBLAS interface to get stable run time measurements \n\n");
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A, k, B, n, beta, C, n);

        printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"
                " via CBLAS interface on %i thread(s) \n\n", i);
        s_initial = dsecnd();
        for (r = 0; r < LOOP_COUNT; r++) {
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, alpha, A, k, B, n, beta, C, n);
        }
        /* Average wall-clock time per call, in seconds */
        s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

        printf (" == Matrix multiplication using Intel(R) MKL dgemm completed ==\n"
                " == at %.5f milliseconds using %d thread(s) ==\n\n", (s_elapsed * 1000), i);
    }
```
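The excerpt above depends on Intel MKL and on variables declared elsewhere in the example file. To try the timing pattern on its own (one untimed warm-up call, then the mean over several timed calls), the following sketch may help. It is not the tutorial's code: `naive_dgemm`, `wall_seconds`, and `average_seconds` are hypothetical stand-ins, the multiply is single-threaded and unoptimized, and `alpha = 1.0`, `beta = 0.0` are assumed values, so the absolute timings say nothing about MKL performance.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds; stands in for MKL's dsecnd() in this sketch.
   Uses C11 timespec_get, which is not guaranteed to be monotonic. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Naive row-major C = alpha*A*B + beta*C; a stand-in for cblas_dgemm.
   A is m x k, B is k x n, C is m x n. */
static void naive_dgemm(int m, int n, int k, double alpha,
                        const double *A, const double *B,
                        double beta, double *C)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}

/* One untimed warm-up call, then the mean time of loop_count timed calls. */
static double average_seconds(int loop_count, int m, int n, int k,
                              const double *A, const double *B, double *C)
{
    assert(loop_count > 0);
    naive_dgemm(m, n, k, 1.0, A, B, 0.0, C);   /* warm-up, not timed */
    double start = wall_seconds();
    for (int r = 0; r < loop_count; r++)
        naive_dgemm(m, n, k, 1.0, A, B, 0.0, C);
    return (wall_seconds() - start) / loop_count;
}
```

Calling `average_seconds(LOOP_COUNT, m, n, k, A, B, C)` returns the mean seconds per multiply; multiplying by 1000 gives milliseconds, matching the units printed by the exercise.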
Examine the results shown and notice that the time to multiply the matrices decreases as the number of threads increases. If you try to run this exercise with more than the number of threads returned by