Measuring Effect of Threading on dgemm

Using the Intel® oneAPI Math Kernel Library for Matrix Multiplication - Fortran

ID 758508

Date 9/27/2021

Public

By default, oneMKL uses n threads, where n is the number of physical cores on the system. By restricting the number of threads and measuring the change in performance of dgemm, this exercise shows how threading impacts performance.

Limit the Number of Cores Used for dgemm

This exercise uses the mkl_set_num_threads routine to override the default number of threads, and mkl_get_max_threads to determine the maximum number of threads.

*      Fortran source code is found in dgemm_threading_effect_example.f

      PRINT *, "Finding max number of threads Intel(R) MKL can use for"
      PRINT *, "parallel runs"
      PRINT *, ""
      MAX_THREADS = MKL_GET_MAX_THREADS()

      PRINT 20," Running Intel(R) MKL from 1 to ",MAX_THREADS," threads"
 20   FORMAT(A,I2,A)
      PRINT *, ""
      DO L = 1, MAX_THREADS
        DO I = 1, M
          DO J = 1, N
            C(I,J) = 0.0
          ENDDO
        ENDDO

        PRINT 30, " Requesting Intel(R) MKL to use ",L," thread(s)"
 30     FORMAT(A,I2,A)
        CALL MKL_SET_NUM_THREADS(L)

        PRINT *, "Making the first run of matrix product using "
        PRINT *, "Intel(R) MKL DGEMM subroutine to get stable "
        PRINT *, "run time measurements"
        PRINT *, ""
        CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)

        PRINT *, "Measuring performance of matrix product using "
        PRINT 40, " Intel(R) MKL DGEMM subroutine on ",L," thread(s)"
 40     FORMAT(A,I2,A)
        PRINT *, ""
        S_INITIAL = DSECND()
        DO R = 1, LOOP_COUNT
          CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)
        END DO
        S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

        PRINT *, "== Matrix multiplication using Intel(R) MKL DGEMM =="
        PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
        PRINT 60, " == using ",L," thread(s) =="
 50     FORMAT(A,F12.5,A)
 60     FORMAT(A,I2,A)
        PRINT *, ""
      END DO

Examine the results and notice that the time to multiply the matrices decreases as the number of threads increases. If you run this exercise with more threads than the number returned by mkl_get_max_threads, performance might degrade, because more threads are in use than there are physical cores.
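
To compare runs more precisely than by reading the raw times, you can convert each measured time into a speedup, a parallel efficiency, and a throughput figure. The following is a minimal sketch and is not part of dgemm_threading_effect_example.f. The program name SCALING, the matrix sizes, and both timings are hypothetical placeholders, not measured results; substitute the S_ELAPSED values printed by your own run. The throughput estimate assumes the usual operation count of about 2*M*N*K floating-point operations for one dgemm call.

```fortran
*     Sketch: derive speedup, efficiency, and GFLOP/s from timings.
*     The timings below are illustrative placeholders, NOT measured
*     results; replace them with S_ELAPSED values from your own run.
      PROGRAM SCALING
      IMPLICIT NONE
      INTEGER M, N, K, L
      DOUBLE PRECISION T1, TL, SPEEDUP, EFFIC, GFLOPS
*     Hypothetical matrix sizes: A is M x K, B is K x N
      PARAMETER (M = 1000, N = 1000, K = 1000)

*     Placeholder seconds per DGEMM call on 1 thread and on L threads
      T1 = 0.080D0
      L  = 4
      TL = 0.025D0

*     Speedup relative to the one-thread run
      SPEEDUP = T1 / TL
*     Parallel efficiency: 1.0 would mean perfect scaling
      EFFIC   = SPEEDUP / DBLE(L)
*     DGEMM performs about 2*M*N*K floating-point operations
      GFLOPS  = 2.0D0*DBLE(M)*DBLE(N)*DBLE(K) / TL / 1.0D9

      PRINT 10, ' Threads:    ', L
      PRINT 20, ' Speedup:    ', SPEEDUP
      PRINT 20, ' Efficiency: ', EFFIC
      PRINT 20, ' GFLOP/s:    ', GFLOPS
 10   FORMAT(A,I4)
 20   FORMAT(A,F10.2)
      END
```

With these placeholder values the program reports a speedup of 3.20, an efficiency of 0.80, and 80.00 GFLOP/s. An efficiency that drops sharply as L grows suggests that adding threads has stopped paying off on your system, which is the effect this exercise is meant to expose.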
