Coding Techniques

Developer Guide for Intel® oneAPI Math Kernel Library macOS*

ID 766688

Date 7/13/2023

To improve performance, properly align arrays in your code. Additional conditions can improve performance for specific function domains.

Data Alignment and Leading Dimensions

To improve the performance of an application that calls Intel® oneAPI Math Kernel Library (oneMKL), align your arrays on 64-byte boundaries and make the leading dimensions of the arrays divisible by 64/element_size, where element_size is the size of one matrix element in bytes (4 for single-precision real, 8 for double-precision real and single-precision complex, and 16 for double-precision complex). For more details, see Example of Data Alignment.
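The rule above can be sketched in C as follows. This is a minimal illustration, not oneMKL code: the helper names padded_leading_dim and alloc_aligned_matrix are hypothetical, and the standard C11 aligned_alloc stands in for the allocator so that the sketch builds without oneMKL (oneMKL's own mkl_malloc and mkl_free can serve the same purpose).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* bytes; the boundary recommended in the text */

/* Hypothetical helper: round n up to the next multiple of 64/elem_size.
   elem_size must divide 64 (4, 8, or 16 for the element types listed). */
size_t padded_leading_dim(size_t n, size_t elem_size)
{
    assert(elem_size > 0 && ALIGNMENT % elem_size == 0);
    size_t step = ALIGNMENT / elem_size;
    return (n + step - 1) / step * step;
}

/* Hypothetical helper: allocate a column-major rows-by-cols matrix of
   doubles on a 64-byte boundary with a padded leading dimension.
   The leading dimension is written to *ld. Returns NULL on failure.
   Release the memory with free(). */
double *alloc_aligned_matrix(size_t rows, size_t cols, size_t *ld)
{
    *ld = padded_leading_dim(rows, sizeof(double));
    /* The size is a multiple of 64 because *ld * sizeof(double) is. */
    return aligned_alloc(ALIGNMENT, *ld * cols * sizeof(double));
}
```

With this padding, a double-precision matrix with 1001 rows gets a leading dimension of 1008, so every column starts on a 64-byte boundary. The padded value is what you would pass as lda to a oneMKL routine.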

LAPACK Packed Routines

Routines whose names contain the letters HP, OP, PP, SP, TP, or UP in the matrix type and storage positions (the second and third letters, respectively) operate on matrices in packed format (see the LAPACK "Routine Naming Conventions" sections in the Intel® oneAPI Math Kernel Library (oneMKL) Developer Reference). Their functionality is strictly equivalent to that of the unpacked routines whose names contain the letters HE, OR, PO, SY, TR, or UN in the same positions, but their performance is significantly lower.
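As an illustration of the naming rule only, the following C sketch maps a packed routine name to its unpacked counterpart by rewriting the second and third letters. The function unpacked_routine_name is a hypothetical helper, not part of oneMKL. It works purely on the text of the name, so it does not check that the resulting routine exists, and the argument lists of the two routines still differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `packed` (for example "dspevx") into `out`
   with the packed letter pair replaced by the unpacked one ("dsyevx").
   The replaced letters are written in lowercase.
   Returns 0 on success, or -1 if the second and third letters are not a
   packed pair or `out` is too small. */
int unpacked_routine_name(const char *packed, char *out, size_t out_size)
{
    static const char *const pairs[][2] = {
        {"hp", "he"}, {"op", "or"}, {"pp", "po"},
        {"sp", "sy"}, {"tp", "tr"}, {"up", "un"},
    };
    assert(packed != NULL && out != NULL);
    size_t len = strlen(packed);
    if (len < 3 || len + 1 > out_size)
        return -1;
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; ++i) {
        if (tolower((unsigned char)packed[1]) == pairs[i][0][0] &&
            tolower((unsigned char)packed[2]) == pairs[i][0][1]) {
            memcpy(out, packed, len + 1);   /* includes the terminator */
            out[1] = pairs[i][1][0];
            out[2] = pairs[i][1][1];
            return 0;
        }
    }
    return -1;
}
```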

If memory is not too tight, use an unpacked routine for better performance. In this case, you need to allocate about N²/2 more array elements than the respective packed routine requires, where N is the problem size (the number of equations).
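The memory trade-off can be checked with a few lines of arithmetic. The sketch below assumes the unpacked array uses the smallest leading dimension, lda = N. The function names are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Elements stored by a packed routine: one triangle, N*(N+1)/2. */
size_t packed_elements(size_t n)
{
    return n * (n + 1) / 2;
}

/* Elements stored by an unpacked routine when lda == N: the full square. */
size_t unpacked_elements(size_t n)
{
    return n * n;
}

/* Extra bytes the unpacked layout needs compared with the packed one.
   This equals N*(N-1)/2 elements, which is close to N*N/2 for large N. */
size_t extra_bytes(size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    return (unpacked_elements(n) - packed_elements(n)) * elem_size;
}
```

For N = 1000 in double precision, the packed array holds 500,500 elements and the unpacked array holds 1,000,000, so the unpacked routine costs an extra 499,500 elements, or roughly 4 MB.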

For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine:

call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)

where a has dimension lda-by-n, which is at least N² elements, instead of the packed routine:

call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)

where ap has dimension N*(N+1)/2.
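If your data already exists in packed form, it has to be expanded before you can call the unpacked routine. The C sketch below shows one way to do this for the upper-triangle case (uplo = 'U'), assuming column-major storage and zero-based indices. The function unpack_upper is a hypothetical helper written for illustration. LAPACK also provides a conversion routine for this purpose (dtpttr), which is likely the better choice in production code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy an upper triangle from packed storage into
   a full column-major array.
   Packed layout (uplo = 'U'): ap[i + j*(j+1)/2] holds A(i,j), 0 <= i <= j < n.
   Full layout: a[i + j*lda] holds A(i,j), with lda >= n.
   The strictly lower triangle of a is left untouched; routines called
   with uplo = 'U' read only the upper triangle. */
void unpack_upper(size_t n, const double *ap, double *a, size_t lda)
{
    assert(lda >= n);
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i <= j; ++i)
            a[i + j * lda] = ap[i + j * (j + 1) / 2];
}
```

Here lda can be padded to a multiple of 64/element_size, as recommended in the data alignment section, which packed storage does not allow.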
