Improving Performance for Small Size Problems

Developer Guide for Intel® oneAPI Math Kernel Library for Linux*

ID 766690

Date 12/16/2022

The overhead of calling an Intel® oneAPI Math Kernel Library function for small problem sizes can be significant when the function has a large number of parameters or internally checks parameter errors. To reduce the performance overhead for these small size problems, the Intel® oneAPI Math Kernel Library direct call feature works in conjunction with the compiler to preprocess the calling parameters to supported Intel® oneAPI Math Kernel Library functions and directly call or inline special optimized small-matrix kernels that bypass error checking. For a list of functions supporting direct call, see Limitations of the Direct Call.

To activate the feature, do the following:

Compile your C or Fortran code with the preprocessor macro that matches the Intel® oneAPI Math Kernel Library mode you need, threaded or sequential, by supplying the corresponding compiler option from the table below:

Intel® oneAPI Math Kernel Library Mode	Macro	Compiler Option
Threaded	MKL_DIRECT_CALL	-DMKL_DIRECT_CALL
Sequential	MKL_DIRECT_CALL_SEQ	-DMKL_DIRECT_CALL_SEQ

For Fortran applications:
- Enable the preprocessor by using the -fpp option for Intel® Fortran Compiler.
- Include the Intel® oneAPI Math Kernel Library Fortran include file mkl_direct_call.fi.

Intel® oneAPI Math Kernel Library skips error checking and intermediate function calls if the problem size is small enough (for example: a call to a function that supports direct call, such as dgemm, with matrix ranks smaller than 50).

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201
