Improving Performance for Small Size Problems

Developer Guide for Intel® oneAPI Math Kernel Library for Windows*

ID 766692

Date 12/16/2022

The overhead of calling an Intel® oneAPI Math Kernel Library function for small problem sizes can be significant when the function has a large number of parameters or internally checks parameter errors. To reduce the performance overhead for these small size problems, the Intel® oneAPI Math Kernel Library direct call feature works in conjunction with the compiler to preprocess the calling parameters to supported Intel® oneAPI Math Kernel Library functions and directly call or inline special optimized small-matrix kernels that bypass error checking. For a list of functions supporting direct call, see Limitations of the Direct Call.

To activate the feature, do the following:

Compile your C or Fortran code with the preprocessor macro that matches the required Intel® oneAPI Math Kernel Library mode (threaded or sequential), supplying the corresponding compiler option from the table below:

Intel® oneAPI Math Kernel Library Mode	Macro	Compiler Option
Threaded	MKL_DIRECT_CALL	/DMKL_DIRECT_CALL
Sequential	MKL_DIRECT_CALL_SEQ	/DMKL_DIRECT_CALL_SEQ

For Fortran applications:
- Enable the preprocessor by using the /fpp option of the Intel® Fortran Compiler.
- Include the Intel® oneAPI Math Kernel Library Fortran include file mkl_direct_call.fi.
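
For a C application, the activation steps above might look like the following minimal sketch. It assumes the macro is supplied on the command line (for example, something like `icx /DMKL_DIRECT_CALL /Qmkl small_gemm.c` with an Intel compiler; the exact command depends on your toolchain and is an assumption here). The helper name `small_matmul`, the file name, and the `SKETCH_HAVE_MKL` guard are hypothetical. The plain-C fallback is not part of the library and exists only so the sketch compiles where `mkl.h` is not installed.

```c
#include <assert.h>

/* Use the real library when its header is available (guard name is hypothetical). */
#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

/* Computes C = A * B for n-by-n matrices stored in column-major order.
 * Hypothetical helper, used here only to show a small dgemm call site. */
static void small_matmul(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
#ifdef SKETCH_HAVE_MKL
    /* Ordinary dgemm call. When the file is compiled with /DMKL_DIRECT_CALL
     * or /DMKL_DIRECT_CALL_SEQ, a small call like this is a candidate for
     * the direct call path; the source code itself does not change. */
    const MKL_INT dim = n;
    const double alpha = 1.0, beta = 0.0;
    dgemm("N", "N", &dim, &dim, &dim, &alpha, a, &dim, b, &dim, &beta, c, &dim);
#else
    /* Fallback for builds without the library: naive triple loop. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int p = 0; p < n; ++p)
                sum += a[i + p * n] * b[p + j * n];
            c[i + j * n] = sum;
        }
    }
#endif
}
```

Calling `small_matmul(2, a, b, c)` on 2-by-2 inputs is the kind of small problem the feature targets. Only the compiler option changes between the threaded and sequential variants.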

Intel® oneAPI Math Kernel Library skips error checking and intermediate function calls if the problem size is small enough (for example: a call to a function that supports direct call, such as dgemm, with matrix ranks smaller than 50).
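
The general idea of a compile-time switch that sends small problems to an unchecked kernel can be illustrated with a toy sketch. This is not the library's implementation. Every name below (`SKETCH_DIRECT_CALL`, `SKETCH_SMALL_LIMIT`, `checked_axpy`, `small_axpy`, `sketch_axpy`) is hypothetical, and the limit of 50 is borrowed from the example in the text rather than taken from any library constant.

```c
#include <stddef.h>

/* Size limit borrowed from the "smaller than 50" example; not an MKL constant. */
#define SKETCH_SMALL_LIMIT 50

/* Regular path: validates its arguments, then computes y += alpha * x.
 * Returns 0 on success and -1 on invalid input. */
static int checked_axpy(int n, double alpha, const double *x, double *y)
{
    if (n < 0 || x == NULL || y == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i];
    return 0;
}

/* Small-size path: same arithmetic, no argument validation.
 * The caller is responsible for passing valid pointers. */
static inline int small_axpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i];
    return 0;
}

/* With the (hypothetical) macro defined, small calls go straight to the
 * unchecked kernel; larger calls, and all calls without the macro, take
 * the regular checked path. */
#ifdef SKETCH_DIRECT_CALL
#  define sketch_axpy(n, alpha, x, y)                          \
      (((n) < SKETCH_SMALL_LIMIT) ? small_axpy((n), (alpha), (x), (y)) \
                                  : checked_axpy((n), (alpha), (x), (y)))
#else
#  define sketch_axpy(n, alpha, x, y) checked_axpy((n), (alpha), (x), (y))
#endif
```

The trade-off is the one the text describes: the small-size path saves the validation and dispatch work, but invalid arguments are no longer reported.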

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201

Using MKL_DIRECT_CALL in C Applications
Using MKL_DIRECT_CALL in Fortran Applications
Limitations of the Direct Call
