By Rezaur RahmanUse these simple and yet powerful APIs provided by the Intel® MKL to port FFTW 2.x and FFTW 3.x code easily to Intel MKL interfaces. Find out how.Fast Fourier Transforms (FFTs) are used heavily in the technical and image processing field to perform various types of computational tasks. Fastest Fourier Transform in the West (FFTW) is a collection of discrete Fourier transform routines available publicly. Intel has its own version of FFT code implemented as part of the Intel® Math Kernel Library (Intel® MKL). These are known as advanced Discreet Fourier Transform (DFT) routines and provide an interface referred to as DFTI in this document. These routines are optimized for Intel® architecture-based platforms (Intel® Itanium® processor-based platforms, platforms with Intel® Extended Memory 64 Technology (Intel® EM64T), and IA-32-based platforms). Software developers may gain considerable performance by using the Intel MKL for computing DFTs due to the optimizations built into these libraries. An additional benefit to using the Intel MKL is that independent software vendors (ISVs) may be able to take advantage of newer Intel architecture technologies without much effort. This article introduces the readers to the APIs provided by the Intel MKL to invoke DFT code. Although the Intel MKL provides two sets of interfaces for DFT, we discourage you from using the older FFT interface. This document focuses on the latest DFT interfaces (referred to as DFTI) which are made with more flexibility and support more advanced features. In addition, we will look at how the DFTI interfaces relate to publicly available FFTW 2.x and FFTW3.x. We will not detail the interface definitions which are defined in the Intel MKL Reference Manual provided with the library. DFTI provides both the FORTRAN and C interfaces; we will deal with FORTRAN interfaces in this document. For C interfaces, there is a one-to-one correspondence and the same concepts apply. Fortran APIs need support of FORTRAN 95 as the API makes use of various modern constructs of FORTRAN 95. It is assumed that you are familiar with the theory of DFT and related terms and properties. Brushing up on these terms will make this article more relevant. There is an abundance of literature on the subject. In the DFT Equations section of this paper we shall briefly look at the mathematical formulation of a DFT. This will allow us to look at the parameters involved in defining a DFT operation. The DFTI Interfaces in the Intel MKL section will walk through the set of interfaces provided by DFTI that allows manipulation of the parameters involved in a DFT operation and the actual computations. Finally, the Mapping between FFTW and Intel MKL Interfaces section will relate the DFTI with the FFTW interfaces for those interested in porting their code from FFTW to DFTI. Note: The relationship between the terms DFT and FFT should be clarified. These terms are often used interchangeably in this article. The term DFT indicates a mathematical operation known as Fourier Transform that is applied to a finite sequences of discrete values. FFT, on the other hand, represents an algorithm to perform the DFT in an optimized manner. The DFT routines in the Intel MKL provide one-dimensional, two-dimensional, and multi-dimensional (up to seven dimensions) FFT routines with support for mixed radix. The routines are optimized to make effective use of caches and processor architectures. For one-dimensional and two-dimensional FFTs, the library provides complex-to-complex, real-to-complex, and complex-to-real transforms. As of version 7.2, the DFTI only supports complex-to-complex transforms for >2 dimensional FFTs. All of these transforms may be ‘in-place’ or ‘not in place’ FFTs. The forward one-dimensional FFT is computed according to the following mathematical equation: 
(Eq. 1) The backward FFT is computed according to the following equation: 
(Eq. 2) where:  i being the square root of -1 These equations can easily be expanded to two-dimensional and multi-dimensional DFTs. Looking at these two equations gives us the basis on which various DFT interfaces are defined. | DFTI Interfaces in the Intel MKL |
Let us look at various aspects of performing a DFT as shown by the mathematical formulation on page two. First, there are two transforms, one called forward transform, defined by Eq (1) in DFTI, and the backward transform, defined by Eq (2). The forward transform is determined by the negative sign of j in the  factor and the backward transform defined by the positive sign in the same factor in Equation 2 over a sample of size n. However, DFTI provides a flexible interface to define the sign for forward and backward transforms through a set of configuration interfaces known as descriptor manipulation routines. Another important computational aspect is the normalization factor of the transforms. As seen from Equation 2, the backward transform has a multiplier  that is needed to normalize the magnitudes of the FFT outputs as it goes through a forward followed by a backward FFT. DFTI provides a way to set the scaling (normalization) factor through the descriptor manipulation routine for both the forward and backward FFT. In order to perform the transforms, one needs to define the forward domain of the input set over which the transform is performed. This forward domain may be of type real, complex or conjugate event. This is defined during the creation of the descriptor in DFTI. One should also be able to set the precision of the input numbers to be double or single and is set as part of the descriptor. One final aspect to keep in mind when programming with DFTI is the storage format that is used for providing input and output data to the DFTI. The DFTI APIs provide an elegant and unified set of APIs to perform FFTs with all the parameters described above and more. Note that in this article we shall not discuss the detailed syntax of the interfaces, but you may consult the Intel MKL API reference document for details on how to use these routines in your code. Using the InterfacesThe DFT APIs are designed so that they provide a uniform interface for all forms of FFTs, ranging from one-dimension up to multi-dimensional over mixed radix and various complex-to-complex and real-to-complex forward and backward transformations. DFTI defines three categories of APIs: - Descriptor Management Routines: The various attributes that affect the computation of FFTs (like sign and scaling factors) are defined through the descriptor management routines. The object used to manage these attributes is named descriptors.
- Computing Routines: These have four combinations—two due to forward and backward computation with overloaded parameter types (complex and real included). The other two are for ‘in place’ and ‘not in place’ computations with regard to forward and backward transforms.
- Status Check Routines: As with other standard APIs, status checking routines let the programmer know whether a particular API call was successful or not.
To utilize the DFT routines we can use the Intel MKL_DFTI module in FORTRAN by referencing the modules defined in file mkl_dfti.f90. For C/C++, we can use the include file mkl_dfti.h. Both of these files are found in the Include directory of the installed version of the Intel MKL library. Various configuration parameters related to a DFT, such as length and input domain, are contained in a descriptor object in DFTI API. It is possible to create multiple descriptors and relate them to multiple DFT computations. A descriptor is created by the following API call DftiCreateDescriptor. Once a descriptor is created, it contains default configuration settings such as in-place transform and sign as -1. Please refer to the Intel MKL Reference Manual for a list of configuration parameters and their default values. These values can be queried and modified using the two other routines called DftiSetValue and DftiGetValue. Once created, the descriptor has to be committed through a call to DftiCommitDescriptor. Trying to use an uncommitted descriptor in a call to an FFT routine will result in an error. Whenever we change any parameter tied to a descriptor, the descriptor needs to be re-committed. The committal of the descriptor performs all initialization needed to perform optimal DFT computation which may involve exploring various factorizations of the input length for efficient computation. A copy of the descriptor can be made for a computation that needs changes to some parameters. This is done through a call to DftiCopyDescriptor. A committed descriptor is used for an FFT computation. There are two main interfaces for doing FFT computations: DftiComputeForward and DftiComputeBackward. However, these routines are overloaded for in-place and out-of-place FFT computation and various types of input/output parameters for FORTRAN interfaces. These routines also take a variable list of void pointer types for the C interfaces. Another configuration parameter that affects the FFT computation is the storage type. Setting the storage type to proper values expected out of the computation may help improving the performance of the FFT code. In order to compute the FFT properly, make sure you have the correct size and data type storage passed into these routines. Once done with a descriptor, the memory taken up by the descriptor needs to be freed by calling DftiFreeDescriptor. The routines described above return status as an integer value. The successful execution of a routine is denoted by DFTI_NO_ERROR. Otherwise, an error condition occurs, which may be checked against a predefined error class through the DftiErrorClass routine. This feature is included for future expandability to provide various error classifications. If the error is of a known class, the string describing the error is available through a DftiErrorMessage routine call. This routine returns a string of max length DFTI_MAX_MESSAGE_LENGTH and can be used to output human readable messages. Although the messages are very primitive at this time, improvements are expected in the future. | Mapping between FFTW and Intel MKL Interfaces |
If provided with an FFTW-based code, it is possible to write an equivalent DFTI-based code using the interface mapping provided in this section. Look up the FFTW routine you plan to change on the left side of the tables below and use the corresponding DFTI routine on the right side of the table to create semantically similar code. Refer to the Intel MKL Reference Library for semantic and syntactic definitions of these routines. The DFT computation APIs may be broken down into five categories. The categories are as follows: - Initialization: The routines in this category are used for setting up the parameters needed to perform a DFT. These routines may also analyze the parameters and platform capabilities for optimal computation.
- Computation: The routines in this category are used to compute the actual transforms. This may be a generic interface, such as provided by the Intel MKL or FFTW 3.x that are overloaded to perform various tasks, or may be specific calls depending on the type or intention of the transforms (for example, in-place, not-in-place, complex-to-real) as in FFTW 2.x interfaces.
- Cleanup: These routines free up the data structures allocated by the library at the end of computation.
- Status: These are used to provide detailed information of the status returned by the routines in the previous three categories.
- Data Storage Formats: The libraries may provide their own data types and storage formats. The porting process involves an understanding of the data storage types supported by various libraries.
The following table provides duality between the interfaces provided by FFTW and DFTI. Look up the FFTW interfaces in the left column and find the corresponding DFTI interfaces in the right column for performing semantically similar operations. Relationship between FFTW 2.1.5 API and Intel MKL provided DFTI| Method Types | FFTW | Intel® MKL DFTI | | Initialization APIs | fftw_f77_create_plan() fftw_f77_create_plan_specific() fftwnd_f77_create_plan() fftw2d_f77_create_plan() fftw3d_f77_create_plan() fftwnd_f77_create_plan_specific() fftw2d_f77_create_plan_specific() fftw3d_f77_create_plan_specific() rfftw_f77_create_plan() rfftw_f77_create_plan_specific() rfftwnd_f77_create_plan() rfftw2d_f77_create_plan() rfftw3d_f77_create_plan()
| The APIS on the left may be replaced by a combination of the following APIS in DFTI DftiCreateDescriptor() DftiSetValue() DftiGetValue() DftiCommitDescriptor()
| | Computation APIs | fftw_f77() fftw_f77_one() fftwnd_f77() fftwnd_f77_one() rfftw_f77() rfftw_f77_one() rfftw_f77_one_real_to_complex() rfftw_f77_one_complex_to_real() rfftwnd_f77() rfftwnd_f77_one() rfftwnd_f77_one_real_to_complex() rfftwnd_f77_one_complex_to_real()
| DftiComputeForward() DftiComputeBackward()
| | Cleanup APIs | | | | Status APIs | | DftiErrorClass DftiErrorMessage
| | Storage Types | | No equivalent HalfComplex array. Uses various Real Data array storage format, like CCS, Pack or Perm Precision set in the descriptor
|
Note: The table above does not list FORTRAN interfaces and Equivalent ‘C’ interfaces. FFTW Interfaces Unsupported in DFTI Following are the interfaces that are unsupported in DFTI:
void fftw_export_wisdom(void (*emitter)(char c, void *), void *data);
void fftw_export_wisdom_to_file(FILE *output_file);
char *fftw_export_wisdom_to_string(void);
fftw_status fftw_import_wisdom(int (*get_input)(void *), void *data);
fftw_status fftw_import_wisdom_from_file(FILE *input_file);
fftw_status fftw_import_wisdom_from_string(const char *input_string);
void fftw_forget_wisdom(void);
Memory allocations
void *(*fftw_malloc_hook) (size_t n);
void (*fftw_free_hook) (void *p);
|
Relationship between FFTW 3.0 API and Intel MKL provided DFTI (FORTRAN interfaces)
| Method Types | FFTW | Intel MKL DFTI | | Initialization APIs | | The APIS on the left may be replaced by a combination of the following APIS in DFTI DftiCreateDescriptor() DftiSetValue() DftiGetValue() DftiCommitDescriptor()
| | Computation APIs | | DftiComputeForward() DftiComputeBackward()
| | Cleanup APIs | | | | Status APIs | | DftiErrorClass DftiErrorMessage
| | Memory Allocation | | | | Storage Types | | No equivalent Halfcomplex array. Use various Real Data array storage format, like CCS, Pack or Perm Precision (single, double) set in the descriptor
|
Example Code The following example shows how to port a 1-D complex-to-complex FFT in-place transform to DFTI interface. The lines in bold show the FFTW code, the italicized lines show the corresponding DFTI code, and the common code is shown in black.
! Fortran example.
! 1D complex to complex transform
#include “fftw_f77.i” !Use MKL_DFTI
Complex :: X(32)
Complex :: Scratch(32)
Integer :: plan !Integer :: Status
!type(DFTI_DESCRIPTOR), POINTER :: My_Desc1_Handle
...put input data into X(1),...,X(32)
! Perform a complex to complex transform call fftw_f77_create_plan(plan,32,FFTW_FORWARD,FFTW_IN_PLACE+FFTW_ESTIMATE) !Status = DftiCreateDescriptor( My_Desc1_Handle, DFTI_SINGLE, &
! DFTI_COMPLEX, 1, 32 )
!Status = DftiCommitDescriptor( My_Desc1_Handle ) call fftw_f77_one(plan,X,Scratch)
!Status = DftiComputeForward( My_Desc1_Handle, X )
! result is given by {X(1),X(2),...,X(32)}
call fftw_f77_destroy_plan(plan) !Status = DftiFreeDescriptor(My_Desc1_Handle) |
Intel MKL interfaces provide flexible interfaces for performing FFT. The simple and yet powerful APIs provided by the Intel MKL enable you to port FFTW 2.x and FFTW 3.x code easily to MKL interfaces. For those who do not like to make any changes to their source code, the next release of the Intel MKL, version 8.0, is expected to support FFTW wrappers that will allow porting the FFTW 3.x code to MKL interfaces by linking in with a wrapper library. The wrapper library will be available in the form of source code that the user can build and modify.
Rezaur Rahman is a Senior Software Engineer at the Parallel and Distributed Services Division of Intel Software Solution Group. He has worked with Intel R&D team for over a decade and has contributed to various Web and Industry standard development teams such as the W3C, IETF and so on. His current interest is in High Performance Computing and Computer Architecture.
|