# How to use Intel® IPP’s 1D Fourier Transform Functions

Published: 11/07/2012

Last Updated: 10/26/2015

Intel® IPP provides several functions to compute the forward and reverse fast Fourier transform algorithm for real or complex data.  While there are 1D and 2D versions in the library, this article will focus on 1D.

There are two sets of functions: DFT and FFT.  For a general single interface, use DFT.  The DFT functions implement the FFT algorithm for arbitrary array sizes, including powers of 2.  The Intel® IPP FFT functions are limited to powers of 2.  Internally, the DFT functions check for a power of 2 input size and call the FFT functions for that special case.  If you only use power of 2 sizes you can skip these checks and call the FFT functions directly.

In previous versions of Intel® IPP buffers for internal state, twiddle factors, etc. were allocated internally. As a general Intel® IPP strategy, allowing your application to handle these buffers directly gives you more options and control.  Internal allocations are now deprecated.

Note,  starting from IPP version 9.0, the DFTInitAlloc_* or FFTAllocInit_* are removed, please replace the *InitAlloc* APIs by the *GetSize_*, and *Init_* APIs, as showed in the bellow examples.

Here is a basic description of the  external buffer  initialization sequence for Intel® IPP FFT.  This pattern is a familiar one, used for many Intel® IPP functions.

1) Query the implementation to get the sizes of required buffers and allocate them.  For FFT this is

• A "spec" to hold configuration details, coefficients/twiddle factors, etc.
• A working buffer for initialization (used in init only -- can be freed immediately after)
• A working buffer for  the FFT operation

2) Do the  init  step. This is separate from the main function so that anything that can be precalculated for a specific signal size, normalization, and accuracy hint can be done outside of a loop.  This precalculated information is referenced later through the spec structure.

3) do the FFT.  The necessary inputs are

• Input and output buffer.  In most cases these can be the same.  That is, the destination buffer can be the source buffer for in-place operation.
• The spec (holds anything where there is a speed advantage to precalculate).
• The working buffer (holds any temporary values).

Example code implementing all of these steps with external allocation is below.  There are many similarities between FFT and DFT, with just a few differences to note.  One is that DFT uses a signed integer length while FFT uses a power of 2 order.   The other is that  ippsFFTInit returns an aligned pointer to the spec.  When buffers are allocated with Intel® IPP malloc, they are already aligned so this will be the same address as the original spec buffer.

DFT: (Use this for a general implementation, power of 2 and any other size.)


//Set the size
const int N=128;

// Spec and working buffers
IppsDFTSpec_C_32fc *pDFTSpec=0;
Ipp8u  *pDFTInitBuf, *pDFTWorkBuf;

// Allocate complex buffers
Ipp32fc *pSrc=ippsMalloc_32fc(N);
Ipp32fc *pDst=ippsMalloc_32fc(N);

// Query to get buffer sizes
int sizeDFTSpec,sizeDFTInitBuf,sizeDFTWorkBuf;
ippsDFTGetSize_C_32fc(N, IPP_FFT_NODIV_BY_ANY,
ippAlgHintAccurate, &sizeDFTSpec, &sizeDFTInitBuf, &sizeDFTWorkBuf);

// Alloc DFT buffers
pDFTSpec    = (IppsDFTSpec_C_32fc*)ippsMalloc_8u(sizeDFTSpec);
pDFTInitBuf = ippsMalloc_8u(sizeDFTInitBuf);
pDFTWorkBuf = ippsMalloc_8u(sizeDFTWorkBuf);

// Initialize DFT
ippsDFTInit_C_32fc(N, IPP_FFT_NODIV_BY_ANY,
ippAlgHintAccurate, pDFTSpec, pDFTInitBuf);
if (pDFTInitBuf) ippFree(pDFTInitBuf);

// Do the DFT
ippsDFTFwd_CToC_32fc(pSrc,pDst,pDFTSpec,pDFTWorkBuf);

//check results
ippsDFTInv_CToC_32fc(pDst,pDst,pDFTSpec,pDFTWorkBuf);
int OK=1;
for (int i=0;i<N;i++) {
pDst[i].re/=(Ipp32f)N;
pDst[i].im/=(Ipp32f)N;
if ((abs(pSrc[i].re-pDst[i].re)>.001) ||
(abs(pSrc[i].im-pDst[i].im)>.001) )
{
OK=0;break;
}
}
puts(OK==1?"DFT OK":"DFT Fail");

if (pDFTWorkBuf) ippFree(pDFTWorkBuf);
if (pDFTSpec) ippFree(pDFTSpec);

ippFree(pSrc);
ippFree(pDst);

...


FFT: (Special power of 2 only case, skips a few checks so may be minimally faster than calling DFT.)


//Set the size
const int N=128;
const int order=(int)(log((double)N)/log(2.0));

// Spec and working buffers
IppsFFTSpec_C_32fc *pFFTSpec=0;
Ipp8u *pFFTSpecBuf, *pFFTInitBuf, *pFFTWorkBuf;

// Allocate complex buffers
Ipp32fc *pSrc=ippsMalloc_32fc(N);
Ipp32fc *pDst=ippsMalloc_32fc(N);

// Query to get buffer sizes
int sizeFFTSpec,sizeFFTInitBuf,sizeFFTWorkBuf;
ippsFFTGetSize_C_32fc(order, IPP_FFT_NODIV_BY_ANY,
ippAlgHintAccurate, &sizeFFTSpec, &sizeFFTInitBuf, &sizeFFTWorkBuf);

// Alloc FFT buffers
pFFTSpecBuf = ippsMalloc_8u(sizeFFTSpec);
pFFTInitBuf = ippsMalloc_8u(sizeFFTInitBuf);
pFFTWorkBuf = ippsMalloc_8u(sizeFFTWorkBuf);

// Initialize FFT
ippsFFTInit_C_32fc(&pFFTSpec, order, IPP_FFT_NODIV_BY_ANY,
ippAlgHintAccurate, pFFTSpecBuf, pFFTInitBuf);
if (pFFTInitBuf) ippFree(pFFTInitBuf);

// Do the FFT
ippsFFTFwd_CToC_32fc(pSrc,pDst,pFFTSpec,pFFTWorkBuf);

//check results
ippsFFTInv_CToC_32fc(pDst,pDst,pFFTSpec,pFFTWorkBuf);
int OK=1;
for (int i=0;i<N;i++) {
pDst[i].re/=(Ipp32f)N;
pDst[i].im/=(Ipp32f)N;
if ((abs(pSrc[i].re-pDst[i].re)>.001) ||
(abs(pSrc[i].im-pDst[i].im)>.001) )
{
OK=0;break;
}
}
puts(OK==1?"FFT OK":"FFT Fail");

if (pFFTWorkBuf) ippFree(pFFTWorkBuf);
if (pFFTSpecBuf) ippFree(pFFTSpecBuf);

ippFree(pSrc);
ippFree(pDst);

...


Error/status checks are omitted above for shorter presentation.  However, especially for larger signal sizes, checking the status of the GetSize and Init steps can be important.  There is no simple formula to determine the largest supported size.