Integration Wrappers for Intel® Integrated Performance Primitives (Intel® IPP)

Published: 08/31/2017  

Last Updated: 09/15/2017

By Chao Yu

To provide easy-to-use APIs and reduce the effort required to add Intel® Integrated Performance Primitives (Intel® IPP) functions to your application, the Intel® IPP library introduces the new Integration Wrappers APIs. These APIs aggregate multiple Intel® IPP functions and provide simple interfaces that support external threading of Intel® IPP functions.

Integration Wrappers consist of C and C++ interfaces:

  • The C interface aggregates Intel IPP functions of similar functionality with various data types and channel counts into one function. The initialization steps required by several Intel IPP functions are combined into one initialization function for each functionality. To reduce the size of your code and save integration time, the wrappers handle all memory management and Intel IPP function selection routines.
  • The C++ interface wraps around the C interface to provide default parameters, easily initialized objects as parameters, exception handling, and objects for complex Intel IPP functions with automatic memory management for specification structures.

Integration Wrappers are available as a separate download in the form of source and pre-built binaries.

1. Intel® IPP Integration Wrappers Overview

1.1 Key Features

Integration Wrappers simplify the usage of Intel IPP functions and address some of the advanced use cases of Intel IPP. They consist of C and C++ APIs that provide the following key features:

The C interface provides compatibility with C libraries and applications and enables you to use the following features of Integration Wrappers:

  • Automatic selection of the proper Intel IPP function based on input parameters
  • Automatic handling of temporary memory allocations for Intel IPP functions
  • Improved tiling handling and automatic borders processing for tiles
  • Memory optimizations for threading
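To illustrate the first feature, here is a minimal C++ sketch of run-time function selection that assumes nothing about how IW implements it internally: one entry point maps a data type and channel count onto a type-specialized kernel, much as a single wrapper call replaces a family of `_8u_C1R`-style and `_16s_C3R`-style variants. The names `DataType`, `fillRow`, `selectFillKernel` and `fillRowAny` are hypothetical and used only for this illustration.

```cpp
// Simplified model (not the IW implementation) of selecting a type- and
// channel-specific kernel at run time behind one entry point.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum class DataType { U8, S16, F32 };   // hypothetical stand-in for IppDataType

// Hypothetical typed kernel: sets every element of an interleaved row to a value.
template <typename T, int Channels>
void fillRow(void* row, std::size_t width, double value) {
    T* p = static_cast<T*>(row);
    for (std::size_t i = 0; i < width * Channels; ++i)
        p[i] = static_cast<T>(value);
}

using RowKernel = void (*)(void*, std::size_t, double);

// Picks the specialized kernel from (type, channels), as an aggregated wrapper
// would pick among many per-type, per-channel library functions.
RowKernel selectFillKernel(DataType type, int channels) {
    switch (type) {
    case DataType::U8:
        if (channels == 1) return fillRow<std::uint8_t, 1>;
        if (channels == 3) return fillRow<std::uint8_t, 3>;
        break;
    case DataType::S16:
        if (channels == 1) return fillRow<std::int16_t, 1>;
        if (channels == 3) return fillRow<std::int16_t, 3>;
        break;
    case DataType::F32:
        if (channels == 1) return fillRow<float, 1>;
        if (channels == 3) return fillRow<float, 3>;
        break;
    }
    throw std::invalid_argument("unsupported data type / channel combination");
}

// Single public entry point: callers never name a per-type variant.
void fillRowAny(void* row, std::size_t width, DataType type, int channels, double value) {
    assert(row != nullptr);
    selectFillKernel(type, channels)(row, width, value);
}
```

Usage: `fillRowAny(buf, width, DataType::S16, 3, -7.0)` fills an interleaved 3-channel 16-bit row; an unsupported combination throws instead of silently calling the wrong variant.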

The C++ interface additionally provides:

  • Easier-to-use classes like IwiSize (image size structure in Integration Wrappers) instead of IppiSize (image size structure in Intel IPP functions), IwiRect instead of IppiRect, and IwValue as a unified scalar parameter for borders and other per-channel input values
  • Complex Intel IPP functions designed as classes to use automatic construction and destruction features
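The following sketch shows the general C++ pattern behind these features, under stated assumptions: a C-style "query the size, then initialize a caller-owned buffer" pair (here the hypothetical `getSpecSize` and `initSpec`) is wrapped in a class whose buffer is owned by `std::unique_ptr` and released automatically, and whose negative status codes become exceptions. `Resizer`, `Status` and `StatusError` are illustrative names, not IW types.

```cpp
// Hypothetical RAII model: the spec buffer is allocated in InitAlloc and freed
// automatically; error statuses are turned into exceptions.
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

using Status = int;   // stand-in for a C status code: negative means error

class StatusError : public std::runtime_error {
public:
    explicit StatusError(Status s)
        : std::runtime_error("operation failed with status " + std::to_string(s)), status(s) {}
    Status status;
};

inline void check(Status s) { if (s < 0) throw StatusError(s); }

// Hypothetical C-style API: query the spec size, then initialize the spec.
Status getSpecSize(int width, int height, std::size_t* size) {
    if (width <= 0 || height <= 0) return -6;   // arbitrary error code for the sketch
    *size = 64;
    return 0;
}
Status initSpec(int width, int height, unsigned char* spec) {
    spec[0] = static_cast<unsigned char>(width & 0xFF);    // pretend initialization
    spec[1] = static_cast<unsigned char>(height & 0xFF);
    return 0;
}

class Resizer {
public:
    Resizer() = default;
    Resizer(int w, int h) { InitAlloc(w, h); }

    void InitAlloc(int w, int h) {
        std::size_t size = 0;
        check(getSpecSize(w, h, &size));              // throws on error
        auto spec = std::make_unique<unsigned char[]>(size);
        check(initSpec(w, h, spec.get()));            // on throw, `spec` is freed
        m_spec = std::move(spec);                     // previous buffer, if any, is freed
        m_specSize = size;
    }

    bool initialized() const { return m_spec != nullptr; }
    std::size_t specSize() const { return m_specSize; }

private:
    std::unique_ptr<unsigned char[]> m_spec;          // released in ~Resizer
    std::size_t m_specSize = 0;
};
```

With this design an object is either fully initialized or the constructor throws, so callers never see a half-initialized spec and never call a free function by hand.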

The following two code examples implement the same image resizing operation with two APIs: 1) with the Intel® IPP functions directly, and 2) with the Intel® IPP Integration Wrappers APIs. The second implementation is much simpler: the wrapper selects the function for the data type and channel count and manages the required memory itself, so using the Intel IPP functionality takes less effort.

1. Image Resizing with Intel® IPP Functions

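{
     // ...
     ippSts = ippiResizeGetSize_8u(srcSize, dstSize, ippLinear, 0, &specSize, &initSize);
     if(ippSts < 0) return ippSts;
     // allocate the resize specification structure
     pSpec = (IppiResizeSpec_32f*)ippsMalloc_8u(specSize);
     if(specSize && !pSpec) return STS_ERR_ALLOC;
     // allocate the initialization buffer (not used by ippiResizeLinearInit_8u)
     pInitBuf = ippsMalloc_8u(initSize);
     // initialize the IPP resizer
     ippSts = ippiResizeLinearInit_8u(srcSize, dstSize, pSpec);
     if(ippSts < 0) return ippSts;
     // compute the source ROI needed for the current destination ROI
     ippSts = ippiResizeGetSrcRoi_8u(pSpec, dstRoiOffset, dstRoiSize, &srcRoiOffset, &srcRoiSize);
     if(ippSts < 0) return ippSts;
     // adjust input and output buffers to current ROI (steps are in bytes; 8u data uses 1 byte per channel)
     unsigned char *pSrcPtr = pSrc + srcRoiOffset.y*srcStep + srcRoiOffset.x*CHANNELS;
     unsigned char *pDstPtr = pDst + dstRoiOffset.y*dstStep + dstRoiOffset.x*CHANNELS;
     // query and allocate the work buffer
     ippSts = ippiResizeGetBufferSize_8u(pSpec, dstRoiSize, CHANNELS, &bufferSize);
     if(ippSts < 0) return ippSts;
     pBuffer = ippsMalloc_8u(bufferSize);
     if(bufferSize && !pBuffer) return STS_ERR_ALLOC;
     // perform resize (the C1R variant assumes CHANNELS == 1; other channel counts need a different function)
     ippSts = ippiResizeLinear_8u_C1R(pSrcPtr, srcStep, pDstPtr, dstStep, dstRoiOffset, dstRoiSize, ippBorderRepl, 0, pSpec, pBuffer);
     // ...
     // release pSpec, pInitBuf and pBuffer with ippsFree() when they are no longer needed
}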

2. Image Resizing with Intel® IPP Integration Wrappers (C++ Interface)

{
      // ...
      // Initialization (ImageFormatToIpp, ImageSizeToIpp and ImageToIwImage are application-side helpers, not IW functions)
      IppDataType dataType = ImageFormatToIpp(src.m_sampleFormat);
      ipp::IwiSize srcSize(ImageSizeToIpp(src.m_size));
      ipp::IwiSize dstSize(ImageSizeToIpp(dst.m_size));
      m_resize.InitAlloc(srcSize, dstSize, dataType, src.m_samples, interpolation, ipp::IwiResizeParams(), ippBorderRepl);
      // Run
      ipp::IwiImage iwSrc = ImageToIwImage(src);
      ipp::IwiImage iwDst = ImageToIwImage(dst);
      ipp::IwiRect rect((int)roi.x, (int)roi.y, (int)roi.width, (int)roi.height);
      ipp::IwiRoi  iwRoi(rect);
      m_resize(&iwSrc, &iwDst, &iwRoi);
      // ...
}
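The pointer adjustment in the first example (`pSrc + srcRoiOffset.y*srcStep + srcRoiOffset.x*CHANNELS`) works because IPP image steps are measured in bytes and 8u data has one byte per channel. The hypothetical helper `roiPointer` below generalizes that arithmetic to other element types, one of the details the Integration Wrappers take off your hands; it is a sketch, not an IW or IPP function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a pointer to pixel (x, y) of an interleaved image whose rows are
// `stepBytes` bytes apart. For 8-bit data this reduces to
// base + y*step + x*channels, as in the resize example.
template <typename T>
T* roiPointer(T* base, std::ptrdiff_t stepBytes, int x, int y, int channels) {
    assert(base != nullptr && x >= 0 && y >= 0 && channels > 0);
    auto* bytes = reinterpret_cast<unsigned char*>(base);   // the step is in bytes
    bytes += static_cast<std::ptrdiff_t>(y) * stepBytes;     // move down y rows
    // move right x pixels, counted in elements of T
    return reinterpret_cast<T*>(bytes) + static_cast<std::ptrdiff_t>(x) * channels;
}
```

For example, in a 16-bit, 2-channel image with a 16-byte step, `roiPointer(img, 16, 3, 1, 2)` points 8 elements down (one row) and 6 elements right.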

1.2 Threading

The Integration Wrappers (IW) API is designed to simplify tile-based processing of images. Tiling is based on the concept of a region of interest (ROI).
Most IW image processing functions operate not only on whole images but also on image areas: ROIs. An image ROI is a rectangular area that is either part of the image or the whole image.
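As a sketch of how an image is typically cut into ROIs for threading, the hypothetical function `makeRowTiles` below reproduces the one-tile-per-thread split used in the OpenMP* examples in this section, where the tile height is `(height + threads - 1)/threads`. `Rect` is a stand-in for IwiRect, and the explicit clipping of the last tile to the image is an addition of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { long x, y, width, height; };   // hypothetical stand-in for IwiRect

// Splits an image into full-width horizontal stripes, one per thread, using a
// ceiling division so that the stripes cover every row.
std::vector<Rect> makeRowTiles(long width, long height, int threads) {
    assert(width > 0 && height > 0 && threads > 0);
    const long tileHeight = (height + threads - 1) / threads;
    std::vector<Rect> tiles;
    for (long row = 0; row < height; row += tileHeight)
        tiles.push_back({0, row, width, std::min(tileHeight, height - row)});  // clip last stripe
    return tiles;
}
```

With a 320x240 image and 7 threads this yields six 35-row tiles and a final 30-row tile. Because of the ceiling division, fewer tiles than threads can result (10 rows and 6 threads give five 2-row tiles), so some threads may stay idle.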

The sections below explain three IW tiling techniques: manual tiling, basic tiling, and pipeline tiling.

Manual tiling

IW functions are designed to be tiled using the IwiRoi interface. If, for some reason, automatic tiling with IwiRoi is not suitable, special APIs are available to perform tiling manually.

When using manual tiling you need to:

  • Shift images to the correct position for a tile using iwiImage_GetRoiImage
  • If necessary, pass the correct border InMem flags to a function using iwiRoi_GetTileBorder
  • If necessary, check whether the filter border of a tile overlaps the image border, and correct the tile using iwiRoi_CorrectBorderOverlap
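The decision behind the InMem flags can be modeled as follows. In IPP, the fact that border pixels already exist in memory is expressed by OR-ing `ippBorderInMemTop`, `ippBorderInMemBottom`, `ippBorderInMemLeft` and `ippBorderInMemRight` into the border type. The sketch below is a simplified, hypothetical model of that choice for one tile, not the IW implementation: a side gets the in-memory flag when all border pixels beyond it lie inside the image; otherwise the border has to be synthesized (for example, replicated).

```cpp
#include <cassert>

struct Rect { long x, y, width, height; };   // hypothetical stand-ins for IwiRect/IwiSize
struct Size { long width, height; };

// Hypothetical per-side flags meaning "border pixels can be read from memory".
enum BorderInMem : unsigned {
    InMemNone   = 0,
    InMemTop    = 1u << 0,
    InMemBottom = 1u << 1,
    InMemLeft   = 1u << 2,
    InMemRight  = 1u << 3
};

// Marks a side as in-memory when the `border` pixels beyond it are inside the image.
unsigned tileBorderInMem(const Rect& tile, const Size& image, long border) {
    assert(border >= 0);
    unsigned flags = InMemNone;
    if (tile.y - border >= 0)                          flags |= InMemTop;
    if (tile.y + tile.height + border <= image.height) flags |= InMemBottom;
    if (tile.x - border >= 0)                          flags |= InMemLeft;
    if (tile.x + tile.width + border <= image.width)   flags |= InMemRight;
    return flags;
}
```

For full-width stripes of a 320x240 image and a 3x3 mask (border 1), the top stripe gets only the bottom flag, inner stripes get top and bottom, and the last stripe gets only top. A tile that sits closer to the image edge than the border width fits neither case cleanly, because part of its border is in memory and part is not; that overlap situation is the one handled by the iwiRoi_CorrectBorderOverlap step.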

Here is an example of IW threading with OpenMP* using manual tiling:

#include "iw++/iw.hpp"
#include <omp.h>

int main(int, char**)
{
    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);
    int threads = omp_get_max_threads(); // Get threads number
    ipp::IwiSize   tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    IppiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
    #pragma omp parallel num_threads(threads)
    {
        // Declare thread-scope variables
        IppiBorderType border;
        ipp::IwiImage srcTile, cvtTile, dstTile;
        // Color convert threading
        #pragma omp for
        for(IppSizeL row = 0; row < dstImage.m_size.height; row += tileSize.height)
        {
            ipp::IwiRect tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
            // Get images for current ROI
            srcTile = srcImage.GetRoiImage(tile);
            cvtTile = cvtImage.GetRoiImage(tile);
            // Run functions
            ipp::iwiColorConvert_RGB(&srcTile, iwiColorRGB, &cvtTile, iwiColorGray);
        }
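        // Sobel threading: the implicit barrier at the end of the previous omp for ensures that all of cvtImage,
        // including rows of neighboring tiles read by the 3x3 mask, is converted before filtering starts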
        #pragma omp for
        for(IppSizeL row = 0; row < dstImage.m_size.height; row += tileSize.height)
        {
            ipp::IwiRect tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
            iwiRoi_CorrectBorderOverlap(sobBorderSize, cvtImage.m_size, &tile); // Check borders overlap and correct tile if necessary
            border = iwiRoi_GetTileBorder(ippBorderRepl, sobBorderSize, cvtImage.m_size, tile); // Get actual tile border
            // Get images for current ROI
            cvtTile = cvtImage.GetRoiImage(tile);
            dstTile = dstImage.GetRoiImage(tile);
            // Run functions
            ipp::iwiFilterSobel(&cvtTile, &dstTile, iwiDerivHorFirst, ippMskSize3x3, border);
        }
    }
}

Basic tiling

You can use basic tiling to tile or thread one standalone function or a group of functions without borders. To apply basic tiling, initialize the IwiRoi structure with the current tile rectangle and pass it to the processing function.
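Basic tiling gives exactly the same result as a whole-image call when the function has no border, because every output pixel depends only on the input pixel at the same position. The self-contained sketch below checks this for a toy RGB-to-gray conversion (an integer average, not the weighting IPP uses); `Image`, `Rect`, `grayInRoi` and `grayTiled` are hypothetical names. Like an IW function called with an IwiRoi, `grayInRoi` receives the whole images plus the tile rectangle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int x, y, width, height; };

// Hypothetical interleaved 8-bit image.
struct Image {
    int width = 0, height = 0, channels = 0;
    std::vector<std::uint8_t> data;
    Image(int w, int h, int c)
        : width(w), height(h), channels(c), data(static_cast<std::size_t>(w) * h * c) {}
    std::uint8_t* at(int x, int y) {
        return &data[(static_cast<std::size_t>(y) * width + x) * channels];
    }
    const std::uint8_t* at(int x, int y) const {
        return &data[(static_cast<std::size_t>(y) * width + x) * channels];
    }
};

// Toy RGB -> gray (integer average) applied only inside `roi`, clipped to dst.
void grayInRoi(const Image& src, Image& dst, const Rect& roi) {
    for (int y = roi.y; y < roi.y + roi.height && y < dst.height; ++y)
        for (int x = roi.x; x < roi.x + roi.width && x < dst.width; ++x) {
            const std::uint8_t* p = src.at(x, y);
            *dst.at(x, y) = static_cast<std::uint8_t>((p[0] + p[1] + p[2]) / 3);
        }
}

// Processes the image stripe by stripe. The iterations write disjoint rows,
// so they could be distributed across threads (e.g. with an OpenMP for loop).
void grayTiled(const Image& src, Image& dst, int tileHeight) {
    for (int row = 0; row < dst.height; row += tileHeight)
        grayInRoi(src, dst, Rect{0, row, dst.width, tileHeight});
}
```

Because the function has no border, tiles need no overlap and no border flags; this is the case that basic tiling covers.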

For functions operating with different sizes for source and destination images, use the destination size as a base for tile parameters.
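One way to see why the destination size is the right base: if tiles partition the destination, each output row is written exactly once, and each tile simply reads whatever source rows it needs. The sketch below assumes a nearest-neighbor mapping with pixel centers at +0.5 (a hypothetical model; IPP resize functions compute their own source ROI, as ippiResizeGetSrcRoi_8u does in the first resize example) and finds the source rows for a destination tile with integer arithmetic only.

```cpp
#include <cassert>

struct RowRange { long begin, end; };   // half-open row interval [begin, end)

// Nearest-neighbor model: destination row d samples source row
// floor((d + 0.5) * srcH / dstH), computed as ((2d + 1) * srcH) / (2 * dstH).
long srcRowFor(long d, long srcH, long dstH) {
    return ((2 * d + 1) * srcH) / (2 * dstH);
}

// Source rows read by the destination tile [dstBegin, dstEnd).
// The mapping is non-decreasing in d, so the first and last rows bound the range.
RowRange srcRowsForDstTile(long dstBegin, long dstEnd, long srcH, long dstH) {
    assert(0 <= dstBegin && dstBegin < dstEnd && dstEnd <= dstH && srcH > 0);
    return {srcRowFor(dstBegin, srcH, dstH), srcRowFor(dstEnd - 1, srcH, dstH) + 1};
}
```

For a 2x downscale (240 source rows, 120 destination rows), the destination tile [0, 60) reads source rows [1, 120) and the tile [60, 120) reads [121, 240). With interpolation kernels wider than one pixel, the source ranges of neighboring tiles can overlap, which is harmless because those rows are only read.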

Here is an example of IW threading with OpenMP* using basic tiling with IwiRoi:

#include "iw++/iw.hpp"
#include <omp.h>

int main(int, char**)
{
    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);

    int            threads = omp_get_max_threads(); // Get threads number
    ipp::IwiSize   tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread

    #pragma omp parallel num_threads(threads)
    {
        // Declare thread-scope variables
        ipp::IwiRoi  roi;

        // Color convert threading
        #pragma omp for
        for(IppSizeL row = 0; row < dstImage.m_size.height; row += tileSize.height)
        {
            roi = ipp::IwiRect(0, row, tileSize.width, tileSize.height); // Initialize IwiRoi with current tile rectangle

            // Run functions
            ipp::iwiColorConvert_RGB(&srcImage, iwiColorRGB, &cvtImage, iwiColorGray, IPP_MAXABS_64F, &roi);
        }

        // Sobel threading
        #pragma omp for
        for(IppSizeL row = 0; row < dstImage.m_size.height; row += tileSize.height)
        {
            roi = ipp::IwiRect(0, row, tileSize.width, tileSize.height); // Initialize IwiRoi with current tile rectangle

            // Run functions
            ipp::iwiFilterSobel(&cvtImage, &dstImage, iwiDerivHorFirst, ippMskSize3x3, ippBorderRepl, 0, &roi);
        }
    }
}

Pipeline tiling

With the IwiRoi interface, you can easily tile pipelines by applying the current tile to an entire pipeline at once instead of tiling each function one by one. This requires border handling and tracking of pipeline dependencies, which increases the complexity of the API. When used properly, however, pipeline tiling can increase the scalability of threading, or the performance of non-threaded functions, by performing all operations on a tile while its data stays in the CPU cache.

Here are some important details that you should take into account when performing pipeline tiling:

  1. Pipeline tiling is performed in reverse order, from destination to source; therefore:
    • Use the tile size based on the destination image size
    • Initialize the IwiRoi structure for the last operation with IwiRoiPipeline_Init
    • Initialize the IwiRoi structures for the other operations, from the last to the first, with IwiRoiPipeline_InitChild
  2. Obtain the border size for each operation from its mask size or kernel size, or by using the operation-specific function that returns the border size, if one exists.
  3. If the pipeline contains a geometric transform, fill in the IwiRoiScale structure of the IwiRoi for that transform operation.
  4. When threading, copy the initialized IwiRoi structures to each thread or initialize them on a per-thread basis: access to these structures is not thread-safe.
  5. Do not exceed the maximum tile size specified during initialization; otherwise, buffers can overflow.
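The first two points can be sketched as simple row arithmetic. Assuming a pipeline of operations on images of the same height (no geometric transform) with borders expressed in rows, the hypothetical `propagateTile` below walks from the destination tile back to the first operation: each earlier stage must produce its consumer's rows grown by the consumer's border, clipped to the image. `maxBufferRows` gives the matching upper bound for an intermediate buffer, which shows why the maximum tile size fixed at initialization must not be exceeded. This is an illustration, not the IwiRoi implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct RowSpan { long begin, end; };        // half-open row interval
struct BorderRows { long top, bottom; };    // vertical border of one operation

// borders[i] is the border of operation i; operations run in order 0..n-1.
// Returns spans[i] = rows that operation i must output for this tile.
std::vector<RowSpan> propagateTile(RowSpan dstTile, const std::vector<BorderRows>& borders,
                                   long imageHeight) {
    assert(!borders.empty());
    std::vector<RowSpan> spans(borders.size());
    spans.back() = dstTile;                  // the last operation writes the tile itself
    for (std::size_t i = borders.size() - 1; i > 0; --i) {
        const RowSpan& out = spans[i];
        // the producer must cover the consumer's rows plus the consumer's border,
        // clipped to the image (rows outside it come from border synthesis)
        spans[i - 1] = {std::max(0L, out.begin - borders[i].top),
                        std::min(imageHeight, out.end + borders[i].bottom)};
    }
    return spans;
}

// Upper bound on the rows of the buffer written by operation `stage`:
// the largest tile plus both borders of every later operation.
long maxBufferRows(long maxTileRows, const std::vector<BorderRows>& borders, std::size_t stage) {
    long rows = maxTileRows;
    for (std::size_t i = stage + 1; i < borders.size(); ++i)
        rows += borders[i].top + borders[i].bottom;
    return rows;
}
```

For a color conversion followed by a 3x3 Sobel filter (one border row on each side), a destination tile covering rows [60, 120) requires the conversion to produce rows [59, 121), and an intermediate buffer of 62 rows covers any 60-row tile.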

The IW package contains several advanced tiling examples that can help you understand the details of the process. For more information on how to find and use these examples, download the package and see the developer reference for Integration Wrappers for Intel IPP included in it.

The following example demonstrates IW threading with OpenMP* using IwiRoi pipeline tiling:

#include "iw++/iw.hpp"
#include <omp.h>

int main(int, char**)
{
    // Create images
    ipp::IwiImage srcImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);
    int threads = omp_get_max_threads(); // Get threads number
    ipp::IwiSize   tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    IppiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size

    #pragma omp parallel num_threads(threads)
    {
        // Declare thread-scope variables
        ipp::IwiImage       cvtImage;
        ipp::IwiRoiPipeline roiConvert, roiSobel;
        roiSobel.Init(tileSize, dstImage.m_size, &sobBorderSize); // Initialize last operation ROI first
        roiConvert.InitChild(&roiSobel); // Initialize the preceding operation (color conversion) as a child of the Sobel ROI
        // Allocate a per-thread intermediate buffer with the size required by the pipeline ROI
        cvtImage.Alloc(roiConvert.GetDstBufferSize(), ipp8u, 1);
        // Joined pipeline threading
        #pragma omp for
        for(IppSizeL row = 0; row < dstImage.m_size.height; row += tileSize.height)
        {
            roiSobel.SetTile(ipp::IwiRect(0, row, tileSize.width, tileSize.height)); // Set IwiRoi chain to current tile coordinates
            // Run functions
            ipp::iwiColorConvert_RGB(&srcImage, iwiColorRGB, &cvtImage, iwiColorGray, IPP_MAXABS_64F, &roiConvert);
            ipp::iwiFilterSobel(&cvtImage, &dstImage, iwiDerivHorFirst, ippMskSize3x3, ippBorderRepl, 0, &roiSobel);
        }
    }
}
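To see why the fused loop above produces the same result as running the two functions one after another, the following self-contained sketch replays the idea on a toy pipeline: a pointwise stage standing in for the color conversion and a 3x3 Sobel-like horizontal derivative with replicated borders standing in for iwiFilterSobel. All names are hypothetical. For each destination tile, only the stage-one rows the tile needs (the tile grown by one row on each side, clipped to the image) are computed into a small per-tile buffer, and border replication is done against the image edges rather than the buffer edges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel int image, row-major.
struct Img {
    int w, h;
    std::vector<int> px;
    int at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

int clampi(int v, int lo, int hi) { return std::max(lo, std::min(v, hi)); }

// Stage 2 at image pixel (x, y), reading stage-1 values from buffer `t`
// whose first stored row is image row t0. Rows and columns are replicated
// at the image edges (imgH, t.w), never at the buffer edges.
int sobelAt(const Img& t, int t0, int imgH, int x, int y) {
    static const int wy[3] = {1, 2, 1};                 // vertical smoothing weights
    const int xl = clampi(x - 1, 0, t.w - 1), xr = clampi(x + 1, 0, t.w - 1);
    int sum = 0;
    for (int k = -1; k <= 1; ++k) {
        const int yy = clampi(y + k, 0, imgH - 1) - t0;  // row index inside the buffer
        sum += wy[k + 1] * (t.at(xr, yy) - t.at(xl, yy));
    }
    return sum;
}

// Reference: stage 1 (t = 2*s + 1) on the whole image, then stage 2 on the whole image.
Img runUnfused(const Img& s) {
    Img t{s.w, s.h, std::vector<int>(s.px.size())}, d = t;
    for (std::size_t i = 0; i < s.px.size(); ++i) t.px[i] = 2 * s.px[i] + 1;
    for (int y = 0; y < s.h; ++y)
        for (int x = 0; x < s.w; ++x) d.px[static_cast<std::size_t>(y) * s.w + x] = sobelAt(t, 0, s.h, x, y);
    return d;
}

// Pipeline tiling: for each destination tile, run both stages back to back
// using a buffer that holds only the stage-1 rows this tile needs.
Img runTiled(const Img& s, int tileH) {
    Img d{s.w, s.h, std::vector<int>(s.px.size())};
    for (int r0 = 0; r0 < s.h; r0 += tileH) {
        const int r1 = std::min(r0 + tileH, s.h);
        const int t0 = std::max(0, r0 - 1), t1 = std::min(s.h, r1 + 1);  // tile + 3x3 border
        Img t{s.w, t1 - t0, std::vector<int>(static_cast<std::size_t>(s.w) * (t1 - t0))};
        for (int y = t0; y < t1; ++y)
            for (int x = 0; x < s.w; ++x) t.px[static_cast<std::size_t>(y - t0) * s.w + x] = 2 * s.at(x, y) + 1;
        for (int y = r0; y < r1; ++y)
            for (int x = 0; x < s.w; ++x) d.px[static_cast<std::size_t>(y) * s.w + x] = sobelAt(t, t0, s.h, x, y);
    }
    return d;
}
```

`runTiled(s, h)` should match `runUnfused(s)` for any tile height `h`. The small per-tile buffer plays the role of the per-thread cvtImage allocated from roiConvert.GetDstBufferSize() in the example, and it is what lets each tile's intermediate data stay in cache.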

2. Getting Started

To learn more about Integration Wrappers, refer to the Integration Wrappers developer guide.

3. Support

If you have any problems with Intel® IPP Integration Wrappers, post your questions on the Intel® IPP forum. If you have registered your Intel® software product at the Intel® Software Development Products Registration Center, you can also submit your question through Intel® Premier Support.
