Intel IPP Integration Wrappers Developer Guide and Reference

ID 751823
Date 1/18/2023
Public

Tiling and Threading

The API of Integration Wrappers (IW) is designed to simplify tile-based processing of images. Tiling is based on the concept of region of interest (ROI).

Most IW image processing functions operate not only on whole images but also on image areas - ROIs. Image ROI is a rectangular area that is either some part of the image or the whole image.

ROI of an image is defined by the size and offset from the image origin, as shown in the figure below. The origin of an image is in the top left corner, with x values increasing from left to right and y values increasing downwards.
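As a minimal sketch of this definition, the snippet below models an ROI as an offset plus a size and clips it to the image bounds. The `Roi` struct and `clipRoiToImage` helper are hypothetical illustrations, not IW types or functions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for an ROI rectangle; this is not an IW type.
struct Roi {
    long long x, y;          // offset from the image origin (top-left corner)
    long long width, height; // size in pixels
};

// Clip an ROI so that it lies entirely inside an image of the given size.
// Returns a zero-sized ROI if the ROI and the image do not intersect.
inline Roi clipRoiToImage(Roi roi, long long imageWidth, long long imageHeight)
{
    assert(imageWidth >= 0 && imageHeight >= 0);
    const long long x0 = std::max(0LL, std::min(roi.x, imageWidth));
    const long long y0 = std::max(0LL, std::min(roi.y, imageHeight));
    const long long x1 = std::max(0LL, std::min(roi.x + roi.width, imageWidth));
    const long long y1 = std::max(0LL, std::min(roi.y + roi.height, imageHeight));
    return Roi{x0, y0, std::max(0LL, x1 - x0), std::max(0LL, y1 - y0)};
}
```

For example, an ROI at offset (0, 180) with size 320x80 in a 320x240 image is clipped to a height of 60 rows.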



Borders Overlapping

Image filters use the borders concept to correctly process image pixels around the current pixel. A filter kernel can be applied to pixels that are outside of image boundaries, and the function must either extrapolate pixels using one of the border extrapolation methods (replicate, mirror, etc.) or use pixels from memory if the image border physically exists in memory.

Borders can complicate tiling, because for each tile you need to apply the proper border InMem flags according to the current tile position relative to the image. If the filter border size is greater than 1 pixel, for some tile positions filter and image borders can overlap, which means that the filter border can be inside and outside of the image at the same time. Intel IPP functions do not support input with undefined borders; in such cases, filtering may produce distorted pixels around the image borders.
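The dependence of the InMem flags on tile position can be sketched as follows. This is an illustration only: the flag names, the `Rect` and `Border` structs, and `inMemFlagsForTile` are hypothetical and do not reproduce the IW or Intel IPP definitions. The sketch also assumes that the overlap case described above does not occur. A side is treated as "in memory" when the filter border on that side fits entirely inside the image.

```cpp
#include <cassert>

// Hypothetical flag values for illustration; not the Intel IPP constants.
enum InMemFlag : unsigned {
    kInMemTop = 1u, kInMemBottom = 2u, kInMemLeft = 4u, kInMemRight = 8u
};

struct Rect   { long long x, y, width, height; };   // tile position and size
struct Border { long long left, top, right, bottom; }; // filter border per side

// Decide, per side, whether the pixels the filter needs beyond the tile
// physically exist inside the image (in memory) or must be extrapolated.
inline unsigned inMemFlagsForTile(const Rect& tile, const Border& b,
                                  long long imageWidth, long long imageHeight)
{
    assert(tile.x >= 0 && tile.y >= 0);
    assert(tile.x + tile.width <= imageWidth && tile.y + tile.height <= imageHeight);
    unsigned flags = 0;
    if (tile.y >= b.top)                                flags |= kInMemTop;
    if (tile.y + tile.height + b.bottom <= imageHeight) flags |= kInMemBottom;
    if (tile.x >= b.left)                               flags |= kInMemLeft;
    if (tile.x + tile.width + b.right <= imageWidth)    flags |= kInMemRight;
    return flags;
}
```

With full-width tiles of 60 rows in a 320x240 image and a 1-pixel border, the first tile has only its bottom border in memory, middle tiles have top and bottom in memory, and the last tile has only its top border in memory.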



Overlapping may happen only if the filter border size is more than 1 pixel and the following conditions are true:

  • For left and top borders: tile_size < border_size
  • For right and bottom borders: (image_size%tile_size > 0) && (image_size%tile_size < border_size)
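The two conditions above can be written as small predicates, applied independently per dimension (width or height). The function names are hypothetical and chosen for this sketch only.

```cpp
#include <cassert>

// Left/top side: overlap is possible when a tile is smaller than the border.
inline bool leadingBorderOverlaps(long long tileSize, long long borderSize)
{
    assert(tileSize > 0 && borderSize >= 0);
    return tileSize < borderSize;
}

// Right/bottom side: overlap is possible when the last (partial) tile is
// non-empty but smaller than the border.
inline bool trailingBorderOverlaps(long long imageSize, long long tileSize,
                                   long long borderSize)
{
    assert(imageSize > 0 && tileSize > 0 && borderSize >= 0);
    const long long remainder = imageSize % tileSize;
    return remainder > 0 && remainder < borderSize;
}
```

For a border size of 1, neither predicate can be true for a positive tile size, which matches the statement that overlapping requires a border larger than 1 pixel.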

You can ignore overlapped borders if you do not need the bit-exact quality of tiling around image boundaries. But to provide the same result as without tiling, you must tune the tile size manually to avoid overlapping or use special Integration Wrappers APIs, which can handle this problem for you. For more details, see the sections below.

The sections below explain the following IW tiling techniques: manual tiling, basic tiling with IwiTile, and pipeline tiling with IwiTile.

Manual tiling

IW functions are designed to be tiled using the IwiTile and IwsTile interfaces for image and signal functions, respectively. But if for some reason automatic tiling with IwiTile is not suitable, there are special APIs to perform tiling manually.

When using manual tiling you need to:

  • Shift images to a correct position for a tile using iwiImage_GetRoiImage
  • If necessary, pass correct border InMem flags to a function using iwiTile_GetTileBorder
  • If necessary, check the filter border around the image border using iwiTile_CorrectBordersOverlap

Here is an example of IW threading with OpenMP* using manual tiling:

#include <iostream>

#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int, char**)
{
    int fail = 0;

    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);

#ifdef _OPENMP
    int                 threads = omp_get_max_threads(); // Get threads number
#else
    int                 threads = 4;                     // Just divide to process by tiles
#endif
    ipp::IwiSize        tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderSize  sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
    ipp::IwiBorderType  border = ippBorderRepl;

#ifdef _OPENMP
    #pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiBorderType threadBorder;
        ipp::IwiImage srcTile, cvtTile, dstTile;

        try
        {
            // Color convert threading
#ifdef _OPENMP
            #pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle

                // Get images for current ROI
                srcTile = srcImage.GetRoiImage(tile);
                cvtTile = cvtImage.GetRoiImage(tile);

                // Run functions
                ipp::iwiColorConvert(srcTile, iwiColorRGB, cvtTile, iwiColorGray);
            }

            // Sobel threading
#ifdef _OPENMP
            #pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
                ipp::IwiTile::CorrectBordersOverlap(tile, border, sobBorderSize, cvtImage.m_size); // Check borders overlap and correct tile if necessary
                threadBorder = ipp::IwiTile::GetTileBorder(tile, border, sobBorderSize, cvtImage.m_size); // Get actual tile border

                // Get images for current ROI
                cvtTile = cvtImage.GetRoiImage(tile);
                dstTile = dstImage.GetRoiImage(tile);

                // Run functions
                ipp::iwiFilterSobel(cvtTile, dstTile, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), threadBorder);
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }

    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}
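The tile-height arithmetic used in the example above can be examined in isolation. The sketch below is a standalone illustration with a hypothetical `splitRows` helper, not an IW API. It uses the same ceiling division to pick the tile height and lists the resulting row ranges, with the last tile shortened so that it does not run past the image.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Split imageHeight rows into tiles of ceil(imageHeight / threads) rows.
// Returns (firstRow, rowCount) pairs; the last tile may be shorter.
inline std::vector<std::pair<long long, long long>>
splitRows(long long imageHeight, int threads)
{
    assert(imageHeight > 0 && threads > 0);
    const long long tileHeight = (imageHeight + threads - 1) / threads;
    std::vector<std::pair<long long, long long>> tiles;
    for (long long row = 0; row < imageHeight; row += tileHeight)
        tiles.emplace_back(row, std::min(tileHeight, imageHeight - row));
    return tiles;
}
```

Note that the number of tiles can be smaller than the number of threads: with 240 rows and 64 threads, the tile height is 4 and only 60 tiles are produced.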

IMPORTANT:
Several simplified IW versions of complex Intel IPP functions cannot be tiled manually because of interface limitations. If such a limitation exists, it is specified in the function entry in the header file and in the reference section of this document.

IwiTile-based tiling

IwiTile is the main interface structure for tiling in IW. This interface has two associated APIs:

  • Basic tiling API with the iwiTile_ prefix
  • Pipeline tiling API with the iwiTilePipeline_ prefix

Most IW image processing functions have the IwiTile parameter. For example, see the API of the iwiFilterSobel function:

iwiFilterSobel(
    const IwiImage             *pSrcImage,
    IwiImage                   *pDstImage,
    IwiDerivativeType           opType,
    IppiMaskSize                kernelSize,
    const IwiFilterSobelParams *pAuxParams,
    IwiBorderType               border,
    const Ipp64f               *pBorderVal,
    const IwiTile              *pTile
);

  • pSrcImage and pDstImage are initialized with the size of the whole source and destination images, respectively
  • pTile is a pointer to the IwiTile structure. You do not need to shift input/output buffers and check borders manually. The IwiTile initialization function and processing function will place input and output buffers automatically. If you do not need to use tiling, pass NULL to pTile, and the whole image will be processed at once.

If a function does not have the IwiTile parameter, it means that the function cannot be tiled because of algorithmic limitations. You can use manual tiling for such functions, but it may produce incorrect results.

Basic tiling

You can use basic tiling to tile or thread one standalone function or a group of functions without borders. To apply basic tiling, initialize the IwiTile structure with the current tile ROI and pass it to the processing function.

For functions operating with different sizes for source and destination images, use the destination size as a base for tile parameters.

Here is an example of IW threading with OpenMP* using basic tiling with IwiTile:

#include <iostream>

#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int, char**)
{
    int fail = 0;

    // Create images
    ipp::IwiImage srcImage, cvtImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);

#ifdef _OPENMP
    int                threads = omp_get_max_threads(); // Get threads number
#else
    int                threads = 4;                     // Just divide to process by tiles
#endif
    ipp::IwiSize       tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderType border = ippBorderRepl;

#ifdef _OPENMP
    #pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiRoi  roi;

        try
        {
            // Color convert threading
#ifdef _OPENMP
            #pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                // Run functions with the current tile rectangle
                ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
            }

            // Sobel threading
#ifdef _OPENMP
            #pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                // Run functions with the current tile rectangle
                ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }

    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}

Pipeline tiling

With the IwiTile interface, you can tile a whole pipeline by applying the current tile to the entire pipeline at once instead of tiling each function separately. This requires border handling and tracking of pipeline dependencies, which makes the API more complex. When used properly, however, pipeline tiling can improve threading scalability or the performance of non-threaded functions by keeping all operations inside the CPU cache.

Here are some important details that you should take into account when performing pipeline tiling:

  1. Pipeline tiling is performed in reverse order: from destination to source, therefore:
    • Use the tile size based on the destination image size
    • Initialize the IwiTile structure for the last operation with iwiTilePipeline_Init
    • Initialize the IwiTile structures for the other operations, from the last to the first, with iwiTilePipeline_InitChild
  2. Obtain the border size for each operation from its mask size, kernel size, or using the specific function returning the border size, if any.
  3. In case of threading, copy initialized IwiTile structures to a local thread or initialize them on a per-thread basis. Access to structures is not thread-safe.
  4. Do not exceed the maximum tile size specified during initialization. Otherwise, a buffer overflow can occur.
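To illustrate why the pipeline is set up from destination to source, the sketch below propagates a destination tile backwards through a chain of operations in one dimension (rows). It is a simplified model under stated assumptions: all images in the chain have the same height, each operation has a symmetric vertical border, and `RowSpan` and `propagateTile` are hypothetical names, not part of the IW API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct RowSpan { long long begin, end; }; // half-open row range [begin, end)

// borders[i] is the vertical border of operation i, in pipeline order
// (first operation first). Returns, for each operation, the rows it must
// produce so that the last operation can fill dstTile.
inline std::vector<RowSpan> propagateTile(RowSpan dstTile,
                                          const std::vector<long long>& borders,
                                          long long imageHeight)
{
    assert(!borders.empty());
    assert(0 <= dstTile.begin && dstTile.begin < dstTile.end &&
           dstTile.end <= imageHeight);
    std::vector<RowSpan> produced(borders.size());
    RowSpan current = dstTile;
    // Walk from the last operation to the first.
    for (std::size_t i = borders.size(); i-- > 0; ) {
        produced[i] = current;
        // The input of operation i is its output grown by the border and
        // clipped to the image; the previous operation must produce it.
        current.begin = std::max(0LL, current.begin - borders[i]);
        current.end   = std::min(imageHeight, current.end + borders[i]);
    }
    return produced;
}
```

For a color conversion (border 0) followed by a 3x3 filter (border 1), a destination tile covering rows 60 to 119 requires the conversion to produce rows 59 to 120. The intermediate buffer therefore has to be larger than the destination tile.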

The following example demonstrates IW threading with OpenMP* using IwiTile pipeline tiling.

#include <iostream>

#include "iw++/iw.hpp"
#ifdef _OPENMP
#include <omp.h>
#endif

int main(int, char**)
{
    int fail = 0;

    // Create images
    ipp::IwiImage srcImage, dstImage;
    srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
    dstImage.Alloc(srcImage.m_size, ipp16s, 1);

#ifdef _OPENMP
    int                threads = omp_get_max_threads(); // Get threads number
#else
    int                threads = 4;                     // Just divide to process by tiles
#endif
    ipp::IwiSize       tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
    ipp::IwiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
    ipp::IwiBorderType border = ippBorderRepl;

#ifdef _OPENMP
    #pragma omp parallel num_threads(threads)
#endif
    {
        // Declare thread-scope variables
        ipp::IwiImage        cvtImage;
        ipp::IwiTilePipeline roiConvert, roiSobel;

        try
        {
            roiSobel.Init(tileSize, dstImage.m_size, border, sobBorderSize); // Initialize last operation ROI first
            roiConvert.InitChild(roiSobel); // Initialize next operation as a dependent

            // Allocate intermediate buffer
            cvtImage.Alloc(roiConvert.GetDstBufferSize(), ipp8u, 1);

            // Joined pipeline threading
#ifdef _OPENMP
            #pragma omp for
#endif
            for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
            {
                roiSobel.SetTile(ipp::IwiRoi(0, row, tileSize.width, tileSize.height)); // Set IwiRoi chain to current tile coordinates

                // Run functions
                ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), roiConvert);
                ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, roiSobel);
            }
        }
        catch(...)
        {
            fail = 1;
        }
    }

    if(fail)
    {
        std::cout << "Failure!\n";
        return 1;
    }
    std::cout << "Success!\n";
    return 0;
}
