Intel® oneAPI Math Kernel Library Cookbook

ID 758503
Date 9/27/2021

A newer version of this document is available. Customers should click here to go to the newest version.

Using a histospline technique to scale images


Rescale color or grayscale images using a histospline technique.


Use oneMKL data fitting functions for image scaling and spline interpolation for the computation of the histospline.

As described in [Bosner14], a one-dimensional histospline can be applied to the problem of two-dimensional image scaling. The application of one-dimensional interpolation primitives to two-dimensional (surface) interpolation and approximation is described in [Zavyalov80].

Consider the problem of scaling an input image of size n1*m1 pixels, where n1 is the number of rows and m1 is the number of columns, to an image of n2*m2 pixels as a two-dimensional problem that can be reduced to a number of one-dimensional histopolation problems as follows.

For each color plane c of the input image:

  1. For each row of the input image:

    1. For each pixel in the current row, compute accumulated cell averages (or color values) and store them in the current row in array VX.

    2. Consider the array as interpolated function values, in (m1+1) breakpoints x_breaks[] uniformly distributed over [0; m1]. Construct a histospline using a natural cubic spline, compute its derivatives, and store them in the current row of array VXR, i=0..m2 in (m2+1) sites x_sites[] uniformly distributed over [0; m1]:

  2. Transpose array VXR to array VXRT.

  3. Similar to VX, do the same operations row-by-row for VY to get results in arrayVYR:

    1. Similar to step 1.a., compute array VY as accumulated sums of the values from VXRT just computed.


      This step can be performed simultaneously with the transposition of VXR, so it is not necessary to store VXRT.

    2. Store derivatives in VYR array.

  4. Transpose VYR to VR to get integer results and store it to output image color plane c.


    It is not necessary to store VR, and the extra last row and column are discarded.

Source code: see the image_scaling folder in the samples archive available at

Image Scaling Using oneMKL Data Fitting Functions

    for( c=0; c<nc; c++ ) {
        /* 1) PIXELS matrix n1*m1 --> to VX matrix n1*(m1+1) */
        for( y1=0; y1<n1; y1++ ) {
            /* 1.a) get VX as accumulated sums of pixel intencities */
            for( x1=0, s=0; x1<=m1; x1++ ) {
        for( y1=0; y1<n1; y1++ ) {
            /* 1.b) get only derivatives */
            interpolate_der(m1+1,x_breaks, 1,&VX[y1*(m1+1)+0], m2+1,x_sites, &VXR[y1*(m2+1)+0]);
        /* 2) transpose VXR to VXRT not needed (can be skipped) */
        /* 3) */
        for( x2=0; x2<=m2; x2++ ) {
            /* 3.a) transpose VXR; and get VY as accumulated sums of just computed values */
            for( y1=0,fs=0.0; y1<=n1; y1++ ) {
                VY[x2*(n1+1)+y1]=fs; fs+=VXR[y1*(m2+1)+x2];
        /* 3.b) get only derivatives */
        for( x2=0; x2<=m2; x2++ ) {
            interpolate_der(n1+1,y_breaks, 1,&VY[x2*(n1+1)+0], n2+1,y_sites, &VYR[x2*(n2+1)+0]);
        /* 4) transpose VYR to VR and get integer result (VR not needed to store) */
        for( y2=0; y2<n2; y2++ ) { /* not using last row */
           for( x2=0; x2<m2; x2++ ) { /* not using last column */
               fptype v;
               int    i;
               /* add 0.5 for rounding to nearest during next conversion to integer */
               v = v + 0.5f;
               i = (int)v;
               /* saturation */
               if(i<0) i=0;
               /* convert to integer and save to color plane c of output image pixel */

Spline Computation Using oneMKL Data Fitting Functions

The interpolate_der function uses spline interpolation to compute the histospline.

It uses oneMKL data fitting routines to compute natural cubic spline with free-end boundary conditions for a given ny = 1 rows of function values f[] in nx breakpoints x[], then computes its derivatives in nsite sites xx[] and outputs this resulting row of nsite values of function derivatives to r[].

void interpolate_der(int nx, fptype* x,    int ny, fptype* f,    int nsite, fptype* xx,    fptype* r) {
    /* ... */
    if(scoeff==NULL) {
        printf("Error: not enough memory for scoeff.\n");

    errorCode = NewTask1D(&interpTask, nx,x,xhint, ny,f, yhint);
    if(errorCode)printf("NewTask1D errorCode=%d\n",errorCode);

    errorCode = EditPPSpline1D(interpTask,
        DF_BC_FREE_END, NULL, /* Free-end boundary condition. */
        DF_NO_IC, NULL,       /* No internal conditions. */
        scoeff, DF_NO_HINT);
    if(errorCode)printf("EditPPSpline1D errorCode=%d\n",errorCode);

    errorCode = Construct1D(interpTask, 0, 0);
    if(errorCode)printf("Construct1D errorCode=%d\n",errorCode);

    errorCode = Interpolate1D(interpTask, DF_INTERP, ny, nsite,xx, siteHint, der_orders_sz,der_orders, datahint,r,rhint, nxx_cell_indexes);
    if(errorCode)printf("Interpolate1D errorCode=%d\n",errorCode);

    errorCode = dfDeleteTask(&interpTask);
    if(errorCode)printf("dfDeleteTask errorCode=%d\n",errorCode);



The example calls the interpolate_der function in a loop for each row of input image (and for each row of intermediate image), so the ny parameter of interpolate_der is 1, representing a single interpolated function (or image row). But data fitting routines support ny > 1. This means that the example can be rewritten to replace the loop of calls to interpolate_der by one call with the ny parameter set to the total number of image rows to process.


When the image is large enough, automatic parallelization is done inside the data fitting construction and interpolation routines.

When ny = 1, it can make sense to parallelize the loop of calls to interpolate_der using #pragma omp parallel for, but it is better to rely on automatic parallelization inside the data fitting routines. Theoretically, this pragma could also be added over the c loop, since the algorithm for each color plane is independent from other color planes. However, this can cause even degradations because different threads can access one-byte sized fragments (the RGB components of same pixel) within same line of memory. Additionally, relative parallelization overhead can be too large for images which are too small. One way to avoid degradations is to break the image data into blocks. In any case, parallelization only makes sense for large enough amounts of data.

Additionally, take care using #pragma simd for vectorization.


External libraries can supply primitives for loading and saving image files. The sample code uses libpng for Linux* and OS X*, and the Microsoft Foundation Class Library CImage class for Windows*.

Sample input images are located in image_scaling/png_input folder, and resulting output images are located in image_scaling/png_output.


Sample input

Scaled using histospline technique


To evaluate the result of scaling the images, refer to the actual output images in image_scaling/​png_output.

Routines Used



Creates and initializes a new task descriptor for a one-dimensional Data Fitting task.


Modifies spline order, type, and boundary conditions parameters representing the spline in the Data Fitting task descriptor.


Constructs natural cubic spline.


Performs data fitting interpolation and computation of spline derivatives at given sites.


Destroys a Data Fitting task object and frees the memory.


Allocates memory buffers aligned on 64-byte boundaries.


Frees memory allocated by mkl_malloc.