Video and Vision Processing Suite Intel® FPGA IP User Guide

ID 683329
Date 9/30/2022

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents
1. About the Video and Vision Processing Suite 2. Getting Started with the Video and Vision Processing IPs 3. Video and Vision Processing IPs Functional Description 4. Video and Vision Processing IP Interfaces 5. Video and Vision Processing IP Registers 6. Video and Vision Processing IPs Software Programming Model 7. Protocol Converter Intel® FPGA IP 8. 3D LUT Intel® FPGA IP 9. AXI-Stream Broadcaster Intel® FPGA IP 10. Chroma Key Intel® FPGA IP 11. Chroma Resampler Intel® FPGA IP 12. Clipper Intel® FPGA IP 13. Clocked Video Input Intel® FPGA IP 14. Clocked Video to Full-Raster Converter Intel® FPGA IP 15. Clocked Video Output Intel® FPGA IP 16. Color Space Converter Intel® FPGA IP 17. Deinterlacer Intel® FPGA IP 18. FIR Filter Intel® FPGA IP 19. Frame Cleaner Intel® FPGA IP 20. Full-Raster to Clocked Video Converter Intel® FPGA IP 21. Full-Raster to Streaming Converter Intel® FPGA IP 22. Generic Crosspoint Intel® FPGA IP 23. Genlock Signal Router Intel® FPGA IP 24. Guard Bands Intel® FPGA IP 25. Interlacer Intel® FPGA IP 26. Mixer Intel® FPGA IP 27. Pixels in Parallel Converter Intel® FPGA IP 28. Scaler Intel® FPGA IP 29. Stream Cleaner Intel® FPGA IP 30. Switch Intel® FPGA IP 31. Tone Mapping Operator Intel® FPGA IP 32. Test Pattern Generator Intel® FPGA IP 33. Video Frame Buffer Intel® FPGA IP 34. Video Streaming FIFO Intel® FPGA IP 35. Video Timing Generator Intel® FPGA IP 36. Warp Intel® FPGA IP 37. Design Security 38. Document Revision History for Video and Vision Processing Suite User Guide

28.3. Scaler IP Functional Description

The scaler supports three options for the algorithm that resizes the video fields: nearest neighbor, bilinear and polyphase.

Nearest neighbor

Nearest neighbor is the lowest cost and lowest quality algorithm. Each pixel in the output field is a direct copy of a pixel from the input field, with no filtering or interpolation between pixels. The algorithm repeats or drops input pixels according to the required scaling ratio. If out[i,j] is the pixel value at horizontal position i and vertical position j in the output field. in[x,y] is the pixel value at horizontal position x and vertical position y in the input field. The equation defines the pixel values in the output field:
Equation 2. Nearest Neighbor Equations

The nearest neighbor algorithm requires a line buffer with storage for a single line of the input field to allow for this line to repeat multiple times at the output during upscales. Because nearest neighbor scaling does not involve any interpolation or filtering, the resulting image can have a blocky appearance.


Bilinear scaling offers improved quality compared to nearest neighbor by interpolating between neighboring pixels to remove the blocky look of nearest neighbor scaling. Bilinear scaling is better suited to upscaling (increasing image size) than downscaling (reducing image size). The cut-off frequency of the basic bilinear filter is generally too high to remove all the aliasing artifacts that downscaling can introduce. However, even for upscales the results can look somewhat blurred, with the edges in the image softened.

The bilinear algorithm selects the same input pixel to create each output pixel as the nearest neighbor algorithm. It builds a 2x2 pixel window around the target input pixel, with the target pixel in the top left corner of the window. The floor function calculates the input pixel position (the values of x and y) to give integer indices, as pixels only exist at integer locations. But the integer indices have some error compared to the ideal location that preserves all the fractional position information. For example, if the ideal value before applying the floor function is 1.5, the integer value after applying the floor function is 1, and the error is 0.5. The bilinear algorithm uses the horizontal and vertical position error values, err h and err v respectively, to create coefficients that, when applied to the 2x2 pixel window created around the integer pixel location, produce a resulting pixel that is effectively located at the desired fractional position. The equations show how the values of the coefficients are created and applied to the input window of pixels to create the output pixel.

Equation 3. Bilinear Equations

To calculate the values of err h and err v with exact precision for all possible scaling ratios requires an infinite number of fractional bits in the hardware implementing the mathematics. You must specify via parameters how many fraction bits you want to include for the calculations in the horizontal and vertical directions, frac h and frac v respectively. The IP takes the value for frac h from the Horizontal coefficient fraction bits parameter, and the value for frac v from the Vertical coefficient fraction bits parameter. With the desired level of precision set, the equation shows the values of and

To create the 2x2 pixel window required for the bilinear filter, the bilinear algorithm requires a line buffer with storage for two lines of input video.


The polyphase algorithm requires the most resources, but it produces the highest quality results. It uses interpolation filters that are larger than the 2x2 tap filter used for bilinear scaling. Depending on the coefficients you select, these filters can provide improved frequency response, resulting in less blurring on the edges during upscales, and less aliasing artifacts during downscales. The increased size of the filters requires an increase in the number of input video lines that must be stored to create the vertical window, and increased DSP block (multiplier) usage to implement the filter mathematics.

The polyphase algorithm uses the same initial integer pixel position as nearest neighbor scaling and calculates positional error values in the same way as the bilinear algorithm. However, instead of using these error values directly to calculate the filter coefficients, the polyphase algorithm uses the error values as addresses into horizontal and vertical filter coefficients memories. Each address in the coefficient memory is referred to as a phase (for reasons that are explained in the coefficient selection section) and you define the number of horizontal and vertical phases, num_phase h and num_phase v respectively, via parameters. The Number of horizontal phases parameter sets the value for num_phase h and the Number of vertical phases parameter sets the value for num_phase v . The equation shows the horizontal phase, phase h , and the vertical phase, phase v , for each output pixel:

Equation 4. Phase Equations

You define the number of taps used in the horizontal and vertical scaling filters (num_taps h and num_taps v respectively). The number of taps can be any value between 4 and 64. A higher number of taps can allow for a more precise filter transfer function but comes at the cost of extra DSP block utilization and, in the case of the vertical filter, increased block memory utilization in the line buffer required to create the vertical sample window. Each vertical or horizontal phase in the coefficient memory contains one coefficient for each tap of the vertical or horizontal filter.

The scaler implements the vertical scaling function first (if selected), followed by the horizontal scaling function (if selected). The result of the vertical scaling is an intermediate image with the desired output height but retaining the original input width. If inter[x,j] as the pixel value in the intermediate image at horizontal position x and vertical position j, and coeff f v [n] as the vertical scaling filter coefficient for tap N (selected from phase v ), the equation shows how the IP calculates the intermediate image. The filter taps are indexed with 0 the ‘oldest’ data (closest to the top edge of the image) and num_taps v - 1 the newest data (closest to the bottom edge of the image).

Equation 5. Intermediate Image Equation

If coeff f h [n] is the horizontal scaling filter coefficient for tap N (selected from phase v ), the equation shows how the IP calculates the final output image from the intermediate image. The filter taps are again indexed with 0 the oldest data (closest to the left edge of the image) and num_taps h - 1 the newest data (closest to the right edge of the image)

Equation 6. Final Output Image Equation