36.3. Warp IP Block Description
The video output process generates RGB or YUV format Intel FPGA streaming video using the warped image data.
The IP generates arbitrary warps based on a transform mesh. If you turn on Use easy warp, the IP instead offers a fixed set of transforms (rotations and mirroring) that do not need a transform mesh. If you turn off Use easy warp, the configured number of warp engines process the buffered input video to apply the required warp. Three coefficient tables control the warp engines and define the warp that the IP applies. The Warp IP software API generates these three coefficient tables, which are stored in external memory.
The IP defines the required warp with a backward mapping from the output to the input pixel positions. It represents the warp as a subsampled mesh that defines the mapping in 8x8 regions. For output pixel mappings within the 8x8 positions, the warp engine applies bilinear interpolation.
If you turn off Use single memory bounce, the IP operates with two bounces through external memory. The IP buffers the input video, then writes the resultant warped image back to external memory into one of two output video buffers. The IP writes to these two output buffers alternately. The IP then reads the warped image from the output video buffers and passes it out of the IP.
If you turn on Use single memory bounce, the IP operates with just a single bounce through external memory: the buffering of the input video. The IP transfers the resultant warped image directly to the video output process without passing through external memory.
The figure shows a high-level block diagram for the Warp IP with its connection to external memory when Use single memory bounce is off. In this configuration, the engines read and write video data through the external memory.
Coefficient Tables
Each engine within the Warp IP has read access to its own set of three coefficient tables that define and control the image transform that the IP applies. The three different tables are:
- Mesh coefficients that define the output to input pixel transform.
- Fetch coefficients that control the loading of the input image into the cache memory within the engine(s).
- Filter coefficients that control the mapping from the cache memory as the IP generates the interpolated or filtered output pixels.
The format of the mesh coefficients differs from the mesh data that you provide to the software API. The software API uses 32-bit signed integers for the mesh values; the Warp IP uses a 16-bit offset binary format.
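As an illustration of the offset binary idea only, the sketch below encodes a signed value as a 16-bit offset binary word by adding 2^15, and decodes it again. The scaling between the 32-bit mesh values of the software API and the 16-bit words of the IP (for example, any fractional bits) is not described in this section, so the functions assume that the value already fits in the signed 16-bit range. The function names are hypothetical and are not part of the Warp IP software API.

```python
OFFSET = 1 << 15  # bias used by 16-bit offset binary


def to_offset_binary16(value: int) -> int:
    """Encode a signed integer in [-32768, 32767] as a 16-bit offset binary word."""
    if not -OFFSET <= value < OFFSET:
        raise ValueError("value does not fit in 16-bit offset binary")
    return value + OFFSET


def from_offset_binary16(word: int) -> int:
    """Decode a 16-bit offset binary word back to a signed integer."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word is not a 16-bit value")
    return word - OFFSET
```

In this encoding the most negative value maps to 0x0000, zero maps to 0x8000, and the most positive value maps to 0xFFFF.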
The IP needs just the mesh data to define the warp. The software API uses this mesh data to generate the required coefficient tables.
Warp Mesh Interpolation
The IP defines the warp transform using an 8x8 subsampled mesh. This mesh defines the mapping from the output pixel positions to the corresponding input pixel positions. The 8x8 subsampled mesh requires that only the mappings for the following output pixel positions are defined:
(0,0), (8,0), (16,0) … (W, 0)
(0,8), (8,8), (16,8) … (W, 8)
…
(0,H), (8, H), (16, H) … (W, H)
where W=8*ceil(image width/8) and H=8*ceil(image height/8)
To generate the mappings for the output pixel positions that lie between these mesh points, the Warp IP uses bilinear interpolation.
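The mesh layout and the interpolation between mesh points can be sketched as follows. This is a minimal floating-point model, not the IP's fixed-point implementation. The mesh is assumed to be a list of rows, where `mesh[row][col]` holds the input position `(src_x, src_y)` for output position `(8*col, 8*row)`; the function names are hypothetical.

```python
import math

MESH_STEP = 8  # mesh points are defined every 8 output pixels


def mesh_dimensions(width, height):
    """Return (columns, rows) of mesh points, including the points at W and H."""
    w = MESH_STEP * math.ceil(width / MESH_STEP)
    h = MESH_STEP * math.ceil(height / MESH_STEP)
    return w // MESH_STEP + 1, h // MESH_STEP + 1


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def interpolate_mapping(mesh, x, y):
    """Bilinearly interpolate the input position for output pixel (x, y)."""
    rows, cols = len(mesh), len(mesh[0])
    # Clamp so that positions on the last mesh row or column use the final cell.
    col = min(x // MESH_STEP, cols - 2)
    row = min(y // MESH_STEP, rows - 2)
    tx = (x - col * MESH_STEP) / MESH_STEP
    ty = (y - row * MESH_STEP) / MESH_STEP
    p00, p10 = mesh[row][col], mesh[row][col + 1]
    p01, p11 = mesh[row + 1][col], mesh[row + 1][col + 1]
    # Interpolate along x on the top and bottom edges, then along y.
    top = (lerp(p00[0], p10[0], tx), lerp(p00[1], p10[1], tx))
    bottom = (lerp(p01[0], p11[0], tx), lerp(p01[1], p11[1], tx))
    return lerp(top[0], bottom[0], ty), lerp(top[1], bottom[1], ty)
```

For example, a 1920x1080 image needs 241 columns and 136 rows of mesh points under this layout, because W = 1920 and H = 1080 are already multiples of 8.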
Output Pixel Interpolation and Filtering
The IP generates output pixels with the pixel data from the associated input pixel positions as defined by the warp that the IP applies. The IP generates output pixel values with a bicubic interpolation calculation using a 4x4 kernel of the associated input pixel values.
The weightings for the interpolation over the 4x4 kernel are a combination of a bicubic function and a variable low pass filtering function. The software API automatically applies low pass filtering, which it bases on the amount of downscaling that results for that particular region of the warp.
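To make the 4x4 kernel concrete, the sketch below computes one output value with a separable bicubic interpolation. It assumes the common cubic convolution kernel with a = -0.5 (Catmull-Rom) and clamps coordinates at the image edges; both are assumptions for illustration. The actual IP weights also include the variable low pass filtering that the software API applies, which this sketch does not model.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def tap_weights(t):
    """Weights for the four taps at offsets -1, 0, +1, +2 for fraction t in [0, 1)."""
    return [cubic_kernel(t + 1), cubic_kernel(t), cubic_kernel(t - 1), cubic_kernel(t - 2)]


def sample_bicubic(image, sx, sy):
    """Interpolate image (a list of rows) at the fractional input position (sx, sy)."""
    ix, iy = math.floor(sx), math.floor(sy)
    wx = tap_weights(sx - ix)
    wy = tap_weights(sy - iy)
    height, width = len(image), len(image[0])
    total = 0.0
    for j in range(4):
        yy = min(max(iy - 1 + j, 0), height - 1)  # clamp at edges (assumption)
        for i in range(4):
            xx = min(max(ix - 1 + i, 0), width - 1)
            total += wy[j] * wx[i] * image[yy][xx]
    return total
```

The four weights in each direction sum to 1, so a flat region of the input stays flat in the output.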
Blank Skip Regions
When you configure the Warp IP to substantially downscale regions of an image, large areas of the output image can map to points outside the input image. These unmapped regions result in the IP producing black.
Because these regions in the output image do not require any processing of the input image by the Warp IP, the IP skips the processing associated with these regions for efficiency. The software API sets up this skipping process automatically: it determines, from the desired warp mapping, which regions the IP skips. You see this behavior only when Use single memory bounce is off.
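One way to picture the skip decision is to test each 8x8 mesh cell against the bounds of the input image. The sketch below marks a cell as blank when all four of its corner mappings lie beyond the same edge of the input. The criterion that the software API actually uses is not described here, so this test, the `margin` allowance for the interpolation kernel, and the function names are all assumptions.

```python
def cell_is_blank(corners, in_width, in_height, margin=2):
    """True if all corner mappings fall outside the same edge of the input image.

    margin is an assumed allowance for the footprint of the interpolation kernel.
    """
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (
        max(xs) < -margin
        or min(xs) > in_width - 1 + margin
        or max(ys) < -margin
        or min(ys) > in_height - 1 + margin
    )


def blank_cells(mesh, in_width, in_height, margin=2):
    """Return a grid of booleans, one per mesh cell, marking cells that can be skipped."""
    result = []
    for r in range(len(mesh) - 1):
        row = []
        for c in range(len(mesh[0]) - 1):
            corners = [mesh[r][c], mesh[r][c + 1], mesh[r + 1][c], mesh[r + 1][c + 1]]
            row.append(cell_is_blank(corners, in_width, in_height, margin))
        result.append(row)
    return result
```

Because bilinear interpolation inside a cell never leaves the bounding box of its four corners, a cell that passes this test has no pixel that maps inside the input image.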
Easy Warp
When you turn on Use easy warp:
- The IP supports rotations of 0°, 90°, 180° and 270° and you can apply a horizontal mirror operation before the selected rotation.
- The IP does not configure any engines.
- The video output block applies the required rotational or mirrored transform as it reads data from external memory.
- The IP does not require any processing engines, giving resource savings and memory bandwidth savings.
For easy warp rotations of 0° or 180°, the input width must not exceed the maximum output width. You set the maximum width with the Maximum output video width parameter. For easy warp rotations of 90° or 270°, the transposing of vertical and horizontal dimensions places restrictions on the input resolution.
Maximum Output Video Width | Input Height Restriction |
---|---|
2048 | 1088 |
3840 | 2176 |
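The easy warp transforms can be expressed as a backward mapping from each output pixel to an input pixel, with the optional horizontal mirror applied before the rotation. The sketch below is a software model for illustration only; it assumes that rotations are clockwise, which this section does not state, and the function names are hypothetical.

```python
def easy_warp_source(x_out, y_out, in_w, in_h, rotation, mirror=False):
    """Return the input pixel (x, y) that supplies output pixel (x_out, y_out)."""
    if rotation == 0:
        x, y = x_out, y_out
    elif rotation == 90:
        x, y = y_out, in_h - 1 - x_out
    elif rotation == 180:
        x, y = in_w - 1 - x_out, in_h - 1 - y_out
    elif rotation == 270:
        x, y = in_w - 1 - y_out, x_out
    else:
        raise ValueError("rotation must be 0, 90, 180 or 270")
    if mirror:
        # The mirror happens before the rotation, so undo it last when mapping backward.
        x = in_w - 1 - x
    return x, y


def easy_warp(image, rotation, mirror=False):
    """Apply the mirror and rotation to image, given as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    # Rotations of 90 and 270 degrees transpose the output dimensions.
    out_w, out_h = (in_h, in_w) if rotation in (90, 270) else (in_w, in_h)
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx, sy = easy_warp_source(x, y, in_w, in_h, rotation, mirror)
            row.append(image[sy][sx])
        output.append(row)
    return output
```

For 90° and 270° rotations the output width equals the input height, which is why the restriction for those rotations applies to the input height.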
The figure shows a high-level block diagram for the Warp IP and its connection to external memory when Use easy warp is on.
Single Memory Bounce
Generally, turning on Use single memory bounce gives reduced memory bandwidth compared to when it is off. However, single memory bounce may require larger internal RAM. The RAM usage depends on the specific warp transform. For some transforms the IP requires larger amounts of cache. Any increased cache requirement requires increased internal block RAM usage.
With Use single memory bounce on, three cache size options are available: 256, 512 and 1024 cache blocks per engine. With Use single memory bounce off, the IP has a fixed cache size of 256 cache blocks per engine.
Whether you turn on Use single memory bounce depends on the actual transform the IP performs. Intel provides a software tool that determines how much cache the IP needs to process the desired transform. You provide the tool with information such as the input and output resolutions, the transform required, and the number of engines to use. The tool then provides guidance on how much cache to use to process the transform.
For information on the Warp block cache tool, refer to Block Cache Tool. For information on the memory bandwidth implications of running with a single memory bounce, refer to External Memory for Warp IP.
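The choice of cache size can be sketched as picking the smallest available option that covers the requirement. In the sketch below, `required_blocks` stands for the per-engine cache requirement that you obtain from the cache tool; the function itself is hypothetical and is not part of the tool or the software API.

```python
SINGLE_BOUNCE_OPTIONS = (256, 512, 1024)  # cache blocks per engine, single bounce on
FIXED_CACHE_BLOCKS = 256                  # cache blocks per engine, single bounce off


def choose_cache_blocks(required_blocks, single_bounce=True):
    """Return the cache blocks per engine to configure for a given requirement."""
    if not single_bounce:
        return FIXED_CACHE_BLOCKS
    for option in SINGLE_BOUNCE_OPTIONS:
        if required_blocks <= option:
            return option
    raise ValueError("requirement exceeds the largest single memory bounce cache option")
```

If the requirement exceeds 1024 cache blocks per engine, none of the single memory bounce options covers it, and the sketch raises an error rather than guessing.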
Single Memory Bounce Cache Example
This example of cache usage is based on a 45 degree rotation with UHD input and output resolutions at 60 fps on an Intel Arria 10 device. This throughput requires that the IP uses two engines.
With Use single memory bounce off, the block RAM usage by the IP is 365 M20Ks.
With Use single memory bounce on, the IP requires 512 cache blocks per engine because of the constraints of the 45 degree rotation. This cache block requirement translates to a block RAM usage of 407 M20Ks.
Use single memory bounce | Cache blocks per engine | Total block RAM usage (M20Ks) |
---|---|---|
On | 512 | 407 |
Off | 256 (fixed) | 365 |
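The block RAM cost of single memory bounce in this example follows directly from the two totals. The short calculation below uses only the figures from the table; the variable names are illustrative.

```python
m20k_single_bounce_on = 407   # 512 cache blocks per engine
m20k_single_bounce_off = 365  # 256 cache blocks per engine (fixed)

extra_m20k = m20k_single_bounce_on - m20k_single_bounce_off
extra_percent = 100 * extra_m20k / m20k_single_bounce_off

print(f"Extra block RAM: {extra_m20k} M20Ks ({extra_percent:.1f}% more)")
```

For this transform, single memory bounce costs 42 additional M20Ks, roughly 11.5% more block RAM, in exchange for the reduced memory bandwidth.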