# OpenCL Preview Extensions For VEBox

Published: 01/30/2017

Last Updated: 01/30/2017

# Fixed-Function Video Enhancement Pipeline (VEBox)

The VEBox is an independent fixed-function silicon block within Intel GPU hardware that provides a variety of image enhancement stages. This block is completely independent of the GPGPU pipeline and is thus able to execute concurrently with regular OpenCL kernels. Execution of the VEBox does not impact EU performance. See the OpenCL preview extensions documentation for more information on preview extensions, how features are enabled, and how to provide feedback.

## Why Use The VEBox?

The VEBox is not a programmable block; it is a fixed-function unit that defines an image processing pipeline for algorithms that are common in video processing workloads. Each stage in the pipeline is implemented in gates to maximize workload performance with a low power footprint. The downside is that each stage of the pipeline is rigidly defined to perform a set algorithm, which may be configured in a number of ways but cannot be fundamentally altered. If a workload can make use of the operations provided by the VEBox, there may be considerable performance advantages. Key among these is the ability to do computations without impacting either graphics or GPGPU performance. The VEBox shares memory resources with the rest of the HD Graphics part, so the cost of transitioning an OpenCL memory object between EU computation and VEBox computation is minimal.

## Limitations

The OpenCL extensions for VEBox define a hardware-only interface. This means that video enhancement features are only present when supported by physical hardware. The OpenCL API may be used to determine whether a VEBox is present on the local machine and which features are supported.

The presence and capabilities of VEBox hardware depend on particular combinations of hardware and driver versions. At the time this article was published, these features are available for processors with Gen9 GPUs (6th/7th Generation Intel Core processors) on Linux.

## What Can VEBox Do?

Short answer: a lot. It has built-in features for processing video (e.g. deinterlace), working with raw camera data (e.g. demosaic), and a suite of common image processing operations (e.g. color space conversion, color correction, contrast enhancement, etc.). The VEBox pipeline on Skylake is organized into the stages described below.

Whenever work is enqueued to the VEBox, the entire pipeline is invoked. An invocation always takes a set of input and output data (images) and an opaque pipeline state configuration. A typical usage is to invoke the pipeline with an input image, an output image, and an accelerator object (more about this later).

The pipeline is broken down into roughly three sub-pipelines: the Camera pipeline, the Denoise and Deinterlace (DN/DI) pipeline, and the Image Enhancement and Color Processing (IECP) pipeline. Execution of the VEBox starts from the top of one of these three stages. For example, a programmer working with RAW sensor data can invoke the pipeline starting with the Camera pipe. Similarly, a programmer who wants to convert interlaced video frames to progressive frames can invoke the pipeline from the DN/DI stage. Generally speaking, any enabled stage downstream from where execution begins is applied in a single invocation. For example, the programmer may invoke the VEBox with RAW camera data, demosaic it (i.e. convert it to 4:4:4 color), then convert it into RGB in the Color Space Conversion stage (downstream in the IECP pipeline). The performance impact of enabling many stages versus just a few is typically negligible. Here is a brief description of each stage:

| Stage | Description |
| --- | --- |
| Black Level Correction | Adjusts the black level |
| Vignette Correction | Reduces lens color distortion |
| White Balance Correction | Applies white balance correction |
| Hot Pixel Correction | Reduces salt-and-pepper noise and other artifacts |
| Denoise | Adaptive noise reduction for improved quality |
| Deinterlace | Converts from interlaced to progressive |
| Demosaic | Converts raw Bayer patterns to YUV color |
| Color Correction Matrix | Applies color correction |
| Forward Gamma Correction | Applies gamma correction |
| Front-End Color Space Conversion | Converts the color space to YUV for later processing |
| Skin-Tone Detection and Enhancement | Improves the visual quality of skin-toned pixels |
| Gamut Compression | Reduces the color gamut in a way that minimizes distortion |
| Total Color Correction | Modifies the colors based on key RGBYMC values |
| Process Amplifier | Controls hue, saturation, brightness and contrast |
| Back-End Color Space Conversion | Converts pixels in the pipeline to a desired format |
| Gamut Expansion / Color Correction | Expands color to a wider gamut and applies other corrections |

Most stages of the VEBox pipeline are expressed with minimal abstractions, allowing the user to make use of the hardware without the driver making any assumptions about the user's workload.

In a few cases, certain stages may be configured by the driver (e.g. color space conversion) to reduce the implementation complexity of common use cases. However, the programmer can override any of these default configurations by providing an explicit configuration.

# Extension API

The OpenCL VEBox extensions expose low-level interfaces to the VEBox. There are three extension interfaces that roughly correspond to the three sub-pipelines described above. While these three interfaces culminate in a single pipeline, they are kept distinct for a number of forward- and backward-compatibility reasons. These extensions are written in a form consistent with other OpenCL extensions.

Enable with:

```shell
export OCL_EnablePreviewFeatures=1
```

| Specification | Description |
| --- | --- |
| `cl_intelx_video_enhancement` | Starting point to obtain VEBox built-in kernels, command queues and accelerator objects. Samples: Minimal VEBox samples |
| `cl_intelx_video_enhancement_color_pipeline` | Extends `cl_intelx_video_enhancement` with the IECP pipeline and describes the interface for accessing statistical information used in adaptive filters. |
| `cl_intelx_video_enhancement_camera_pipeline` | Extends `cl_intelx_video_enhancement` with the camera pipe, enabling workloads that operate on raw sensor data. |