Video Conferencing features of Intel® Media Software Development Kit

Petter Larsson, Sravanthi Kota Venkata

Abstract

This article explains how to use the new and optimized video conferencing features available as part of Intel® Media SDK. Features common to video conferencing or streaming workloads are detailed together with source code references illustrating how a developer may use the feature in an application.

Introduction

The Intel® Media Software Development Kit (Intel® Media SDK) is a software development library that exposes the media acceleration capabilities of Intel platforms for video decoding, video encoding, and video pre/post processing. Intel Media SDK helps developers rapidly develop software that accesses hardware acceleration for video codecs, with automatic fallback to software if hardware acceleration is not available.

This article explains how to use the new and optimized video conferencing features of Intel Media SDK. The features listed below address common video conferencing or streaming requirements for improved adaptation to transmission conditions, robustness and real-time responsiveness:

Low Latency Encode and Decode
Dynamic Bit Rate Control
Dynamic Resolution Control
Forced Key Frame Generation
Reference List Selection
Reference Picture Marking Repetition SEI message
Long Term Reference Frame (LTR)
Temporal Scalability
Motion JPEG (MJPEG) Decode

It is important to note that the majority of the above features were designed for the Media SDK AVC (H.264) codec. However, the MPEG2 encoder does support dynamic resolution control and dynamic bit rate control. Low latency MPEG2 encode or decode has not been optimized.

In the following chapters we will explain how a developer may use these new SDK features in an application. For further details on how to use the features, please refer to the Intel Media SDK manual and samples.

Intel® Media SDK Video Conferencing features

This chapter details the features of the Intel® Media SDK that are important to a developer building video conferencing or streaming applications. A real-life video conferencing application also includes components for the reverse pipeline, network transfer, preview, side band channels, et cetera. These components are all outside the scope of this article.

Note that many of the features described in this document were added in API 1.3. To ensure access to the described features, initialize the SDK session specifying API version 1.3 or later, for example:

mfxVersion ver = {3, 1}; // {Minor, Major}, i.e. API 1.3
mfxSession session;
MFXInit(MFX_IMPL_AUTO, &ver, &session);
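An application may want to confirm that a reported API version meets the 1.3 minimum before relying on the features below. The following is a minimal sketch of that comparison only. `ApiVersion`, `atLeast`, `kMinVideoConfApi` and `supportsVideoConfFeatures` are hypothetical names introduced for illustration; they are not part of the SDK, and the struct is a stand-in rather than the SDK's mfxVersion type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a version pair; NOT the SDK's mfxVersion type.
// The member order follows the {Minor, Major} initializer used above.
struct ApiVersion {
    std::uint16_t minor;
    std::uint16_t major;
};

// Returns true when 'have' is at least 'need' (major compared first).
constexpr bool atLeast(ApiVersion have, ApiVersion need) {
    return have.major > need.major ||
           (have.major == need.major && have.minor >= need.minor);
}

// API 1.3 is the minimum this article names for most described features.
constexpr ApiVersion kMinVideoConfApi = {3, 1};  // {minor, major}

constexpr bool supportsVideoConfFeatures(ApiVersion have) {
    return atLeast(have, kMinVideoConfApi);
}

// Compile-time usage examples.
static_assert(supportsVideoConfFeatures(ApiVersion{3, 1}), "1.3 qualifies");
static_assert(!supportsVideoConfFeatures(ApiVersion{1, 1}), "1.1 does not");
```

In a real application the version to test would come from the SDK session after initialization rather than from a literal.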

Low Latency Encode and Decode

Low latency codecs improve real-time responsiveness. This is achieved by minimizing internal codec delay and buffering. To enable low latency mode using Intel® Media SDK the developer must configure the encoder and decoder with a specific set of parameters. The sample_decode sample provided with Intel® Media SDK shows how to use "low latency" mode.

Encoder configuration

To enable the Intel® Media SDK encoder for optimal low latency the following set of parameters should be used:

mfxVideoParam::AsyncDepth = 1
mfxInfoMFX::GopRefDist = 1
mfxInfoMFX::NumRefFrame = 1

The AsyncDepth setting limits internal frame buffering. This also requires the application to synchronize after encoding each frame. The GopRefDist setting forces the encoder to not use B-frames. The NumRefFrame setting has the effect of using only the previous P-frame as reference.

The encoder must also be configured to use the extended buffer type mfxExtCodingOption (MFX_EXTBUFF_CODING_OPTION), with a specific setting for decoder frame/picture buffering (DPB), to ensure that each decoded frame is displayed immediately after decoding:

mfxExtCodingOption::MaxDecFrameBuffering = 1

Decoder configuration

To enable the Intel® Media SDK decoder for low latency the following set of parameters should be used:

mfxVideoParam::AsyncDepth = 1

The AsyncDepth setting limits internal frame buffering. This also requires the application to synchronize after decoding each frame.

The decoder bit stream DataFlag should also be set to indicate that a full frame is in the buffer. Note that if a full frame is not in the decoder bit stream buffer, the decoded frame will have artifacts.

mfxBitstream::DataFlag = MFX_BITSTREAM_COMPLETE_FRAME

It is also suggested that the decoder bit stream buffer is only provided one frame at a time.
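The encoder and decoder recommendations above can be collected in one place and checked before a session is started. The sketch below does this with hypothetical stand-in structs (`EncoderSettings`, `DecoderSettings`) that mirror only the fields named in the text; they are not the SDK structures, and the helper names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins mirroring only the fields named in the text.
// They are NOT mfxVideoParam, mfxInfoMFX or mfxExtCodingOption.
struct EncoderSettings {
    std::uint16_t asyncDepth;            // mirrors mfxVideoParam::AsyncDepth
    std::uint16_t gopRefDist;            // mirrors mfxInfoMFX::GopRefDist
    std::uint16_t numRefFrame;           // mirrors mfxInfoMFX::NumRefFrame
    std::uint16_t maxDecFrameBuffering;  // mirrors mfxExtCodingOption::MaxDecFrameBuffering
};

struct DecoderSettings {
    std::uint16_t asyncDepth;  // mirrors mfxVideoParam::AsyncDepth
    bool completeFrameFlag;    // mirrors DataFlag = MFX_BITSTREAM_COMPLETE_FRAME
};

// Values recommended in the text for low latency encode.
inline EncoderSettings lowLatencyEncoder() {
    return EncoderSettings{1, 1, 1, 1};
}

// Values recommended in the text for low latency decode.
inline DecoderSettings lowLatencyDecoder() {
    return DecoderSettings{1, true};
}

// True when every encoder field matches the low latency recommendation.
inline bool isLowLatency(const EncoderSettings& e) {
    return e.asyncDepth == 1 && e.gopRefDist == 1 &&
           e.numRefFrame == 1 && e.maxDecFrameBuffering == 1;
}

// True when every decoder field matches the low latency recommendation.
inline bool isLowLatency(const DecoderSettings& d) {
    return d.asyncDepth == 1 && d.completeFrameFlag;
}
```

A check of this kind is useful when settings come from a configuration file, since a single stray value such as GopRefDist = 3 reintroduces B-frames and the delay that comes with them.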

Dynamic Bit Rate Control

To be able to adapt to varying network transmission conditions it is important that an encoder has the capability to adjust bit rate at any time during an encoding session.

An application can change the bit rate using the TargetKbps and/or MaxKbps parameters by calling the MFXVideoENCODE_Reset function at any time during the encode operation.

If Hypothetical Reference Decoder (HRD) compliance is required then mfxExtCodingOption::NalHrdConformance should be set (MFX_CODINGOPTION_ON). In that case bit rate change is only allowed in Variable Bit Rate (VBR) mode and the encoder will also generate a key frame every time the bit rate is changed.

In case HRD compliance is not required, bit rate can also be changed in Constant Bit Rate (CBR) and Average Variable Bit Rate (AVBR) mode, by setting mfxExtCodingOption::NalHrdConformance to off, MFX_CODINGOPTION_OFF (this is also the default setting). This mode also eliminates key frame generation every time the bit rate is changed. However, if key frame generation is required please follow the method described in the key frame generation section.

Alternatively, the application may use the Constant Quantization Parameter (CQP) encoding mode to perform customized bit rate adjustment on a per-frame basis. For more information please refer to the Intel Media SDK video conferencing sample.

MPEG2 encoder usage note: A dynamic bit rate change always results in the generation of a key frame.
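The rules in this section can be summarized as a small decision function. The sketch below is only a restatement of the text in code. The enum and function names are hypothetical, CQP is treated as "not changed through a reset" because the text describes it as a per-frame alternative, and for MPEG2 only the usage note is modeled (any rate control restrictions for MPEG2 are not covered).

```cpp
#include <cassert>

enum class Codec { AVC, MPEG2 };
enum class RateControl { CBR, VBR, AVBR, CQP };

struct BitrateChangeOutcome {
    bool allowed;         // may the bit rate be changed through an encoder reset?
    bool forcesKeyFrame;  // does the change produce a key frame?
};

// Restates the section's rules; 'hrdConformance' mirrors the
// mfxExtCodingOption::NalHrdConformance setting (ON = true, OFF = false).
inline BitrateChangeOutcome bitrateChangeOutcome(Codec codec,
                                                 RateControl rc,
                                                 bool hrdConformance) {
    // CQP has no target bit rate; the application steers QP per frame instead.
    if (rc == RateControl::CQP) return {false, false};

    // MPEG2 usage note: a bit rate change always yields a key frame.
    if (codec == Codec::MPEG2) return {true, true};

    if (hrdConformance) {
        // With HRD conformance, only VBR may change, and a key frame follows.
        const bool vbr = (rc == RateControl::VBR);
        return {vbr, vbr};
    }

    // Without HRD conformance, CBR, VBR and AVBR may change without a key frame.
    return {true, false};
}
```

If a key frame is still wanted after a change made without HRD conformance, it has to be requested explicitly as described in the key frame generation section.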

Dynamic Resolution Control

The Intel Media SDK encoder supports dynamic resolution change in all bit rate control modes. The application may change resolution by calling the MFXVideoENCODE_Reset function.

Note that the application cannot increase resolution beyond the size specified during encoder initialization.

The encoder does not guarantee HRD conformance across a resolution change, and a resolution change always results in the insertion of a key frame.
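Because the resolution can never exceed the size given at initialization, an application may want to validate a requested size before attempting a reset. The following sketch shows such a check. `FrameSize` and `canResetTo` are hypothetical names used for illustration, and the sketch does not cover any alignment or codec level constraints that a real encoder may also impose.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical frame size holder for this sketch.
struct FrameSize {
    std::uint16_t width;
    std::uint16_t height;
};

// True when 'requested' is non-empty and fits within the size that was
// specified during encoder initialization.
inline bool canResetTo(FrameSize initial, FrameSize requested) {
    return requested.width > 0 && requested.height > 0 &&
           requested.width <= initial.width &&
           requested.height <= initial.height;
}
```

An application that expects to scale up later would therefore initialize the encoder at the largest size it may need and start encoding at a smaller one.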

Forced Key Frame Generation

The ability to insert key frames at any time during encoding enables greater control over stream quality, robustness and error correction.

Encoder frame type control depends on the selected encoder order mode:

Display Order: The application can force the current frame to be a key frame, but cannot change the frame type of frames already buffered inside the encoder
Encoded Order: The application must specify the exact frame type for every frame, so the application can force the current frame to be any frame type that the standard allows

To control the encoded frame type the application can set the FrameType parameter of the mfxEncodeCtrl structure. A pointer to the mfxEncodeCtrl structure is passed to the MFXVideoENCODE_EncodeFrameAsync call (immediately after the session handle) and gives the developer additional control over the encoding operation. Key frame generation control is illustrated in the example below:

mfxEncodeCtrl EncodeCtrl;
memset(&EncodeCtrl, 0, sizeof(mfxEncodeCtrl));
EncodeCtrl.FrameType = MFX_FRAMETYPE_I | MFX_FRAMETYPE_REF | MFX_FRAMETYPE_IDR;
MFXVideoENCODE_EncodeFrameAsync(session, &EncodeCtrl, …);
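In a conferencing application the key frame request usually arrives asynchronously, for example from receiver feedback, and is applied to the next frame submitted. The sketch below shows one way to latch such a request in display order mode. The class name and the `kFrameType*` constants are hypothetical placeholders for this sketch; a real application would use the MFX_FRAMETYPE_* constants from the SDK headers, and the sketch assumes that a frame type of zero leaves the decision to the encoder.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder flag values for this sketch only; a real application uses
// the MFX_FRAMETYPE_* constants from the SDK headers instead.
constexpr std::uint16_t kFrameTypeUnknown = 0x0000;  // assumed: encoder decides
constexpr std::uint16_t kFrameTypeI       = 0x0001;
constexpr std::uint16_t kFrameTypeRef     = 0x0040;
constexpr std::uint16_t kFrameTypeIdr     = 0x0080;

// Latches a key frame request until the next frame is submitted.
class KeyFrameRequest {
public:
    // Called when the application decides a key frame is needed.
    void request() { pending_ = true; }

    // Returns the frame type to use for the next submitted frame and
    // clears any pending request.
    std::uint16_t nextFrameType() {
        const bool force = pending_;
        pending_ = false;
        return force ? static_cast<std::uint16_t>(kFrameTypeI | kFrameTypeRef |
                                                  kFrameTypeIdr)
                     : kFrameTypeUnknown;
    }

private:
    bool pending_ = false;
};
```

The value returned by `nextFrameType()` would be copied into the FrameType field of the control structure before each encode call, so that only one frame is forced per request.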

Reference List Selection

The Reference List Selection feature is useful if the encoder application can obtain feedback about client side frame reception conditions. Based upon this information the application may want to adjust the encoder to use or not use certain frames as reference to improve robustness and error resilience.

Figure 1 - Reference frame feedback

The application can specify the reference window size by specifying the parameter mfxInfoMFX::NumRefFrame during encoding initialization. Depending on platform, there is a limitation on how big the size of the reference window can be. To determine the actual parameter set after initialization, use the function MFXVideoENCODE_GetVideoParam to retrieve the current working set of parameters (including actual NumRefFrame used). Also note that the size of the reference window also depends on the selected codec profile/level and resolution.

During encoding, the application can specify the actual reference window sizes by attaching the mfxExtAVCRefListCtrl (MFX_EXTBUFF_AVC_REFLIST_CTRL) structure to the MFXVideoENCODE_EncodeFrameAsync function call. Note that mfxExtAVCRefListCtrl is used as an extended buffer in the mfxEncodeCtrl structure. The NumRefIdxL0Active parameter of the mfxExtAVCRefListCtrl structure specifies the size of the reference list L0 (for B and P frame prediction according to the AVC standard) and the NumRefIdxL1Active parameter specifies the size of the reference list L1 (for B frame prediction according to the AVC standard). These two values specify the actual size of the reference lists, and must be less than or equal to the parameter mfxInfoMFX::NumRefFrame that was set during encoding initialization.

Using the same extended buffer, the application can also instruct the encoder to use or not use certain reference frames. The application specifies the preferred reference frame list PreferredRefList and/or the rejected frame list RejectedRefList in the mfxExtAVCRefListCtrl structure. The two lists control how the encoder chooses the reference frames of the current frame.

There are a few limitations:

The application must uniquely identify each input frame by setting the mfxFrameData::FrameOrder parameter.
The frames in the lists are ignored if they are out of the reference window.
If by going through the lists, the SDK encoder cannot find a reference frame for the current frame, the SDK encoder will encode the current frame using Intra prediction only.
If the GOP pattern contains B-frames, the SDK encoder will not be able to follow the mfxExtAVCRefListCtrl instructions (the instructions will be ignored).
Reference list control is only supported in progressive encoding mode.

Make sure to set FrameOrder = MFX_FRAMEORDER_UNKNOWN to mark unused reference list items.

For instance, to indicate to an encoder that is about to encode frame 100 that frames 98 and 99 were received as corrupted frames on the decoder client side, the reference list can be specified as follows (assumes proper initialization of unused frames):

RejectedRefList[0].FrameOrder = 98;
RejectedRefList[0].PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
RejectedRefList[1].FrameOrder = 99;
RejectedRefList[1].PicStruct = MFX_PICSTRUCT_PROGRESSIVE;

Similar code applies to setting PreferredRefList, resulting in reordering the reference list for the currently encoded frame.
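The "proper initialization of unused frames" mentioned above, together with the rule that frames outside the reference window are ignored, can be sketched as a helper that fills the rejected list from receiver feedback. Everything below is a stand-in: `RefEntry`, `RefListCtrl`, the list capacity and the constant values are assumptions made for this sketch and are not the SDK definitions. The reference window is approximated as the last `numRefFrame` frames before the current one, which assumes a stream without B-frames in which every frame is a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholders for this sketch; real code uses MFX_FRAMEORDER_UNKNOWN and
// MFX_PICSTRUCT_PROGRESSIVE from the SDK headers.
constexpr std::uint32_t kFrameOrderUnknown    = 0xFFFFFFFFu;
constexpr std::uint16_t kPicStructProgressive = 0x01;
constexpr std::size_t   kRejectedListSize     = 16;  // assumed capacity

struct RefEntry {
    std::uint32_t frameOrder;
    std::uint16_t picStruct;
};

// Stand-in holding only the rejected list; NOT mfxExtAVCRefListCtrl.
struct RefListCtrl {
    RefEntry rejected[kRejectedListSize];
};

// Marks every slot unused, then records the frames reported as corrupted.
// Frames outside the (approximated) reference window are skipped, since
// the encoder would ignore them. Returns the number of entries written.
inline std::size_t fillRejectedList(RefListCtrl& ctrl,
                                    const std::vector<std::uint32_t>& corrupted,
                                    std::uint32_t currentFrame,
                                    std::uint32_t numRefFrame) {
    for (RefEntry& e : ctrl.rejected) e = RefEntry{kFrameOrderUnknown, 0};

    std::size_t n = 0;
    for (std::uint32_t order : corrupted) {
        if (n == kRejectedListSize) break;  // list is full
        const bool inWindow =
            order < currentFrame && currentFrame - order <= numRefFrame;
        if (!inWindow) continue;
        ctrl.rejected[n++] = RefEntry{order, kPicStructProgressive};
    }
    return n;
}
```

With the example from the text, calling `fillRejectedList(ctrl, {98, 99}, 100, 4)` writes two entries and leaves the remaining slots marked as unused.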

Reference Picture Marking Repetition SEI Message

As with reference list selection, improved robustness and error resilience can be achieved by using the Reference Picture Marking Repetition Supplemental Enhancement Information (SEI) message feature, as defined by the AVC standard (D.1.8).

The message is used to repeat the decoded reference picture marking syntax structures of earlier decoded pictures. Consequently, even if earlier reference pictures were lost, the decoder can still maintain the correct status of the reference picture buffer and reference picture lists.

The application can request writing the Reference Picture Marking Repetition SEI message during encoding initialization, by setting the RefPicMarkRep flag to MFX_CODINGOPTION_ON in the mfxExtCodingOption (MFX_EXTBUFF_CODING_OPTION) extended buffer.

The decoder will respond to the Reference Picture Marking Repetition SEI message if such a message exists in the bitstream, and check it against the reference list information specified in the sequence/picture headers. The decoder will report any mismatch between the SEI message and the reference list information via the mfxFrameData::Corrupted field.

Long Term Reference Frame

An application may use a Long-Term Reference (LTR) frame to improve coding efficiency. For instance, LTR may be useful if a certain pattern is continuously part of the frame background over a long period of time, or to store a representation of a camera view when switching to another camera, enabling better prediction when switching back to the prior camera view. Assigning an LTR allows the encoder to tell the decoder to hold onto a frame longer than it would hold a short-term reference frame.

Unlike a short-term reference frame (controlled by the encoder), an LTR frame is controlled entirely by the application. The encoder itself never marks or unmarks a frame as an LTR.

Each frame has a unique number, FrameOrder, in the mfxFrameData structure, and the application uses this number to identify the frame during the marking process.

The application uses the mfxExtAVCRefListCtrl buffer to mark a frame as LTR and later to unmark it. To mark a frame as LTR, put its number (FrameOrder) in the mfxExtAVCRefListCtrl::LongTermRefList list. After marking, the encoder will use this LTR frame as reference for all subsequent frames until the frame is unmarked. To unmark a frame, put its number in the mfxExtAVCRefListCtrl::RejectedRefList list. An LTR will also be automatically unmarked by an IDR frame.

Note that a frame can only be marked as LTR if it is present inside the encoder frame buffer.

The encoder puts all long-term reference frames at the end of a reference frame list. If the number of active reference frames (the NumRefIdxL0Active and NumRefIdxL1Active values in the mfxExtAVCRefListCtrl extended buffer) is smaller than the total reference frame number (the NumRefFrame value in the mfxInfoMFX structure during the encoding initialization), the SDK encoder may ignore some or all long term reference frames. The application may avoid this by providing list of preferred reference frames in the PreferredRefList list in the mfxExtAVCRefListCtrl extended buffer. In this case, the SDK encoder reorders the reference list based on the specified list.

For instance, to set frame 100 as an LTR frame, initialize the reference list as follows (assumes proper initialization of unused frames):

LongTermRefList[0].FrameOrder = 100;
LongTermRefList[0].PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
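Since the encoder never marks or unmarks LTR frames on its own, the application typically keeps its own record of which frames are currently marked. The sketch below shows such bookkeeping under the rules stated above (marking, unmarking through the rejected list, and automatic unmarking on IDR). `LtrTracker` is a hypothetical helper introduced for illustration and does not call the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Application-side record of which frames are currently marked as LTR.
// Hypothetical helper; it mirrors the rules in the text, not SDK state.
class LtrTracker {
public:
    // Call when a FrameOrder is placed in LongTermRefList.
    void mark(std::uint32_t frameOrder) { ltr_.insert(frameOrder); }

    // Call when a FrameOrder is placed in RejectedRefList.
    void unmark(std::uint32_t frameOrder) { ltr_.erase(frameOrder); }

    // Call when an IDR frame is encoded: every LTR is unmarked.
    void onIdr() { ltr_.clear(); }

    bool isLtr(std::uint32_t frameOrder) const {
        return ltr_.count(frameOrder) != 0;
    }

    std::size_t count() const { return ltr_.size(); }

private:
    std::set<std::uint32_t> ltr_;
};
```

Calling `onIdr()` whenever a forced key frame or a resolution change occurs keeps this record consistent with the automatic unmarking described above.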

Temporal Scalability

Temporal scalability is stream scalability in terms of frame rate, meaning that a given bit stream has the ability to have multiple frame rates.

For instance, a stream may have a base layer frame rate of 7.5 fps. Additional temporal layers may have frame rates of 15, 30 and 60 fps. In case of packet loss or frame corruption, this allows the decoder to improve error resiliency by lowering the frame rate while maintaining quality. (Note that some specifications instead define the layer with the greatest rate as the base layer, such as 60 fps in this example.)

Temporal scalability is achieved by encoding the stream in such a way that some frames can be skipped during decoding because no other frames depend on them, thus adjusting the decoded frame rate. This is illustrated in the simplified temporal stream example below, where the maximum frame rate is 60 fps.

Figure 2 - Temporal scalability - Frame dependencies

In the above figure, consider the green frames (1, 3, 5, etc.). No frame depends on them, so they can be skipped while all remaining frames are still decoded, cutting the frame rate by a factor of 2, from 60 fps to 30 fps. Once the green frames are skipped, the blue frames can also be skipped since no remaining frame depends on them, cutting the frame rate in half again to 15 fps. In the same way the decoder can skip the black frames, resulting in 7.5 fps.
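The frame skipping described above can be expressed as a mapping from frame index to temporal layer. The sketch below assumes a regular dyadic hierarchy in which the highest layer holds every second frame, the next layer every fourth, and so on; the exact frame indices of the blue and black frames in the figure are an assumption of this sketch. The function names are hypothetical.

```cpp
#include <cassert>

// Temporal layer of a frame in a dyadic hierarchy with 'numLayers' layers.
// Layer 0 is the base layer; the highest layer holds the odd frames.
inline unsigned temporalLayer(unsigned frameIndex, unsigned numLayers) {
    assert(numLayers >= 1 && numLayers <= 16);
    const unsigned period = 1u << (numLayers - 1);  // spacing of base frames
    const unsigned pos = frameIndex % period;
    if (pos == 0) return 0;  // base layer frame

    unsigned layer = numLayers - 1;
    // Each factor of two in 'pos' moves the frame one layer toward the base.
    for (unsigned p = pos; (p & 1u) == 0; p >>= 1) --layer;
    return layer;
}

// True when the frame must be kept to decode layers 0..maxLayer.
inline bool keepFrame(unsigned frameIndex, unsigned numLayers,
                      unsigned maxLayer) {
    return temporalLayer(frameIndex, numLayers) <= maxLayer;
}
```

With four layers and a 60 fps stream, keeping layers 0 to 2 drops the odd frames (30 fps), keeping layers 0 to 1 leaves every fourth frame (15 fps), and keeping only layer 0 leaves every eighth frame (7.5 fps).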

It is important to understand that the Media SDK decoder does not support layer selection. The application must interpret the encoded bit stream headers and decide which temporal layers to decode.

Usage

The application may specify temporal hierarchy of frames by using the mfxExtAvcTemporalLayers (MFX_EXTBUFF_AVC_TEMPORAL_LAYERS) extended buffer. This functionality is limited to display order mode.

To distinguish different temporal layers, the encoder inserts a prefix Network Abstraction Layer (NAL) unit before each slice, with unique temporal and priority IDs. The encoder starts temporal IDs from zero and priority IDs from BaseLayerPID, increasing both by one for each consecutive layer.

If the application additionally needs to specify unique sequence or picture parameter sets IDs it should use mfxExtCodingOptionSPSPPS (MFX_EXTBUFF_CODING_OPTION_SPSPPS) extended buffer, set all pointers and sizes to zero, and use only SPSId/PPSId fields. The same Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) ID will be used for all temporal layers.

Each temporal layer is defined by a Scale parameter. This is the ratio of the frame rate of the temporal layer to the frame rate of the base layer. The application may skip some of the temporal layer(s) by setting the Scale parameter equal to zero. In this case temporal layers with the corresponding temporal IDs will be absent from the stream.

Two consecutive temporal layers must have an integer ratio of frame rates. For instance, suppose we have two layers of 60 fps and 30 fps. An additional layer could not be set to 20 fps since the ratio 30/20 is not an integer. However, an additional layer of 15 fps is accepted since 30/15 is an integer ratio.

For instance, to encode a 60 fps stream (FrameRateExtN/FrameRateExtD specifies the frame rate of the highest layer) with a base temporal layer of 15 fps and additional temporal layers of 30 and 60 fps, the mfxExtAvcTemporalLayers extended buffer would be configured as follows:

mfxExtAvcTemporalLayers TemporalLayers;
memset(&TemporalLayers, 0, sizeof(mfxExtAvcTemporalLayers));
TemporalLayers.BaseLayerPID = 0; // Priority ID of base layer
TemporalLayers.Layer[0].Scale = 1; // base layer, 15fps/15fps = 1
TemporalLayers.Layer[1].Scale = 2; // first layer, 30fps/15fps = 2
TemporalLayers.Layer[2].Scale = 4; // second layer, 60fps/15fps = 4
TemporalLayers.Layer[3].Scale = 0; // No layer
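The integer ratio rule and the relation between Scale values and per-layer frame rates can be checked before the buffer is handed to the encoder. The sketch below is an illustration under stated assumptions: the function names are hypothetical, Scale values are passed as a plain vector rather than through the SDK structure, and the first used layer is assumed to have Scale 1, as in the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Checks a list of Scale values. Zero entries are skipped layers. The first
// used layer must have Scale 1 (assumption of this sketch), and each later
// used layer must be a larger integer multiple of the previous used one.
inline bool validScales(const std::vector<std::uint32_t>& scales) {
    std::uint32_t prev = 0;
    for (std::uint32_t s : scales) {
        if (s == 0) continue;  // skipped layer
        if (prev == 0) {
            if (s != 1) return false;
        } else if (s <= prev || s % prev != 0) {
            return false;
        }
        prev = s;
    }
    return prev != 0;  // at least one layer must be present
}

// Frame rate of each layer, given the frame rate of the highest layer.
// Skipped layers are reported as 0.0.
inline std::vector<double> layerFrameRates(
        const std::vector<std::uint32_t>& scales, double topFps) {
    assert(validScales(scales));
    std::uint32_t top = 0;
    for (std::uint32_t s : scales) {
        if (s > top) top = s;
    }
    std::vector<double> fps;
    for (std::uint32_t s : scales) {
        fps.push_back(s == 0 ? 0.0 : topFps * s / top);
    }
    return fps;
}
```

For the configuration above, `layerFrameRates({1, 2, 4, 0}, 60.0)` gives 15, 30 and 60 fps for the three used layers, while a list such as {1, 2, 3} is rejected because 3/2 is not an integer ratio.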

MJPEG Decode

The Intel Media SDK MJPEG decoder is enabled by using the MFX_CODEC_JPEG codec identifier and uses the same set of API function calls as the other SDK decoders.

A key difference compared to other SDK decoders is that the MJPEG decoder can also deliver decoded video frames in the RGB4 color format (besides the common NV12 format). The decoder also supports frame rotation in steps of 90 degrees.

For more details regarding the MJPEG decoder please refer to the Intel Media SDK MJPEG manual and the sample_decode sample. As a side note, the SDK MJPEG decoder can also effectively be used as a single JPEG image decoder.

Conclusion

In this article we presented the Intel® Media SDK video conferencing features: Low Latency Encode and Decode, Dynamic Bit Rate and Resolution Control, Forced Key Frame Generation, Reference List Selection, Reference Picture Marking Repetition SEI Message, Long Term Reference, Temporal Scalability and the new MJPEG decoder component.

By utilizing these new features, developers can build flexible video conferencing applications using Intel Media SDK that take advantage of the hardware acceleration capabilities of Intel platforms.

For further details on how to use the features please refer to the Intel Media SDK samples included in the SDK install package.

For developer questions on how to use Intel Media SDK please refer to the Intel® Media SDK forum on the Intel Developer Zone site: http://software.intel.com/en-us/forums/intel-media-sdk/

Terminology

Term	Description
DPB	Decoded Picture Buffer
LTR	Long Term Reference (frame)
API	Application Programming Interface
DXVA	DirectX Video Acceleration
DDI	Device Driver Interface
SEI	Supplemental Enhancement Information
QSV	Intel® Quick Sync Video Technology
CQP	Constant Quantization Parameter
HRD	Hypothetical Reference Decoder
NAL	Network Abstraction Layer
SPS	Sequence Parameter Set
PPS	Picture Parameter Set
VBR	Variable Bit Rate
AVBR	Average Variable Bit Rate
CBR	Constant Bit Rate
MJPEG	Motion JPEG (ITU-T T.81 standard)
AVC	Advanced Video Coding (ITU-T H.264 standard)
RGB4	RGB (Red, Green, Blue) pixel color format. A 32 bit format also known as RGB32
NV12	Common hybrid planar YUV color format
