Intel® Media SDK is a framework for developing media applications. It provides APIs that ease development and are optimized for the underlying hardware to deliver the best performance. The SDK offers optimized implementations of the basic building-block algorithms of the media domain, and articles such as this one provide insight into building use-case scenarios on top of it.
In this article, we show how to achieve video composition using the SDK. Composition is a useful feature for many video applications, such as video walls, advertising displays, and any application that needs to present multiple streams on a single surface.
On Linux platforms, Media SDK provides a hardware implementation of the composition feature (as of SDK version 1.9, this feature is not available on Windows), which can composite up to 16 different video streams. The SDK provides structures that control the specifics of composition for each individual stream, such as the resolution, the crop size and source coordinates on the input stream, and the placement coordinates on the destination surface. We will explain each of these using the example below.
Example use-case: Compositing two video streams, one of resolution 352x288 and the other of resolution 176x144. The destination surface has resolution 352x288. We want to show the smaller-resolution stream as an inset on top of the first input stream.
Step 1: Input Parameters: Parameter file (.par) with per-input stream information
We need to specify each input stream being composited along with its parameters, such as resolution, crop dimensions, placement on the destination surface, and secondary parameters such as the alpha factor. Specifying these individually for each stream in code can be tedious, so the application can instead read them from a parameter file passed as input. For our use-case defined above, here is the parameter file (a minimal parser sketch follows it):
stream=/path/to/stream/in_352_288.yuv
width=352
height=288
cropx=0
cropy=0
cropw=352
croph=288
dstx=0
dsty=0
dstw=352
dsth=288
fourcc=nv12
stream=/path/to/stream/in_176_144.yuv
width=176
height=144
cropx=0
cropy=0
cropw=88
croph=72
dstx=0
dsty=0
dstw=88
dsth=72
fourcc=nv12
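The SDK runtime does not parse this file for you; the application reads it at startup. Below is a minimal sketch of such a parser. The StreamParams structure and the ParseParFile function are our own illustration, not SDK API; the Media SDK sample applications ship more complete parsers.

#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical container for one "stream=" entry and the keys that follow it
struct StreamParams {
    std::string file;
    int width = 0, height = 0;
    int cropx = 0, cropy = 0, cropw = 0, croph = 0;
    int dstx = 0, dsty = 0, dstw = 0, dsth = 0;
    std::string fourcc;
};

std::vector<StreamParams> ParseParFile(const char* path)
{
    std::vector<StreamParams> streams;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        size_t eq = line.find('=');
        if (eq == std::string::npos)
            continue;                                   // skip blank or malformed lines
        std::string key = line.substr(0, eq);
        std::string val = line.substr(eq + 1);
        if (key == "stream") {                          // a new per-stream block begins
            streams.push_back(StreamParams());
            streams.back().file = val;
            continue;
        }
        if (streams.empty())
            continue;                                   // keys are only valid after a "stream=" line
        StreamParams& s = streams.back();
        int v = std::atoi(val.c_str());
        if      (key == "width")  s.width  = v;
        else if (key == "height") s.height = v;
        else if (key == "cropx")  s.cropx  = v;
        else if (key == "cropy")  s.cropy  = v;
        else if (key == "cropw")  s.cropw  = v;
        else if (key == "croph")  s.croph  = v;
        else if (key == "dstx")   s.dstx   = v;
        else if (key == "dsty")   s.dsty   = v;
        else if (key == "dstw")   s.dstw   = v;
        else if (key == "dsth")   s.dsth   = v;
        else if (key == "fourcc") s.fourcc = val;
    }
    return streams;
}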
Step 2: Set-up the video parameters in the application
In this step, we populate the video parameters for both input and output. The input VPP parameters, such as resolution and crop dimensions, should correspond to the largest input stream; the per-stream details are filled in later (Step 3). Otherwise, the process of filling these parameters is the same as for any other VPP workload.
/******
Initialize VPP parameters
For simplicity, we have filled these parameters for the streams used here.
The developer is encouraged to generalize the mfxVideoParams filling using either command-line options or par file usage
******/
mfxVideoParam VPPParams;
memset(&VPPParams, 0, sizeof(VPPParams));
// Input data
VPPParams.vpp.In.FourCC = MFX_FOURCC_NV12;
VPPParams.vpp.In.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
VPPParams.vpp.In.CropX = 0;
VPPParams.vpp.In.CropY = 0;
VPPParams.vpp.In.CropW = inputWidth;
VPPParams.vpp.In.CropH = inputHeight;
VPPParams.vpp.In.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
VPPParams.vpp.In.FrameRateExtN = 30;
VPPParams.vpp.In.FrameRateExtD = 1;
// width must be a multiple of 16
// height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture
VPPParams.vpp.In.Width = MSDK_ALIGN16(inputWidth);
VPPParams.vpp.In.Height =
(MFX_PICSTRUCT_PROGRESSIVE == VPPParams.vpp.In.PicStruct) ?
MSDK_ALIGN16(inputHeight) :
MSDK_ALIGN32(inputHeight);
// Output data
VPPParams.vpp.Out.FourCC = MFX_FOURCC_NV12;
VPPParams.vpp.Out.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
VPPParams.vpp.Out.CropX = 0;
VPPParams.vpp.Out.CropY = 0;
VPPParams.vpp.Out.CropW = inputWidth;
VPPParams.vpp.Out.CropH = inputHeight;
VPPParams.vpp.Out.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
VPPParams.vpp.Out.FrameRateExtN = 30;
VPPParams.vpp.Out.FrameRateExtD = 1;
// width must be a multiple of 16
// height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture
VPPParams.vpp.Out.Width = MSDK_ALIGN16(VPPParams.vpp.Out.CropW);
VPPParams.vpp.Out.Height =
(MFX_PICSTRUCT_PROGRESSIVE == VPPParams.vpp.Out.PicStruct) ?
MSDK_ALIGN16(VPPParams.vpp.Out.CropH) :
MSDK_ALIGN32(VPPParams.vpp.Out.CropH);
// Video memory surfaces are used to store the raw frames. Use video memory with HW acceleration for best performance
VPPParams.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY;
Step 3: Populate per-stream mfxFrameInfo with details from parameter file
In this step, we specify the parameters specific to each input file, such as resolution and crop dimensions. Inside the VPP processing loop, as each stream's frame is loaded, the surface's Info field is set to that stream's parameters.
/*************************************************************************************************
COMPOSITION-SPECIFIC BEGINS: Setting Phase
How are we compositing?
We crop the second stream to a W/2 x H/2 region starting at coordinate (0,0). This cropped stream is composited onto the first stream, which is used at its original resolution. You can also choose where the cropped second stream should go on the output surface; here we place it at coordinates (0,0).
NOTE: For a clean implementation, we recommend reading these values from the parameter file.
*************************************************************************************************/
mfxU16 W1 = 352, H1 = 288;
mfxU16 Cx1 = 0, Cy1 = 0, Cw1 = W1, Ch1 = H1;
mfxU16 W2 = 176, H2 = 144;
mfxU16 Cx2 = 0, Cy2 = 0, Cw2 = W2 >> 1, Ch2 = H2 >> 1;
/** Fill frame params in mfxFrameInfo structures with the above parameters **/
mfxFrameInfo inputStreams[NUM_STREAMS];    // NUM_STREAMS is 2 for this example
for (mfxU16 i = 0; i < NUM_STREAMS; i++) {
    memcpy(&inputStreams[i], &(VPPParams.vpp.In), sizeof(mfxFrameInfo));
    inputStreams[i].Width  = (i == 0) ? W1 : W2;
    inputStreams[i].Height = (i == 0) ? H1 : H2;
    inputStreams[i].CropX  = (i == 0) ? Cx1 : Cx2;
    inputStreams[i].CropY  = (i == 0) ? Cy1 : Cy2;
    inputStreams[i].CropW  = (i == 0) ? Cw1 : Cw2;
    inputStreams[i].CropH  = (i == 0) ? Ch1 : Ch2;
}
Step 4: Initialize extended buffer for Composition
To perform auxiliary functions such as denoising, stabilization, or composition, VPP uses extended buffers. These extended buffers are attached to the VPP parameters, as shown at the end of this step.
// Initialize extended buffer for Composition
mfxExtVPPComposite composite;
memset(&composite, 0, sizeof(composite));
composite.Header.BufferId = MFX_EXTBUFF_VPP_COMPOSITE;
composite.Header.BufferSz = sizeof(mfxExtVPPComposite);
composite.NumInputStream = 2;
// Background color (YUV) for output areas not covered by any input stream
composite.Y = 10;
composite.U = 80;
composite.V = 80;
composite.InputStream = new mfxVPPCompInputStream[2];    // array of structs, not pointers
memset(composite.InputStream, 0, 2 * sizeof(mfxVPPCompInputStream));
composite.InputStream[0].DstX = (mfxU32)0;    // First stream fills the entire output surface
composite.InputStream[0].DstY = (mfxU32)0;
composite.InputStream[0].DstW = (mfxU32)W1;
composite.InputStream[0].DstH = (mfxU32)H1;
composite.InputStream[1].DstX = (mfxU32)0;    // Coordinates for where the second stream should go on the output surface
composite.InputStream[1].DstY = (mfxU32)0;
composite.InputStream[1].DstW = (mfxU32)Cw2;
composite.InputStream[1].DstH = (mfxU32)Ch2;
mfxExtBuffer* ExtBuffer[1];
ExtBuffer[0] = (mfxExtBuffer*) &composite;
VPPParams.NumExtParam = 1;
VPPParams.ExtParam = (mfxExtBuffer**) &ExtBuffer[0];
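With the extended buffer attached, VPP initialization proceeds as usual. A minimal sketch, using the MSDK_* helper macros already used above; note that the InputStream array heap-allocated above should be released once VPP is closed:

// Initialize VPP with the composition buffer attached
sts = mfxVPP.Init(&VPPParams);
MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
// ... later, after all processing is done and mfxVPP.Close() has been called:
delete[] composite.InputStream;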
Step 5: VPP Processing loop - Read & process each input stream
// Stage 1: Main processing loop
//
while (MFX_ERR_NONE <= sts || MFX_ERR_MORE_DATA == sts) {
    nSurfIdxIn = GetFreeSurfaceIndex(pVPPSurfacesIn, nVPPSurfNumIn);    // Find free input frame surface
    MSDK_CHECK_ERROR(MFX_ERR_NOT_FOUND, nSurfIdxIn, MFX_ERR_MEMORY_ALLOC);

    // Surface locking is required when reading/writing video surfaces
    sts = mfxAllocator.Lock(mfxAllocator.pthis, pVPPSurfacesIn[nSurfIdxIn]->Data.MemId, &(pVPPSurfacesIn[nSurfIdxIn]->Data));
    MSDK_BREAK_ON_ERROR(sts);

    /******************************************************************************************************************
    COMPOSITION-SPECIFIC CODE BEGINS:
    Load data from each of the input streams in turn, and
    set the surface parameters to the Crop, Width and Height values of the input stream being loaded
    ******************************************************************************************************************/
    streamNum %= NUM_STREAMS;
    memcpy(&(pVPPSurfacesIn[nSurfIdxIn]->Info), &(inputStreams[streamNum]), sizeof(mfxFrameInfo));
    sts = LoadRawFrame_YV12toNV12(pVPPSurfacesIn[nSurfIdxIn], fSource[streamNum], &inputStreams[streamNum]);    // Load frame from file into surface
    streamNum++;
    MSDK_BREAK_ON_ERROR(sts);
    /******************************************************************************************************************
    COMPOSITION-SPECIFIC CODE ENDS
    ******************************************************************************************************************/

    sts = mfxAllocator.Unlock(mfxAllocator.pthis, pVPPSurfacesIn[nSurfIdxIn]->Data.MemId, &(pVPPSurfacesIn[nSurfIdxIn]->Data));
    MSDK_BREAK_ON_ERROR(sts);

    nSurfIdxOut = GetFreeSurfaceIndex(pVPPSurfacesOut, nVPPSurfNumOut);    // Find free output frame surface
    MSDK_CHECK_ERROR(MFX_ERR_NOT_FOUND, nSurfIdxOut, MFX_ERR_MEMORY_ALLOC);

    for (;;) {
        // Process a frame asynchronously (returns immediately)
        sts = mfxVPP.RunFrameVPPAsync(pVPPSurfacesIn[nSurfIdxIn], pVPPSurfacesOut[nSurfIdxOut], NULL, &syncp);
        if (MFX_WRN_DEVICE_BUSY == sts) {
            MSDK_SLEEP(1);    // Wait if device is busy, then repeat the same call
        } else
            break;
    }

    if (MFX_ERR_MORE_DATA == sts)    // VPP needs more input; expected until all streams of a composed frame are submitted
        continue;
    // MFX_ERR_MORE_SURFACE means output is ready but more output surfaces are needed (example: frame rate conversion 30->60)
    // * Not handled in this example!
    MSDK_BREAK_ON_ERROR(sts);

    sts = session.SyncOperation(syncp, 60000);    // Synchronize. Wait until frame processing is finished
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
    ++nFrame;

    // Surface locking is required when reading/writing video surfaces
    sts = mfxAllocator.Lock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
    MSDK_BREAK_ON_ERROR(sts);
    sts = WriteRawFrame(pVPPSurfacesOut[nSurfIdxOut], fSink);
    MSDK_BREAK_ON_ERROR(sts);
    sts = mfxAllocator.Unlock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
    MSDK_BREAK_ON_ERROR(sts);

    printf("Frame number: %d\n", nFrame);
    fflush(stdout);
}
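When the input files are exhausted, VPP may still hold buffered frames. The Media SDK tutorials drain these in a second stage by passing NULL as the input surface until MFX_ERR_MORE_DATA is returned; a sketch following that same pattern, reusing the variables from Stage 1:

// MFX_ERR_MORE_DATA from Stage 1 means the input files are exhausted; not an error here
MSDK_IGNORE_MFX_STS(sts, MFX_ERR_MORE_DATA);
MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
//
// Stage 2: Retrieve the buffered VPP frames by draining with a NULL input
//
while (MFX_ERR_NONE <= sts) {
    nSurfIdxOut = GetFreeSurfaceIndex(pVPPSurfacesOut, nVPPSurfNumOut);    // Find free output frame surface
    MSDK_CHECK_ERROR(MFX_ERR_NOT_FOUND, nSurfIdxOut, MFX_ERR_MEMORY_ALLOC);

    for (;;) {
        sts = mfxVPP.RunFrameVPPAsync(NULL, pVPPSurfacesOut[nSurfIdxOut], NULL, &syncp);
        if (MFX_WRN_DEVICE_BUSY == sts)
            MSDK_SLEEP(1);    // Wait if device is busy, then repeat the same call
        else
            break;
    }
    MSDK_BREAK_ON_ERROR(sts);    // MFX_ERR_MORE_DATA here means VPP is fully drained

    sts = session.SyncOperation(syncp, 60000);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
    ++nFrame;

    // Lock, write and unlock the output surface exactly as in Stage 1
    sts = mfxAllocator.Lock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
    MSDK_BREAK_ON_ERROR(sts);
    sts = WriteRawFrame(pVPPSurfacesOut[nSurfIdxOut], fSink);
    MSDK_BREAK_ON_ERROR(sts);
    sts = mfxAllocator.Unlock(mfxAllocator.pthis, pVPPSurfacesOut[nSurfIdxOut]->Data.MemId, &(pVPPSurfacesOut[nSurfIdxOut]->Data));
    MSDK_BREAK_ON_ERROR(sts);
}
MSDK_IGNORE_MFX_STS(sts, MFX_ERR_MORE_DATA);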
With this, we have come to the conclusion of composition using Media SDK. You can easily extend the example shown above to composite additional streams (up to the 16 supported). You can find full documentation on composition in the mediasdk-man.pdf document.