Leveraging GPU in Cloud Native Video Streaming

ID 736352
Updated 6/23/2022

Cloud native technologies empower organizations to build and run applications that take advantage of the distributed computing offered by cloud infrastructures, exploiting the scale, elasticity, resiliency, and flexibility that clouds provide. The Cloud Native Computing Foundation (CNCF) [1] was founded to bring together developers, end users, and vendors to seed technologies that foster and sustain an ecosystem of open source projects and make cloud native computing ubiquitous. Kubernetes [2], originally developed by Google, is the de facto open source container orchestration platform, hosted by the CNCF, that automates the deployment, scaling, and management of containerized applications. It also provides a device plugin framework that lets vendors advertise system hardware resources and allocate them to workloads (e.g., video streaming) that can take advantage of those resources.
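
For example, once a device plugin advertises a hardware resource, a workload requests it through ordinary resource limits. The following is a minimal sketch; the resource name gpu.intel.com/i915 is the one exposed by the Intel GPU device plugin used later in this article, while the pod name and image are placeholders:

kind: Pod
apiVersion: v1
metadata:
  name: gpu-consumer
spec:
  containers:
    - name: app
      image: busybox
      command: [ "sleep", "infinity" ]
      resources:
        limits:
          # Extended resource advertised by the device plugin
          gpu.intel.com/i915: 1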

The global market size of video streaming is projected to grow from $419 billion in 2021 to $932 billion by 2028. Demand is rising continuously across several channels. Schools and universities record webinars and courses to enhance the learning process. Businesses conduct professional conferences and staff training, and promote their products and services to strengthen their brands and customer engagement. In addition, remote working and lockdowns during the COVID-19 pandemic have fueled market growth even further.

This article illustrates a cloud native-based video streaming example that offloads the transcoding work to the hardware GPU accelerators using the Intel GPU device plugin for Kubernetes [3].

Introduction

Figure 1: Software architecture of the cloud-based video streaming example

 

The cloud-based video streaming example consists of two nodes running the software components shown in the previous diagram. The video transcoder node on the left leverages Intel GPU hardware codecs to transcode the videos and stream them to the video server node on the right. With the Intel GPU device plugin for Kubernetes, the transcoding workload is deployed only on servers with Intel GPU hardware. Beyond the Intel GPU, Intel device plugins for Kubernetes enable cloud native workloads to utilize other hardware features as well, such as protecting an application’s code and data from modification using hardened enclaves with Intel® Software Guard Extensions (Intel® SGX) technology, or accelerating compute-intensive operations with Intel® QuickAssist Technology (Intel® QAT).
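
Before GPU workloads can be scheduled, the device plugin itself must be running in the cluster so it can advertise the gpu.intel.com/i915 resource on capable nodes. One way to deploy it, sketched here from the kustomize deployments shipped in the plugin repository (replace <RELEASE_VERSION> with an actual release tag, and see [3] for the current instructions):

$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=<RELEASE_VERSION>'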

On the video server node, we stream videos through the Real-Time Messaging Protocol (RTMP) module of an NGINX [4] web server orchestrated by Kubernetes. RTMP, initially developed by Macromedia for streaming audio and video over the Internet, provides a low-latency delivery method for high-quality videos, but it is not natively supported by modern devices (e.g., smartphones, tablets, and PCs) and browsers. HTTP-based protocols, on the other hand, such as HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP), are compatible with virtually any device and operating system. They also support adaptive bitrate streaming, which helps deliver optimized video quality according to the stability and speed of the connection. The following Kubernetes configuration files deploy a workload running the NGINX RTMP module for streaming live videos. RTMP is used for ingesting video feeds from the video encoder/transcoder workloads; the server then delivers HLS and DASH fragments to online video players over HTTP. However, instead of building the NGINX RTMP server from source, we run the pre-integrated media streaming and web hosting software stack of an Open Visual Cloud [5] container image. Refer to the Intel Developer Catalog [6] for more detailed information on Open Visual Cloud software stacks.

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-conf
  namespace: demo
data:
  nginx.conf: |
    worker_processes auto;
    daemon off;
    error_log /var/www/log/error.log info;

    pid /var/run/nginx.pid;

    events {
        worker_connections  1024;
    }

    rtmp {
        server {
            listen 1935;
            chunk_size 4000;

            application live {
                live on;
                interleave on;
 
                hls on;
                hls_path /var/www/hls;
                hls_nested on;
                hls_fragment 3;
                hls_playlist_length 60;
                hls_variant _hi BANDWIDTH=6000k;
                hls_variant _lo BANDWIDTH=3000k;

                dash on;
                dash_path /var/www/dash;
                dash_fragment 3;
                dash_playlist_length 60;
                dash_nested on;
            }
        }
    }

    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        sendfile on;
        tcp_nopush on;
        keepalive_timeout 65;
        gzip on;

        server {
            listen 80;

            location / {
                root /var/www/html;
                index index.html index.htm;
                add_header 'Access-Control-Allow-Origin' '*' always;
                add_header 'Access-Control-Expose-Headers' 'Content-Length';
            }


            location /hls {
                alias /var/www/hls;
                add_header Cache-Control no-cache;
                add_header 'Access-Control-Allow-Origin' '*' always;
                add_header 'Access-Control-Expose-Headers' 'Content-Length';
                types {
                    application/vnd.apple.mpegurl m3u8;
                    video/mp2t ts;
                }
            }

            location /dash {
                alias /var/www/dash;
                add_header Cache-Control no-cache;
                add_header 'Access-Control-Allow-Origin' '*' always;
                add_header 'Access-Control-Expose-Headers' 'Content-Length';
                types {
                    application/dash+xml mpd;
                }
            }
        }
    }

File: nginx-conf.yaml
 

kind: Deployment
apiVersion: apps/v1
metadata:
  name: streaming-server
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-rtmp
  template:
    metadata:
      labels:
        app: nginx-rtmp
    spec:
      containers:
      - name: nginx-rtmp
        image: openvisualcloud/xeone3-ubuntu1804-media-nginx
        volumeMounts:
        - name: nginx-conf
          mountPath: "/usr/local/etc/nginx"
          readOnly: true
        ports:
        - name: http
          containerPort: 80
        - name: rtmp
          containerPort: 1935
        command: [ "/usr/local/sbin/nginx", "-c", "/usr/local/etc/nginx/nginx.conf" ]
      volumes:
        - name: nginx-conf
          configMap:
            name: nginx-conf

---
kind: Service
apiVersion: v1
metadata:
  name: svc-nginx-rtmp
  namespace: demo
spec:
  selector:
    app: nginx-rtmp
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: rtmp
      protocol: TCP
      port: 1935
      targetPort: 1935
  externalIPs:
    - <External IP address for receiving RTMP streams>

File: streaming-server.yaml


It's good practice to isolate Kubernetes objects in namespaces inside a cluster. A namespace named demo collects the objects for live streaming described in this tutorial. To start the server and wait for streams from the video encoders, create the YAML files shown above and deploy the NGINX RTMP server with the following commands:

$ kubectl create namespace demo
$ kubectl apply -f nginx-conf.yaml
$ kubectl apply -f streaming-server.yaml
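
As a quick sanity check before pointing encoders at the server, you can confirm that the pod and service came up in the demo namespace (output omitted here):

$ kubectl -n demo get pods,svc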


With modern display technologies providing ever better user experiences, more and more video clips are produced in ultra-high resolutions. To stream ultra-high resolution videos over the Internet, streaming servers or services usually transcode them into multiple lower-resolution renditions, so that adaptive bitrate streaming protocols such as HLS or DASH can deliver the variant that fits each client's connection bandwidth. In practice, transcoding ultra-high resolution videos in real time is often only feasible with hardware GPU codecs. Therefore, on the video transcoder node, we deploy the video transcoding workloads to Kubernetes nodes with GPU hardware codecs through the Intel GPU device plugin for Kubernetes.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-cdn-bundle
  namespace: demo
spec:
  storageClassName: sc-cdn
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi

---
kind: Job
apiVersion: batch/v1
metadata:
  name: transcode
  namespace: demo
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: cdn
          persistentVolumeClaim:
            claimName: pvc-cdn-bundle
      containers:
        - name: ffmpeg
          image: sysstacks/mers-ubuntu
          volumeMounts:
            - mountPath: /var/www/cdn
              name: cdn
          resources:
            limits:
              gpu.intel.com/i915: 1
          command: [ "/var/www/cdn/bin/start.sh" ]

File: transcoding.yaml


For example, the YAML file above requests one Intel GPU for the video transcoding job with the following resource configuration. Kubernetes then deploys the job only to a worker node with GPU capability in the cluster:

       ...
       resources:
          limits:
            gpu.intel.com/i915: 1
       ...
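
Before submitting the job, you can verify that the device plugin has advertised the resource on a node. A minimal check, assuming a hypothetical node named gpu-node-1; the resource should appear under the node's Capacity and Allocatable sections:

$ kubectl describe node gpu-node-1 | grep gpu.intel.com/i915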


In this transcoding job example, the pre-integrated container image from the Media Reference Stack [7] is used to leverage the hardware codecs on Intel platforms. Since the job is scheduled on an Intel GPU-capable node and runs in an ephemeral container, the video source files are provided separately by an NFS server outside the cluster through a Kubernetes Persistent Volume:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sc-cdn
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-nfs
spec:
  storageClassName: sc-cdn
  capacity:
    storage: 10Gi
  accessModes:
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: <IP address of NFS server>
    path: <Exposed path in the NFS server e.g. /mnt/nfs>

File: pv.yaml


Set up an NFS server that exposes a directory containing the *.mp4 video clips to be delivered under clips/, and create a script bin/start.sh in that directory with the following content to instruct the ffmpeg program to transcode the video clips at high (6000k) and low (3000k) bitrates:
 

#!/bin/bash

RTMP_SERVER=svc-nginx-rtmp
RTMP_ENDPOINT='live'
STREAM_NAME='my-video'
VARIANT_HI_MAXRATE=6000k
VARIANT_HI_VWIDTH=1920
VARIANT_LO_MAXRATE=3000k
VARIANT_LO_VWIDTH=1280

function streaming_file {
    local file=$1
    echo "Start streaming $file ..."

    local render_device=''
    local vendor_id=''
    for i in $(seq 128 255); do
        if [ -c "/dev/dri/renderD$i" ]; then
            vendor_id=$(cat "/sys/class/drm/renderD$i/device/vendor")
            if [ "$vendor_id" = '0x8086' ]; then
                # Intel GPU found
                render_device="/dev/dri/renderD$i"
                break
            fi
        fi
    done

    local o_hwaccel=''
    local o_audio='-map a:0 -c:a aac -ac 2'
    local o_decode='-c:v h264'
    local o_encode='-c:v libx264'
    local o_scaler='-vf scale'
    if [ "$render_device" != "" ]; then
        # Use hardware codec if available
        o_hwaccel="-hwaccel qsv -qsv_device $render_device"
        o_decode='-c:v h264_qsv'
        o_encode='-c:v h264_qsv'
        o_scaler='-vf scale_qsv'
    fi

    local o_variant_hi="-maxrate:v $VARIANT_HI_MAXRATE"
    local o_variant_lo="-maxrate:v $VARIANT_LO_MAXRATE"
    local width=''
    width=$(ffprobe -v error -select_streams v:0 -show_entries stream=width -of default=nw=1 "$file")
    width=${width/width=/}
    if [ "$width" -gt "$VARIANT_HI_VWIDTH" ]; then
        # Scale down to FHD for variant-HI
        o_variant_hi+=" $o_scaler=$VARIANT_HI_VWIDTH:-1"
    fi
    if [ "$width" -gt "$VARIANT_LO_VWIDTH" ]; then
        # Scale down to HD for variant-LO
        o_variant_lo+=" $o_scaler=$VARIANT_LO_VWIDTH:-1"
    fi

    # eval re-parses the quoted option strings so they split into separate
    # ffmpeg arguments (and "\_" collapses to a literal "_")
    eval ffmpeg -re "$o_hwaccel $o_decode -i $file" \
                -map v:0 "$o_variant_hi $o_encode $o_audio" \
                -f flv "rtmp://$RTMP_SERVER/$RTMP_ENDPOINT/$STREAM_NAME\_hi" \
                -map v:0 "$o_variant_lo $o_encode $o_audio" \
                -f flv "rtmp://$RTMP_SERVER/$RTMP_ENDPOINT/$STREAM_NAME\_lo"
}

ROOT=${BASH_SOURCE%/bin/*}
CLIPS=$ROOT/clips

# Stream all clips in an endless loop
while :; do
    for clip in "$CLIPS"/*.mp4; do
        streaming_file "$clip"
    done
done
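
For reference, the export on the NFS server side might look like the following sketch; the path /mnt/nfs is an assumption matching the placeholder in pv.yaml, and a read-only export is sufficient since the cluster only reads the clips and the script:

# /etc/exports on the NFS server (export path is an assumption)
/mnt/nfs *(ro,sync,no_subtree_check)

Then reload the exports:

$ sudo exportfs -ra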


Create the above YAML files, start the NFS server, and enter the IP address of the NFS server into the persistent volume configuration above. Then run the following commands to start the video transcoding job:

 

$ kubectl apply -f pv.yaml
$ kubectl apply -f transcoding.yaml
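
To follow the transcoding progress, you can tail the logs of the job's pod; this is a convenience check, not a required step:

$ kubectl -n demo logs -f job/transcode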


Enter the following URL into any HLS-capable client, such as the Safari browser or VLC media player, to receive the live streams:

http://<IP_Address_of_NGINX_RTMP_Server>/hls/my-video.m3u8
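
Alternatively, ffplay from the FFmpeg suite can play the stream from the command line; the server address is the same placeholder as above:

$ ffplay http://<IP_Address_of_NGINX_RTMP_Server>/hls/my-video.m3u8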

 

Figure 2: Video player example (© copyright 2008, Blender Foundation / www.bigbuckbunny.org)

 

Summary


Typically, a Kubernetes cluster is made up of a variety of nodes, some of which may be equipped with special hardware to accelerate specific applications. To expose those hardware features to containerized applications, Kubernetes device plugins selectively deploy the applications only to the nodes that have the required hardware resources. This application note illustrates the use of the Intel GPU device plugin for Kubernetes to deploy video transcoding services only to Kubernetes nodes equipped with Intel GPU cards, leveraging the hardware codecs for acceleration and saving CPU cycles for other workloads. For instance, the previous script performs one video decoding, two video scaling, and two video encoding operations to transcode a video into two streams at different bitrates. The following figures show that CPU utilization drops from 1007% to 21% when the transcoding job is deployed to a node with GPU hardware resources.
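
One way to reproduce such a comparison is to sample the CPU usage of the ffmpeg process on the worker node while a clip is being transcoded; a minimal sketch, assuming a single ffmpeg process is running:

$ top -b -n 1 -p "$(pgrep -o ffmpeg)"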

Figure 3: CPU utilization of running video transcoding job on traditional worker node (© copyright 2008, Blender Foundation / www.bigbuckbunny.org)

 

Figure 4: CPU utilization of running video transcoding job on GPU capable worker node (© copyright 2008, Blender Foundation / www.bigbuckbunny.org)

 

 

References

1. Cloud Native Computing Foundation (https://www.cncf.io/)

2. Kubernetes - Production-Grade Container Orchestration (https://kubernetes.io/)

3. Intel Device Plugins for Kubernetes (https://github.com/intel/intel-device-plugins-for-kubernetes)

4. NGINX-based Media Streaming Server (https://github.com/arut/nginx-rtmp-module)

5. Intel Open Visual Cloud (https://www.intel.com/content/www/us/en/developer/articles/technical/open-visual-cloud.html)

6. Intel® Developer Catalog (Intel® DevCatalog) (https://www.intel.com/content/www/us/en/developer/tools/software-catalog/about.html)

7. Intel Media Reference Stack (https://www.intel.com/content/www/us/en/developer/articles/containers/media-reference-stack-on-ubuntu.html)