Migrating the HSOpticalFlow Estimation from CUDA* to SYCL*

ID 766332
Updated 10/23/2023
Version Latest




The HSOpticalFlow sample computes per-pixel motion estimates between two consecutive image frames. The motion is caused by movement of an object or of the camera. The sample shows how to migrate CUDA texture memory objects and related API calls, such as cudaResourceDesc, cudaTextureDesc, and cudaCreateTextureObject(), to their SYCL equivalents. It also converts single-channel images to a 4-channel image format, because SYCL images support only 4-channel image formats. The original CUDA* source code is migrated to SYCL for portability across GPUs from multiple vendors.
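The single-channel to 4-channel conversion can be pictured with a minimal host-side sketch. The helper names ToFourChannel and FromFourChannel are hypothetical and do not come from the sample. The sketch assumes 32-bit float pixels stored in an interleaved RGBA layout, with the gray value in the first (R) channel and the other channels padded with zero. Any fixed padding works as long as the kernels read only the R channel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expands a single-channel float image into an
// interleaved 4-channel (RGBA) buffer. The gray value is stored in the R
// channel; G, B and A are padded with 0 (an assumption of this sketch).
std::vector<float> ToFourChannel(const std::vector<float>& gray) {
  std::vector<float> rgba(gray.size() * 4, 0.0f);
  for (std::size_t i = 0; i < gray.size(); ++i) {
    rgba[4 * i] = gray[i];  // R channel carries the original pixel value
  }
  return rgba;
}

// Hypothetical inverse: keeps only the R channel of an interleaved RGBA buffer.
std::vector<float> FromFourChannel(const std::vector<float>& rgba) {
  assert(rgba.size() % 4 == 0);  // input must hold whole RGBA pixels
  std::vector<float> gray(rgba.size() / 4);
  for (std::size_t i = 0; i < gray.size(); ++i) {
    gray[i] = rgba[4 * i];
  }
  return gray;
}
```

For example, ToFourChannel({0.1f, 0.5f, 0.9f}) returns a 12-element buffer, and FromFourChannel applied to that buffer recovers the original three values. The trade-off is four times the memory per image in exchange for a format that SYCL images accept.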




What you will learn

Migrate and optimize the HSOpticalFlow sample from CUDA to SYCL.

Time to complete

15 minutes


Concepts and Functionality


Key Implementation Details

The HSOpticalFlow sample provides a simple implementation of the Horn-Schunck method for estimating optical flow. In doing so, it demonstrates how a partial differential equation (PDE) solver can be accelerated by offloading it to a GPU.
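The classic Horn-Schunck formulation is given here for reference. It finds the flow field (u, v) that minimizes a brightness-constancy term plus a smoothness term weighted by α, where I_x, I_y, and I_t are the spatial and temporal image derivatives. Its Euler-Lagrange equations lead to the Jacobi-style update below, in which ū and v̄ are local neighborhood averages of the current estimate. This is the textbook form. The sample's exact discretization (for example, the neighbor weights or the scaling of α) may differ.

```latex
E(u,v) = \iint \Big[ (I_x u + I_y v + I_t)^2
         + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big] \, dx \, dy

u^{k+1} = \bar{u}^{k} - \frac{I_x \, (I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t)}{\alpha^2 + I_x^2 + I_y^2}
\qquad
v^{k+1} = \bar{v}^{k} - \frac{I_y \, (I_x \bar{u}^{k} + I_y \bar{v}^{k} + I_t)}{\alpha^2 + I_x^2 + I_y^2}
```

Each new value depends only on the previous iterate and on per-pixel derivatives. Every pixel can therefore be updated independently, and this is what makes the solver a good fit for GPU offload.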

The sample includes both a serial (CPU) and a parallel (device) implementation of the algorithm, so their results can be compared directly. For the input images, the sample computes the sum of absolute differences (the L1 error) between the serial and parallel results.
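The L1 error comparison reduces to a single loop. The sketch below assumes both results are flat float buffers of equal length. The function name L1Error is hypothetical, and whether the sample normalizes the sum or checks it against a threshold is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of absolute differences (L1 error) between a reference result, such as
// the serial CPU flow field, and a result to check, such as the device flow
// field. Accumulates in double to limit rounding error on large images.
double L1Error(const std::vector<float>& reference,
               const std::vector<float>& result) {
  assert(reference.size() == result.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < reference.size(); ++i) {
    sum += std::fabs(static_cast<double>(reference[i]) - result[i]);
  }
  return sum;
}
```

For a flow field, the same function can be applied to the u and v components separately and the two sums added.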

The parallel implementation consists of the following stages: image downscaling and upscaling, image warping, computation of derivatives, and Jacobi iteration.

  1. Image scaling: downscaling or upscaling aims to preserve the visual appearance of the original image when it is resized, without altering the content of the original image.
  2. Image warping: a transformation that maps all positions in the source image plane to positions in a destination plane.
  3. Computing derivatives: computes the temporal and spatial derivatives of the input images.
  4. Jacobi iteration: the final stage solves for the flow using Jacobi iterations. Boundary conditions are handled explicitly within the kernel, and the number of iterations is fixed in advance.
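Stages 3 and 4 can be sketched on the host with a minimal serial version. The Image type, ComputeDerivatives, and JacobiStep are hypothetical names, and the stencils are simple choices made for illustration. Spatial derivatives are central differences on the average of both frames, and the temporal derivative is the frame difference. The update is the classic Horn-Schunck step with a 4-neighbor average. Boundaries are handled explicitly by clamp-to-edge reads, and the sample's actual stencils may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major single-channel image used only for this sketch.
struct Image {
  int width, height;
  std::vector<float> data;
  Image(int w, int h, float v = 0.0f)
      : width(w), height(h), data(static_cast<std::size_t>(w) * h, v) {}
  // Clamp-to-edge read: out-of-range coordinates use the nearest border pixel.
  float at(int x, int y) const {
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return data[static_cast<std::size_t>(y) * width + x];
  }
  float& ref(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Stage 3: spatial derivatives (central differences of the frame average)
// and temporal derivative (frame difference).
void ComputeDerivatives(const Image& i0, const Image& i1,
                        Image& ix, Image& iy, Image& it) {
  auto avg = [&](int x, int y) { return 0.5f * (i0.at(x, y) + i1.at(x, y)); };
  for (int y = 0; y < i0.height; ++y) {
    for (int x = 0; x < i0.width; ++x) {
      ix.ref(x, y) = 0.5f * (avg(x + 1, y) - avg(x - 1, y));
      iy.ref(x, y) = 0.5f * (avg(x, y + 1) - avg(x, y - 1));
      it.ref(x, y) = i1.at(x, y) - i0.at(x, y);
    }
  }
}

// Stage 4: one Jacobi sweep. It reads only the previous (u, v) and writes to
// separate output buffers, so each pixel update is independent (one work-item
// per pixel on a device).
void JacobiStep(const Image& ix, const Image& iy, const Image& it,
                const Image& u, const Image& v, Image& uNew, Image& vNew,
                float alpha) {
  for (int y = 0; y < u.height; ++y) {
    for (int x = 0; x < u.width; ++x) {
      const float uAvg = 0.25f * (u.at(x - 1, y) + u.at(x + 1, y) +
                                  u.at(x, y - 1) + u.at(x, y + 1));
      const float vAvg = 0.25f * (v.at(x - 1, y) + v.at(x + 1, y) +
                                  v.at(x, y - 1) + v.at(x, y + 1));
      const float gx = ix.at(x, y), gy = iy.at(x, y);
      const float r = (gx * uAvg + gy * vAvg + it.at(x, y)) /
                      (alpha * alpha + gx * gx + gy * gy);
      uNew.ref(x, y) = uAvg - gx * r;
      vNew.ref(x, y) = vAvg - gy * r;
    }
  }
}
```

A fixed iteration count corresponds to calling JacobiStep a set number of times and swapping the (u, v) and (uNew, vNew) buffers after each call.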

In CUDA, the sample accesses image data through texture memory. The equivalent SYCL approach uses image memory, in which image objects represent a region of memory managed by the SYCL runtime. The data layout of image memory is deliberately unspecified, so each implementation can choose the layout that is optimal for a given device.
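Because the image layout is opaque, one way to validate device reads is to compare them against a CPU reference of the same sampling operation. The sketch below emulates a linear-filtered, clamp-to-edge read with unnormalized coordinates from an interleaved RGBA float image. Texel centers sit at (i + 0.5, j + 0.5). The RgbaImage type and sampleLinear are hypothetical, and the addressing mode, filter mode, and coordinate normalization the sample actually configures are assumptions here. GPU texture units may compute filter weights at reduced precision, so comparisons should use a tolerance.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;

// Hypothetical interleaved RGBA float image (width * height * 4 floats, row-major).
struct RgbaImage {
  int width, height;
  std::vector<float> data;

  // Clamp-to-edge addressing: out-of-range texels repeat the border texel.
  Float4 texel(int x, int y) const {
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    const float* p = &data[(static_cast<std::size_t>(y) * width + x) * 4];
    return {p[0], p[1], p[2], p[3]};
  }

  // Bilinear interpolation of the four texels surrounding (x, y).
  Float4 sampleLinear(float x, float y) const {
    const float fx = x - 0.5f, fy = y - 0.5f;  // shift onto the texel-center grid
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float a = fx - x0, b = fy - y0;  // fractional weights in [0, 1)
    const Float4 t00 = texel(x0, y0), t10 = texel(x0 + 1, y0);
    const Float4 t01 = texel(x0, y0 + 1), t11 = texel(x0 + 1, y0 + 1);
    Float4 out{};
    for (int c = 0; c < 4; ++c) {
      out[c] = (1 - a) * (1 - b) * t00[c] + a * (1 - b) * t10[c] +
               (1 - a) * b * t01[c] + a * b * t11[c];
    }
    return out;
  }
};
```

Reading halfway between two texel centers returns their average, and reading at a texel center returns that texel exactly. The same bilinear read is what image scaling and warping rely on when they sample at non-integer positions.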