Code Sample: Parallel Processing with Direct3D* 12

Published: 04/04/2018  

Last Updated: 04/04/2018


License: Intel Sample Source Code License Agreement
Optimized for...  
Operating System: Windows® 10 (64 bit)
Hardware: GPU required
Software (programming language, tool, IDE, framework): Microsoft Visual Studio* 2017, Direct3D* 12, C++
Prerequisites: Familiarity with Visual Studio, Direct3D API, 3D graphics, parallel processing.


The idea behind this project was to provide a demonstration of parallel processing in gaming with Direct3D 12. It expands upon the results from the paper "A Comparison of the Intel® Core™ i5 Processor and Intel® Core™ i7 Processor with Visualizations in OpenGL* and Oculus* VR" (see References section) and extends the code there to contain a Direct3D 12 renderer. It also re-implements the previous particle system as a Direct3D 12 compute shader.

  1. Modify code to add a CPU Direct3D 12 renderer
  2. Moving to the GPU
  3. Closer look at differences between CPU and GPU

Get Started

Modify code to add a CPU Direct3D* 12 renderer

The first task is to add a Direct3D 12 "renderer" to the particle system used in the Intel Core i5 versus Intel Core i7 article. The software design makes this straightforward because it cleanly encapsulates the concept of rendering. The first step is to define an interface to the renderer and then write an event loop. To improve performance, I wrote a custom upload heap. I then look at the compute shader and the actual Direct3D 12 rendering code before discussing issues surrounding the vertex buffer view.
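As a rough illustration of what a renderer interface and event loop of this kind might look like, the sketch below uses plain C++ with no Direct3D calls. The names `Particle`, `IParticleRenderer`, `CountingRenderer`, `StepParticles`, and `RunEventLoop` are hypothetical and do not come from the sample. A real Direct3D 12 back end would presumably create its device, command queue, and swap chain in `Initialize` and record command lists in `Render`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle layout; the sample's real structure may differ.
struct Particle {
    float position[3];
    float velocity[3];
};

// Hypothetical renderer interface: the simulation only talks to this,
// so an OpenGL or Direct3D 12 back end can be swapped in behind it.
class IParticleRenderer {
public:
    virtual ~IParticleRenderer() = default;
    virtual bool Initialize() = 0;                                    // create API objects
    virtual void Render(const std::vector<Particle>& particles) = 0;  // draw one frame
    virtual bool ShouldClose() const = 0;                             // window/quit state
    virtual void Shutdown() = 0;                                      // release API objects
};

// Stand-in back end that only counts frames, so the loop can run anywhere.
class CountingRenderer : public IParticleRenderer {
public:
    explicit CountingRenderer(std::size_t maxFrames) : maxFrames_(maxFrames) {}
    bool Initialize() override { initialized_ = true; return true; }
    void Render(const std::vector<Particle>& particles) override {
        lastCount_ = particles.size();
        ++frames_;
    }
    bool ShouldClose() const override { return frames_ >= maxFrames_; }
    void Shutdown() override { initialized_ = false; }

    std::size_t frames() const { return frames_; }
    std::size_t lastCount() const { return lastCount_; }
    bool initialized() const { return initialized_; }

private:
    std::size_t maxFrames_;
    std::size_t frames_ = 0;
    std::size_t lastCount_ = 0;
    bool initialized_ = false;
};

// Advance every particle by one explicit Euler step.
inline void StepParticles(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles)
        for (int axis = 0; axis < 3; ++axis)
            p.position[axis] += p.velocity[axis] * dt;
}

// Event loop: update the simulation, then hand the data to the renderer.
// Returns false if the renderer could not be initialized.
inline bool RunEventLoop(IParticleRenderer& renderer,
                         std::vector<Particle>& particles, float dt) {
    assert(dt >= 0.0f);
    if (!renderer.Initialize()) return false;
    while (!renderer.ShouldClose()) {
        StepParticles(particles, dt);
        renderer.Render(particles);
    }
    renderer.Shutdown();
    return true;
}
```

Because the loop depends only on the interface, the simulation code does not need to change when a different back end is plugged in. That is the kind of encapsulation the paragraph above describes.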

Moving to the GPU

We can improve performance by moving the renderer from the CPU to the GPU. Better than dividing the processing among separate threads is dividing it among multiple processors: the CPU and the GPU.

After struggling to get every field in every structure in the CPU portion correct and consistent, I faced the prospect of two to three times more work for the GPU compute problem. I quickly decided the right thing to do was to look for some kind of "helper" framework and chose MiniEngine: A DirectX 12 Engine Starter Kit. I cover how I installed and customized MiniEngine for this project. With MiniEngine, the 500+ lines of code needed to render using the CPU are reduced to about 38 lines of setup code and 31 lines of rendering code (69 lines total) for the GPU, so the work paid off.

The GPU renderer consists of setup and rendering code. Setup consists of configuring the root signature and vertex inputs and obtaining the formats used for color and depth. Finally, I configure the graphics pipeline state object (PSO) and the view and projection matrices. The rendering code is broken into obtaining the context, describing the transitions, clearing the color and depth buffers, updating the matrix with the new values, and then drawing the frame.
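To make one of these setup steps concrete, here is a minimal sketch of a projection matrix written in plain C++. It assumes the standard left-handed perspective form with depth mapped to [0, 1], the same layout produced by DirectXMath's `XMMatrixPerspectiveFovLH`. The sample's own matrices may be built differently, for example through MiniEngine's camera helpers. The names `Mat4`, `Vec4`, `PerspectiveFovLH`, `Transform`, and `DepthAt` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 4x4 matrix; vectors are treated as rows (v * M), as in DirectXMath.
using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

// Left-handed perspective projection mapping view-space z in [nearZ, farZ]
// to a depth in [0, 1] after the divide by w.
inline Mat4 PerspectiveFovLH(float fovY, float aspect, float nearZ, float farZ) {
    assert(nearZ > 0.0f && farZ > nearZ && aspect > 0.0f);
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    const float xScale = yScale / aspect;
    const float range = farZ / (farZ - nearZ);
    Mat4 m{};  // zero-initialized
    m[0][0] = xScale;
    m[1][1] = yScale;
    m[2][2] = range;
    m[2][3] = 1.0f;  // copies view-space z into w
    m[3][2] = -range * nearZ;
    return m;
}

// Row vector times matrix.
inline Vec4 Transform(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += v[row] * m[row][col];
    return out;
}

// Depth after the perspective divide for a point on the view axis.
inline float DepthAt(float z, const Mat4& proj) {
    const Vec4 clip = Transform(Vec4{0.0f, 0.0f, z, 1.0f}, proj);
    return clip[2] / clip[3];
}
```

With this form, a point on the near plane lands at depth 0 and a point on the far plane at depth 1, which matches the range that a depth buffer cleared to 1.0 expects.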

Closer look at differences between CPU and GPU

For best performance, I use two buffers of particle data and render one while the other is being updated for the next frame by GPU compute. I briefly discuss this before taking a deeper look at the changes required to implement a particle rendering system on the GPU, in particular the differences between the CPU and GPU algorithms.
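The two-buffer scheme can be sketched on the CPU as follows. This is only an illustration under simplifying assumptions: each particle is a single `float`, the "render" and "compute" passes are ordinary loops that run one after the other, and the names `PingPongParticles` and `SimulateFrame` are hypothetical. In the GPU version the update would be a compute dispatch and the read would be a draw call, and the two could overlap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical double-buffered particle store. One buffer is read by the
// renderer while the other is written by the update step; Swap() flips roles.
class PingPongParticles {
public:
    explicit PingPongParticles(std::size_t count)
        : buffers_{std::vector<float>(count, 0.0f),
                   std::vector<float>(count, 0.0f)} {}

    const std::vector<float>& RenderBuffer() const { return buffers_[renderIndex_]; }
    std::vector<float>& UpdateBuffer() { return buffers_[1 - renderIndex_]; }
    void Swap() { renderIndex_ = 1 - renderIndex_; }
    std::size_t renderIndex() const { return renderIndex_; }

private:
    std::array<std::vector<float>, 2> buffers_;
    std::size_t renderIndex_ = 0;
};

// One frame: read the buffer being rendered, write the next state into the
// other buffer, then swap. Returns the sum of the values that were "drawn".
inline float SimulateFrame(PingPongParticles& particles, float velocity, float dt) {
    const std::vector<float>& current = particles.RenderBuffer();
    std::vector<float>& next = particles.UpdateBuffer();
    assert(current.size() == next.size());
    float renderedSum = 0.0f;  // stand-in for drawing the current buffer
    for (std::size_t i = 0; i < current.size(); ++i) {
        renderedSum += current[i];
        next[i] = current[i] + velocity * dt;  // stand-in for the compute shader
    }
    particles.Swap();  // the next frame renders what was just written
    return renderedSum;
}
```

Since the update never writes to the buffer that is being read, the renderer always sees a complete, consistent frame of particle data.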


Parallel Processing with Direct3D* 12


John Stone, Integrated Computing Solutions, Inc., A Comparison of the Intel® Core™ i5 Processor and Intel® Core™ i7 Processor with Visualizations in OpenGL* and Oculus* VR, 2017

Updated Log

Created March 20, 2018
