Parallel Noise and Random Functions for OpenCL™ Kernels

Published: 10/02/2015  

Last Updated: 10/02/2015

About the Sample

The Noise sample code associated with this paper includes an implementation of Perlin noise, which is useful for generating natural-looking textures, such as marble and clouds, for 3D graphics. A test that uses Perlin noise to generate a “cloud” image is included. (See the References section for more information on Perlin noise.) 2D and 3D versions are included, meaning that the functions take two or three inputs to generate a single Perlin noise output value.

The Noise sample also includes pseudo-random number generator (RNG) functions that yield fairly good results—sufficient that a generated image visually appears random. 1D, 2D, and 3D versions are included, again referring to the number of inputs to generate a single pseudo-random value.

Introduction and Motivation

Many applications require a degree of “randomness” or, more precisely, “pseudo-randomness”: a series of values that appears random or “noisy” to a human. However, for repeatability, applications also commonly require that the RNG be able to reliably generate exactly the same sequence of values, given the same input “seed” value or values.

Most RNG algorithms meet these requirements by making each generated value depend on the previous generated value, with the first value in the sequence generated directly from the seed value. That approach to RNG is problematic for highly parallel processing languages such as OpenCL. Forcing each of the many processing threads to wait on a single sequential RNG source would reduce or eliminate the parallelism of algorithms using it.

One approach to dealing with this issue is to pre-generate a large table of random values, with each of the parallel threads generating unique but deterministic indices into that table. For example, an OpenCL kernel processing an image might select an entry from the pre-generated table by calculating an index based upon the pixel coordinates that kernel is processing or generating.
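The table lookup described above can be sketched in plain C as follows. The function name `table_random`, the row-major index formula, and the modulo wrap-around are illustrative assumptions, not code from the Noise sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pick a pre-generated random value for pixel (x, y).
 * The row-major index is deterministic and distinct for every pixel of an
 * image that is `width` pixels wide; the modulo wraps the index when the
 * table holds fewer entries than the image has pixels. */
uint32_t table_random(const uint32_t *table, size_t table_size,
                      size_t x, size_t y, size_t width)
{
    size_t index = (y * width + x) % table_size;
    return table[index];
}
```

In an OpenCL kernel, `x` and `y` would typically come from `get_global_id(0)` and `get_global_id(1)`, and the table would be passed in as a buffer filled by the host beforehand.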

However, that approach requires a potentially time-consuming serial RNG process before the parallel algorithm can begin—limiting performance improvements due to parallelism. It also requires that the number of random numbers to be used be known at least approximately, in advance of running the parallel algorithm. That could be problematic for parallel algorithms that need to dynamically determine how many random values will be used by each thread.

The OpenCL kernel-level functions in the Noise sample code associated with this paper take an approach better suited to the way OpenCL divides work into parallel operations.

Noise and Random Number Generation for OpenCL

OpenCL defines a global workspace (array of work items) with one, two, or three dimensions. Each work item in that global space has a unique set of identifying integer values corresponding to the x, y, and z coordinates in the global space.

The Perlin noise and RNG functions in the Noise sample generate a random number or noise sequence based on up to three input values, which can be the global IDs for each work item. Alternatively, one or more of the values might be generated by a combination of the global IDs and some data value obtained or generated by the kernel.

For example, the following OpenCL kernel code fragment shows generation of random numbers based on the 2D global ID of the work item.

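kernel void	genRand()
{
	// Use the 2D global ID of this work item as the RNG inputs.
	uint	x = (uint)get_global_id(0);
	uint	y = (uint)get_global_id(1);

	uint	rand_num = ParallelRNG2( x, y );
	// ... use rand_num ...
}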


Figure 1. Example of random number use - two dimensions.


This approach allows for random or noise functions to run in parallel between work items, yet generate results that have a repeatable sequence of values that are “noisy” both between work items and sequentially within a work item. If multiple 2D sets of values need to be generated, the 3D generation functions can be used, with the first two inputs generated based upon the work item’s global ID, and the 3rd dimension generated by sequentially increasing some starting value for each additional value required. This could be extended to provide multiple sets of 3D random or noise values, as in the following example for Perlin noise:

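kernel void multi2dNoise( float fScale, float offset )
{
	// Scale the 2D global ID to noise-grid coordinates; offset selects the 2D slice.
	float	fX = fScale * get_global_id(0);
	float	fY = fScale * get_global_id(1);
	float	fZ = offset;

	float	randResult = Noise_3d(  fX,  fY,  fZ );
	// ... use randResult ...
}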

Figure 2. Example of Perlin noise use - three dimensions.



The Noise_2d and Noise_3d functions follow the same basic Perlin noise algorithm but differ in implementation based on Perlin’s recommendations. (See reference 1.) In the Noise sample, only Noise_3d is exercised to implement the noise example, but a test kernel for Noise_2d is included for the reader who wants to modify the sample to test that variation.

The Noise_2d and Noise_3d functions should be called with floating point input values. Values should span a range, such as (0.0, 128.0), to set the size of the “grid” (see Figure 3) of randomized values. Readers should look at the clouds example to understand how Perlin noise can be transformed into various “natural looking” images.
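To make the “grid” idea concrete, here is a minimal plain-C sketch of 2D gradient (Perlin-style) noise. It is not the sample's Noise_2d: the names `noise2d_sketch`, `lattice_hash`, `grad_dot`, `quintic_weight`, and `mix_f` are hypothetical, the integer hash used to pick gradients is an arbitrary assumption, and it uses 8 gradient directions where the sample uses a table of 16.

```c
#include <stdint.h>

/* Quintic weight 6t^5 - 15t^4 + 10t^3, as recommended in "Improving Noise". */
float quintic_weight(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

/* Linear interpolation between a and b. */
float mix_f(float a, float b, float t) { return a + t * (b - a); }

/* Assumed lattice hash (not the sample's table lookup): mixes the integer
 * grid coordinates into a pseudo-random 32-bit value. */
uint32_t lattice_hash(int32_t ix, int32_t iy)
{
    uint32_t h = ((uint32_t)ix * 0x9E3779B1u) ^ ((uint32_t)iy * 0x85EBCA6Bu);
    h ^= h >> 15;
    h *= 0x2C1B3C6Du;
    h ^= h >> 12;
    return h;
}

/* Select one of 8 unit gradients for grid point (ix, iy) and take its dot
 * product with the offset (dx, dy) from that grid point. */
float grad_dot(int32_t ix, int32_t iy, float dx, float dy)
{
    static const float g[8][2] = {
        { 1.0f, 0.0f }, { -1.0f, 0.0f }, { 0.0f, 1.0f }, { 0.0f, -1.0f },
        { 0.70710678f,  0.70710678f }, { -0.70710678f,  0.70710678f },
        { 0.70710678f, -0.70710678f }, { -0.70710678f, -0.70710678f }
    };
    const float *v = g[lattice_hash(ix, iy) & 7u];
    return v[0] * dx + v[1] * dy;
}

/* Floor of x as an integer, for inputs within int32_t range. */
int32_t floor_to_int(float x)
{
    int32_t i = (int32_t)x;          /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* 2D gradient noise: blend the four corner contributions of the grid cell
 * containing (x, y). The result is 0 at every integer grid point. */
float noise2d_sketch(float x, float y)
{
    int32_t ix = floor_to_int(x), iy = floor_to_int(y);
    float dx = x - (float)ix, dy = y - (float)iy;
    float u = quintic_weight(dx), v = quintic_weight(dy);

    float n00 = grad_dot(ix,     iy,     dx,        dy);
    float n10 = grad_dot(ix + 1, iy,     dx - 1.0f, dy);
    float n01 = grad_dot(ix,     iy + 1, dx,        dy - 1.0f);
    float n11 = grad_dot(ix + 1, iy + 1, dx - 1.0f, dy - 1.0f);

    return mix_f(mix_f(n00, n10, u), mix_f(n01, n11, u), v);
}
```

Because the value is zero at every integer coordinate, passing raw integer pixel indices produces a flat image. Scaling the inputs (for example, multiplying the global ID by a small `fScale`) spreads each grid cell over many pixels, which is what gives the output its smooth, natural look.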

The default ParallelRNG function used in the random test provides visually random results but is not the fastest RNG algorithm. This function is based on the “Wang hash,” which was not designed for use as an RNG. However, some commonly used RNG functions (a commented out example is included in the file) showed visible regularities when filling a 2D image, particularly in the lower order bits of results. The reader may want to experiment with other, faster RNG functions.

The default ParallelRNG function generates only unsigned 32-bit integer results. If floating point values on a range such as (0.0, 1.0) are needed, the application must apply a mapping to that range. The random example maps the random unsigned integer result to the range (0, 255) to generate gray scale pixel values, simply using a bitwise AND operation to select 8 bits.
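Both mappings are short enough to sketch in plain C. The function names below are hypothetical, and the float mapping is one common choice rather than anything taken from the sample: it keeps the top 24 bits, which a single-precision float can represent exactly, and so yields values in [0.0, 1.0).

```c
#include <stdint.h>

/* Keep the low 8 bits of the random value: a gray level from 0 to 255. */
uint8_t rand_to_gray(uint32_t r)
{
    return (uint8_t)(r & 0xFFu);
}

/* Map the random value to a float in [0.0, 1.0): take the top 24 bits
 * and scale by 2^-24. The result never reaches 1.0. */
float rand_to_unit_float(uint32_t r)
{
    return (float)(r >> 8) * (1.0f / 16777216.0f);
}
```

If an RNG shows regularities in its low-order bits, selecting the gray level from the high-order bits instead (for example `r >> 24`) may be worth trying.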

The default ParallelRNG function will not generate all 4,294,967,296 (2^32) unsigned integer values for sequential calls using the previously generated value. For any single starting seed value the pseudo-random sequences/cycles range from at least as small as 7,000 unique values to about 2 billion values long. There are around 20 different cycles generated by the default ParallelRNG function. The author believes it will be uncommon that any work item of an OpenCL kernel will require more sequentially generated random numbers than the smallest cycle can provide.

The 2D and 3D versions of the function—ParallelRNG2 and ParallelRNG3—use a “mixing” of cycles by applying an XOR binary operation between the result of a previous call to ParallelRNG and the next input value, which will change the cycle lengths. However, that altered behavior has not been characterized in detail, so it is recommended that the reader carefully validate that the ParallelRNG functions meet the needs of their application.
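The mixing scheme can be sketched in plain C as follows. This is an assumption-laden illustration, not the sample's source: `rng1_sketch` is the widely circulated form of the “Wang hash” standing in for ParallelRNG, and `rng2_sketch` and `rng3_sketch` are hypothetical names that follow the description above (hash, XOR with the next input, hash again).

```c
#include <stdint.h>

/* Widely circulated "Wang hash" integer mixer, assumed here as a stand-in
 * for the sample's ParallelRNG; the shipped code may differ. */
uint32_t rng1_sketch(uint32_t seed)
{
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed ^= seed >> 4;
    seed *= 0x27d4eb2du;
    seed ^= seed >> 15;
    return seed;
}

/* Two passes for two inputs: hash x, XOR the result with y, hash again. */
uint32_t rng2_sketch(uint32_t x, uint32_t y)
{
    return rng1_sketch(rng1_sketch(x) ^ y);
}

/* Three passes for three inputs: extend the two-input result with z. */
uint32_t rng3_sketch(uint32_t x, uint32_t y, uint32_t z)
{
    return rng1_sketch(rng2_sketch(x, y) ^ z);
}
```

A work item that needs several values can keep `x` and `y` fixed at its global ID and call `rng3_sketch(x, y, i)` for i = 0, 1, 2, and so on. No state is shared between work items, so the calls parallelize freely and the results are repeatable.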

Project Structure

This section lists only the key elements of the sample application source code.


Main entry point function. After parsing command-line options, it initializes OpenCL, builds the OpenCL kernel program from the file, prepares one of the kernels to be run, and calls ExecuteNoiseKernel(), then ExecuteNoiseReference(). After validating that the two implementations produce the same results, main() prints out the timing information each returned and stores the resulting images from each.

Set up and run the selected Noise kernel with OpenCL.

Set up and run the selected Noise reference C code.

Table of random values 0 to 255 for 3D Perlin noise kernel. Note that this could be generated and passed to the Perlin noise kernel, for an added degree of randomness.

16 uniformly spaced unit vectors, gradients for 2D Perlin noise kernel.

16 vector gradients for 3D Perlin noise kernel.

Pseudo-Random Number Generator, one pass over 1 input. An alternative RNG function is commented out, in case the reader wants to test a faster function that yields poorer results.

RNG doing 2 passes for 2 inputs

RNG doing 3 passes for 3 inputs

weight_poly3() and weight_poly5() and WEIGHT()
These are alternative weight functions used by Perlin noise, to ensure continuous gradients everywhere. The second (preferred) function also gives a continuous second derivative everywhere. The WEIGHT macro selects which one is used.

Macro converting range (0, 255) to (-1.0, 1.0)

Bilinear interpolation using an OpenCL built-in function.

Selects a gradient and does dot product with input xy, part of Perlin Noise_2d function.

Perlin noise generator with 2 inputs.

Selects a gradient and does dot product with input xyz, part of Perlin Noise_3d function.

Perlin noise generator with 3 inputs.

Generates one pixel of a “cloud” output image for CloudTest using Noise_3d.

Converts from the Perlin noise output range (-1.0, 1.0) to the range (0, 255) needed for gray scale pixels.

The cloud image generation test. The slice parameter is passed to cloud, to allow the host code to generate alternative cloud images.

Test of Noise_2d – not used by default.

Test of Noise_3d – default Perlin noise function. Uses map256 to generate pixel values for a grayscale image.

Test of ParallelRNG3, currently uses the low order byte of unsigned integer result to output a grayscale image.
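Several of the small helpers listed above can be sketched in plain C. The bodies below are assumptions: the two weights are the standard cubic and quintic polynomials from Perlin's papers, and the range conversions are one straightforward way to do the mapping. All four function names are hypothetical, and the sample's own code may differ in detail.

```c
/* Cubic weight 3t^2 - 2t^3: its first derivative is zero at t = 0 and
 * t = 1, so gradients are continuous across grid cells. */
float weight_poly3_sketch(float t)
{
    return t * t * (3.0f - 2.0f * t);
}

/* Quintic weight 6t^5 - 15t^4 + 10t^3: first and second derivatives are
 * both zero at t = 0 and t = 1. */
float weight_poly5_sketch(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

/* Map a table value in the range (0, 255) to the range (-1.0, 1.0). */
float byte_to_signed_unit(int v)
{
    return (float)v / 127.5f - 1.0f;
}

/* Map a noise value in the range (-1.0, 1.0) to a gray level in the range
 * (0, 255), rounding to nearest and clamping out-of-range inputs. */
int signed_unit_to_byte(float n)
{
    float scaled = (n + 1.0f) * 127.5f + 0.5f;
    if (scaled < 0.0f)   scaled = 0.0f;
    if (scaled > 255.0f) scaled = 255.0f;
    return (int)scaled;
}
```

Both weights map 0 to 0, 0.5 to 0.5, and 1 to 1; they differ only in how flat the curve is near the ends of the interval.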

Two Microsoft Visual Studio solution files are provided, for Visual Studio versions 2012 and 2013. These are “Noise_2012.sln” and “Noise_2013.sln”. If the reader has a newer version of Visual Studio, it should be possible to use the Visual Studio solution/project update to create a new solution derived from these.

Note that the solutions both assume that the Intel® OpenCL™ Code Builder has been installed.

Controlling the Sample

This sample can be run from a Microsoft Windows* command-line console, from a folder that contains the EXE file:

Noise.exe < Options >


-h or --help
Show command-line help. Does not run any of the demos.

-t or --type [ all | cpu | gpu | acc | default | <OpenCL constant for device type> ]
Select the device to run the OpenCL kernel upon by type of device. Default value: all


-p or --platform < number-or-string > 
Selects the platform to use. A list of all platform numbers and names is printed when a demo is run. The platform being used will have “[Selected]” printed to the right of it. If using a string, provide enough letters of the platform name to uniquely identify it. Default value: Intel

-d or --device < number-or-string >
Select the device to run the OpenCL kernels upon by device number or name. Device numbers and names on the platform being used are printed when a demo is run. The current device will have “[Selected]” printed to the right of it. Default value: 0

-r or --run [ random | perlin | clouds ]
Select the function demonstration to run. The random number, Perlin noise, and cloud image generators each have a demo kernel. Default value: random

-s or --seed < integer >
Provide an integer value to vary the algorithm output. Default value: 1

Noise.exe prints the time the OpenCL kernel and reference C-coded equivalent each take to run, as well as the names of the respective output files for each. When the program has finished printing information, it waits for the user to press ENTER before exiting. Please note that no attempt was made to optimize performance of the C-coded reference code functions; they are intended only to validate correctness of the OpenCL kernel code.

Examining Results

After a Noise.exe run is complete, examine the generated BMP format image files OutputOpenCL.bmp and OutputReference.bmp in the working folder, to compare the OpenCL and C++ code results, respectively. The two images should be identical, though it is possible that there might be very small differences between the two Perlin noise or cloud images.
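For readers who want to check the two images programmatically rather than by eye, the comparison can be sketched in plain C as below. The function name and the tolerance parameter are hypothetical. Loading the pixel data from the BMP files is omitted, and the sample's own validation code may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the gray scale pixels whose values differ by more than `tolerance`
 * between two images of `n` pixels each. A tolerance of 0 demands an exact
 * match; a tolerance of 1 allows for small rounding differences. */
size_t count_mismatches(const uint8_t *a, const uint8_t *b, size_t n,
                        int tolerance)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        int diff = (int)a[i] - (int)b[i];
        if (diff < 0)
            diff = -diff;
        if (diff > tolerance)
            ++bad;
    }
    return bad;
}
```

A result of zero with `tolerance` set to 0 means the OpenCL and reference outputs are identical.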

The (Perlin) noise output should appear similar to Figure 3:

Figure 3. Perlin noise output.

The random output should look similar to Figure 4:

Figure 4. Random noise output.

The clouds function output should look similar to Figure 5:

Figure 5. Generated cloud output.


References

  1. Perlin, K., “Improving Noise.”
  2. “4-byte Integer Hashing.”
  3. Overton, M. A., “Fast, High-Quality, Parallel Random Number Generators,” Dr. Dobb’s website (2011).
  4. Intel® Digital Random Number Generator (DRNG) Library Implementation and Uses.
  5. Intel Sample Source Code License Agreement.
  6. Intel® OpenCL™ Code Builder.

