Code Migration: Before & After

Source CUDA Code

The Intel DPC++ Compatibility Tool migrates programs written with current and previous versions of CUDA. For details, see the release notes.

#include <cuda.h>
#include <stdio.h>

const int vector_size = 256;

__global__ void SimpleAddKernel(float *A, int offset) 
{
  A[threadIdx.x] = threadIdx.x + offset;
}

int main()
{
  float *d_A;
  int offset = 10000;

  cudaMalloc( &d_A, vector_size * sizeof( float ) );
  SimpleAddKernel<<<1, vector_size>>>(d_A, offset);

  float result[vector_size] = { };
  cudaMemcpy(result, d_A, vector_size*sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree( d_A );
   
  for (int i = 0; i < vector_size; ++i) {
    if (i % 8 == 0) printf( "\n" );
    printf( "%.1f ", result[i] );
  }

  return 0;
}
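
The kernel above writes threadIdx.x + offset into each element, so element i of the result should hold i + 10000. As a rough sketch of how that expectation could be checked on the host, for example when comparing the output of the original and migrated programs, the following plain C++ computes the same values without a device. The helper names ExpectedResult and FirstMismatch are hypothetical; they are not part of the original example or of the Intel DPC++ Compatibility Tool.

```cpp
#include <cassert>
#include <vector>

// Same problem size as the CUDA example above.
const int vector_size = 256;

// Hypothetical helper (an assumption for illustration): computes on the
// host the values that SimpleAddKernel writes on the device, so that
// element i holds i + offset.
std::vector<float> ExpectedResult(int offset) {
  std::vector<float> expected(vector_size);
  for (int i = 0; i < vector_size; ++i) {
    expected[i] = static_cast<float>(i + offset);
  }
  return expected;
}

// Hypothetical helper: returns the index of the first element that differs
// from the host reference, or -1 if all vector_size elements match.
// Exact comparison is used because the values in this example are small
// integers, which float represents exactly.
int FirstMismatch(const float *result, const std::vector<float> &expected) {
  assert(result != nullptr);
  assert(expected.size() == static_cast<std::size_t>(vector_size));
  for (int i = 0; i < vector_size; ++i) {
    if (result[i] != expected[i]) return i;
  }
  return -1;
}
```

Calling FirstMismatch(result, ExpectedResult(offset)) after the device-to-host copy returns -1 when every element matches, and the same check can be reused unchanged in the migrated program because it contains no CUDA-specific code.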

Migrated Code

The resulting code is typical of what you can expect to see after code is migrated. In most cases, further code edits and optimizations are required to complete the migration.

1 An Intel estimate as of September 2024, which is based on measurements from a set of 100 HPC benchmarks, AI applications, and samples, with examples like GROMACS, llama.cpp, and SqueezeLLM. Results may vary.