Developer Guide and Reference

  • 2022.1
  • 04/11/2022
  • Public Content

DPCT1086

Message

__activemask() is migrated to 0xffffffff. You may need to adjust the code.

Detailed Help

There is currently no functional equivalent of __activemask() in SYCL*. If control flow in your code makes some threads inactive before a sub-group operation is reached, you need to rewrite the thread logic so that every work-item in the sub-group reaches that operation.
For example, this original CUDA* code:
    __device__ inline int SHFL_SYNC(unsigned mask, int val, unsigned offset,
                                    unsigned w = warpSize) {
      return __shfl_down_sync(mask, val, offset, w);
    }

    __global__ void kernel(int *array) {
      unsigned int tid = threadIdx.x;
      if (tid >= 8)
        return;
      unsigned mask = __activemask();
      array[tid] = SHFL_SYNC(mask, array[tid], 4);
    }
results in the following migrated SYCL code:
    inline int SHFL_SYNC(unsigned mask, int val, unsigned offset,
                         sycl::nd_item<3> item_ct1, unsigned w = 0) {
      if (!w)
        w = item_ct1.get_sub_group().get_local_range().get(0);
      // This call will wait for all work-items to arrive, which will never
      // happen since only work-items with tid < 8 will encounter this call.
      return sycl::shift_group_left(item_ct1.get_sub_group(), val, offset);
    }

    void kernel(int *array, sycl::nd_item<3> item_ct1) {
      unsigned int tid = item_ct1.get_local_id(2);
      if (tid >= 8)
        return;
      /* DPCT1086 */
      unsigned mask = 0xffffffff;
      array[tid] = SHFL_SYNC(mask, array[tid], 4, item_ct1);
    }
which is rewritten to:
    // remove mask parameter, as it is not used
    inline int SHFL_SYNC(int val, unsigned offset, sycl::nd_item<3> item_ct1,
                         unsigned w = 0) {
      if (!w)
        w = item_ct1.get_sub_group().get_local_range().get(0);
      unsigned int tid = item_ct1.get_local_id(2);
      // Use a temporary variable to save the result of
      // sycl::shift_group_left() to make sure all work-items can encounter
      // this call.
      int v_tmp = sycl::shift_group_left(item_ct1.get_sub_group(), val, offset);
      return (tid < 8) ? v_tmp : val;
    }

    void kernel(int *array, sycl::nd_item<3> item_ct1) {
      unsigned int tid = item_ct1.get_local_id(2);
      // remove mask parameter, as it is not used
      array[tid] = SHFL_SYNC(array[tid], 4, item_ct1);
    }
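The effect of the rewritten logic can be checked on the host with a plain C++ model. The sketch below is not SYCL code and is not part of the tool's output. The names shift_left_model and rewritten_kernel_model are hypothetical, the sub-group is assumed to have at most 32 work-items, and lanes whose source lane is out of range are assumed to keep their own value (SYCL leaves that result unspecified).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a sub-group shift-left: lane i reads the value held by
// lane i + offset. Assumption: a lane whose source lane is out of range keeps
// its own value (SYCL leaves that result unspecified).
std::vector<int> shift_left_model(const std::vector<int>& vals,
                                  std::size_t offset) {
    std::vector<int> out(vals);
    for (std::size_t lane = 0; lane + offset < vals.size(); ++lane) {
        out[lane] = vals[lane + offset];
    }
    return out;
}

// Model of the rewritten kernel: every lane takes part in the shift, and only
// lanes with tid < limit keep the shifted value. All other lanes write back
// their original value, so array[tid] is unchanged for them.
std::vector<int> rewritten_kernel_model(const std::vector<int>& array,
                                        std::size_t offset,
                                        std::size_t limit) {
    // Assumption: one sub-group of at most 32 work-items is modeled.
    assert(array.size() <= 32);
    const std::vector<int> shifted = shift_left_model(array, offset);
    std::vector<int> result(array);
    for (std::size_t tid = 0; tid < array.size(); ++tid) {
        result[tid] = (tid < limit) ? shifted[tid] : array[tid];
    }
    return result;
}
```

Calling rewritten_kernel_model(array, 4, 8) mirrors the example: lanes 0 to 7 receive the values of lanes 4 to 11, and lanes 8 and above keep their own values. Lanes 4 to 7 read from lanes 8 to 11, which now take part in the shift instead of exiting early.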

Suggestions to Fix

Check if 0xffffffff can be used instead of __activemask(). If it cannot be used, redesign the thread logic.
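One way to reason about this check is to model, on the host, which lanes would still be active at the point where __activemask() was called. The following is a minimal sketch under stated assumptions, not an API of SYCL or of the migration tool: active_mask_model and full_mask_is_equivalent are hypothetical helpers, and a sub-group of 32 work-items is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper: builds the bitmask of lanes for which
// pred(lane) is true, for a modeled sub-group of `lanes` work-items.
template <typename Pred>
std::uint32_t active_mask_model(unsigned lanes, Pred pred) {
    // Assumption: the modeled sub-group has at most 32 work-items.
    assert(lanes <= 32);
    std::uint32_t mask = 0;
    for (unsigned lane = 0; lane < lanes; ++lane) {
        if (pred(lane)) {
            mask |= (std::uint32_t{1} << lane);
        }
    }
    return mask;
}

// True when all 32 modeled lanes are active, that is, when the constant
// 0xffffffff describes the same set of lanes as the modeled mask.
inline bool full_mask_is_equivalent(std::uint32_t mask) {
    return mask == 0xffffffffu;
}
```

For the example in this topic, the predicate is tid < 8, so the modeled mask is 0xff and full_mask_is_equivalent returns false, which indicates that the thread logic needs to be redesigned. When no control flow deactivates any lane, the modeled mask is 0xffffffff and the migrated constant describes the same lanes.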
