5.8.2. Low Occupancy Percentage
Consider the following code example:
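__kernel void proc (__global int * a, ...) {
    for (int i = 0; i < N; i++) {
        // First inner loop: 1000 writes to channel c0 per outer iteration
        for (int j = 0; j < 1000; j++) {
            write_channel_intel (c0, data0);
        }
        // Second inner loop: 3 writes to channel c1 per outer iteration
        for (int k = 0; k < 3; k++) {
            write_channel_intel (c1, data1);
        }
    }
}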
Assuming all the loops are pipelined, the first inner loop, with a trip count of 1000, is the critical loop. The second inner loop, with a trip count of three, accounts for only 3 of every 1003 inner-loop iterations, so it executes infrequently. As a result, you can expect the occupancy and activity percentages to be high for channel c0 and low for channel c1.
Occupancy percentage might also be low if you define a small work-group size, because the kernel might not receive sufficient work-items. This is problematic because the pipeline then sits mostly empty for the duration of kernel execution, which leads to poor performance.