Granularity
For all memory address spaces, a kernel must access data in quantities of at least 32 bits, from addresses aligned to 32-bit boundaries, to get optimal performance. A 32-bit quantity can be made up of any type whose total size is 32 bits, for example:
- char4 (four 8-bit values)
- ushort2 (two 16-bit values)
- int (one 32-bit value)
These data types can be accessed with identical memory performance. If possible, access up to four 32-bit quantities (float4, int4, etc.) at a time to improve performance. Accessing more than four 32-bit quantities at a time may reduce performance.
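The sizes and access pattern described above can be modeled in plain host-side C. This is only a sketch, not OpenCL C: the types char4_model, ushort2_model and float4_model and the functions is_aligned and copy_float4 are hypothetical names invented for illustration, and the loop stands in for what would be one work-item per element in a real kernel. It assumes a platform where float is 32 bits wide and these structs have no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OpenCL C built-in vector types. */
typedef struct { int8_t x, y, z, w; } char4_model;   /* 4 x 8 bits  = 32 bits  */
typedef struct { uint16_t x, y; } ushort2_model;     /* 2 x 16 bits = 32 bits  */
typedef struct { float x, y, z, w; } float4_model;   /* 4 x 32 bits = 128 bits */

/* Each of the small types occupies exactly one 32-bit quantity. */
_Static_assert(sizeof(char4_model) == 4, "char4 model should be 32 bits");
_Static_assert(sizeof(ushort2_model) == 4, "ushort2 model should be 32 bits");
_Static_assert(sizeof(float4_model) == 16, "float4 model should be four 32-bit values");

/* Returns 1 if ptr is aligned to a multiple of `bytes`, otherwise 0. */
int is_aligned(const void *ptr, size_t bytes)
{
    assert(bytes != 0);
    return ((uintptr_t)ptr % bytes) == 0;
}

/* Each loop iteration models one work-item moving four 32-bit values at once.
 * n4 is the element count in float4 units, not in floats. */
void copy_float4(const float4_model *src, float4_model *dst, size_t n4)
{
    for (size_t i = 0; i < n4; ++i) {
        dst[i] = src[i];
    }
}
```

In an actual kernel, the built-in float4 type and a work-item index would presumably replace the struct and the loop. A check such as is_aligned(buffer, 4) on the host is one way to confirm the 32-bit alignment requirement before passing data on.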