5.2. Parallelize Loops
You can take advantage of the spatial compute structure to accelerate the loops by having multiple iterations of a loop executing concurrently. To have multiple iterations of a loop execute concurrently, unroll loops when possible and structure your loops so that dependencies between loop iterations are minimized and can be resolved within one clock cycle.
These practices show how to parallelize different iterations of the same loop. If you have two different loops that you want to parallelize, consider using a system of tasks. For details, see System of Tasks Best Practices.