Improving Performance by Aligning Data
The vectorizer can generate faster code when operating on aligned data. In this activity you will improve the vectorizer performance by aligning the arrays a, b, and c in driver.f90 on a 16-byte boundary so the vectorizer can use aligned load instructions for all arrays rather than the slower unaligned load instructions and can avoid runtime tests of alignment. Using the ALIGNED macro will insert an alignment directive for a, b, and c in driver.f90 with the following syntax:
!dir$attributes align : 16 :: a,b,c
This instructs the compiler to create arrays that it are aligned on a 16-byte boundary, which should facilitate the use of SSE aligned load instructions.
In addition, the column height of the matrix a needs to be padded out to be a multiple of 16 bytes, so that each individual column of a maintains the same 16-byte alignment. In practice, maintaining a constant alignment between columns is much more important than aligning the start of the arrays.
To derive the maximum benefit from this alignment, we also need to tell the vectorizer it can safely assume that the arrays in matvec.f90 are aligned by using the directive
!dir$ vector aligned
If you use !dir$ vector aligned, you must be sure that all the arrays or subarrays in the loop are 16-byte aligned. Otherwise, you may get a runtime error. Aligning data may still give a performance benefit even if !dir$ vector aligned is not used. See the code under the ALIGNED macro in matvec.f90
If your compilation targets the Intel® AVX instruction set, you should try to align data on a 32-byte boundary. This may result in improved performance. In this case, !dir$ vector aligned advises the compiler that the data is 32-byte aligned.
Rebuild the program after adding the ALIGNED preprocessor definition to ensure consistently aligned data:
Fortran > Preprocessor > Preprocessor Definitions
Rebuild your project. The report will now reflect aligned access.
Did you find the information on this page useful?