Tutorial: Using Auto Vectorization with Intel® Fortran Compiler

ID 758497
Date 4/11/2022

Additional Exercises

The previous examples used double precision arrays. They can instead be built with single precision arrays by changing the command-line option -real-size 64 to -real-size 32. The non-vectorized versions of the loop execute only slightly faster than the double precision versions; however, the vectorized versions are substantially faster. This is because a packed SIMD instruction operating on a 16-byte vector register processes four single precision data elements at once instead of two double precision data elements.
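As a quick check of the arithmetic above, the following minimal Fortran sketch computes how many elements one packed instruction processes, assuming a 16-byte vector register as described in the text. The module name simd_lanes, the function lanes, and the constant VECTOR_BYTES are hypothetical and are not part of the tutorial's sources.

```fortran
! Minimal sketch: elements per 16-byte SIMD register for single vs double precision.
! Assumption: a 16-byte (128-bit) vector register, as stated in the tutorial text.
module simd_lanes
  implicit none
  integer, parameter :: VECTOR_BYTES = 16   ! hypothetical constant: register width in bytes
contains
  ! Number of elements of size elem_bytes that one packed instruction operates on.
  pure integer function lanes(elem_bytes)
    integer, intent(in) :: elem_bytes
    lanes = VECTOR_BYTES / elem_bytes
  end function lanes
end module simd_lanes

program show_lanes
  use, intrinsic :: iso_fortran_env, only: real32, real64
  use simd_lanes
  implicit none
  ! storage_size returns the size in bits, so divide by 8 to get bytes.
  print '(a,i0)', 'single precision lanes: ', lanes(storage_size(1.0_real32) / 8)
  print '(a,i0)', 'double precision lanes: ', lanes(storage_size(1.0_real64) / 8)
end program show_lanes
```

Running it prints 4 lanes for single precision and 2 for double precision, which is the factor of two behind the larger speedup of the vectorized single precision build.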


In the example with data alignment, you will need to set ROWBUF=3 to ensure 16-byte alignment for each row of the matrix a. Otherwise, the rows are not all 16-byte aligned, and the directive !dir$ vector aligned, which tells the compiler that they are, will cause the program to fail.
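To see where a value like ROWBUF=3 can come from, here is a minimal sketch of the padding arithmetic. It assumes, hypothetically, that ROWBUF pads the leading (contiguous) dimension of a and that this dimension holds 101 elements before padding; check both against the declaration of a in the tutorial's source. It also assumes the first element of the array is already 16-byte aligned. The names row_padding and min_pad are hypothetical.

```fortran
! Minimal sketch: smallest padding that makes each contiguous slice of a
! column-major array start on an align_bytes boundary.
! Assumptions: padding is added to the leading dimension, and the array's
! first element is already aligned. Names are hypothetical.
module row_padding
  implicit none
contains
  ! Smallest p >= 0 such that (n + p) * elem_bytes is a multiple of align_bytes.
  ! Returns -1 if no padding smaller than align_bytes works.
  pure integer function min_pad(n, elem_bytes, align_bytes) result(p)
    integer, intent(in) :: n, elem_bytes, align_bytes
    integer :: k
    p = -1
    do k = 0, align_bytes - 1
      if (mod((n + k) * elem_bytes, align_bytes) == 0) then
        p = k
        return
      end if
    end do
  end function min_pad
end module row_padding

program show_padding
  use row_padding
  implicit none
  integer, parameter :: n = 101   ! hypothetical leading dimension, not taken from the tutorial
  print '(a,i0)', 'padding for 4-byte reals: ', min_pad(n, 4, 16)
  print '(a,i0)', 'padding for 8-byte reals: ', min_pad(n, 8, 16)
end program show_padding
```

With a leading dimension of 101, it prints 3 for 4-byte reals and 1 for 8-byte reals, because 104 × 4 = 416 and 102 × 8 = 816 are both multiples of 16.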

This completes the tutorial that shows how the compiler can optimize performance with various vectorization techniques.
