Build and Configure Application
Get Software Tools
- Intel® Parallel Studio XE Cluster Edition, including the Intel® C++ Compiler, Intel® MPI Library, Intel® Trace Analyzer and Collector, and Intel® VTune™ Profiler
- Clone the application GitHub* repository to your local system:

  ```shell
  $ git clone https://github.com/CardiacDemo/Cardiac_demo.git
  ```
- Set up the Intel C++ compiler environment:

  ```shell
  $ source <compiler_installdir>/bin/compilervars.sh intel64
  ```

  By default, <compiler_installdir> is /opt/intel/compilers_and_libraries_<version>.<update>.<package#>/linux
- In the root level of the sample package, create a build directory and change to that directory:

  ```shell
  $ mkdir build
  $ cd build
  ```
- Build the application using the following command:

  ```shell
  $ mpiicpc ../heart_demo.cpp ../luo_rudy_1991.cpp ../rcm.cpp ../mesh.cpp \
        -g -o heart_demo -O3 -std=c++11 -qopenmp -parallel-source-info=2
  ```
Run Application with Various Configurations
- 128 MPI processes, 1 OpenMP thread
- 32 MPI processes, 4 OpenMP threads
- 2 MPI processes, 64 OpenMP threads
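The run commands later in this section launch these combinations with -n = nodes × ppn, and in every combination ppn × OMP_NUM_THREADS = 128, which implies 128 hardware threads per node across the eight nodes in the host file (adjust both numbers for your own cluster). The arithmetic behind the three configurations can be sketched as:

```shell
# Derive -ppn and -n for each OpenMP thread count, assuming 8 nodes with
# 128 hardware threads each (inferred from the run commands in this recipe).
NODES=8
THREADS_PER_NODE=128
for OMP in 1 4 64; do
  PPN=$((THREADS_PER_NODE / OMP))   # MPI processes per node
  N=$((NODES * PPN))                # total MPI ranks passed to mpirun -n
  echo "OMP_NUM_THREADS=$OMP -> mpirun -ppn $PPN -n $N"
done
```

This prints -ppn 128 -n 1024, -ppn 32 -n 256, and -ppn 2 -n 16, matching the three run scripts below.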
- Set up the environment for the Intel MPI Library:

  ```shell
  $ source <impi_installdir>/intel64/bin/mpivars.sh
  ```

  Where <impi_installdir> is the installed location for the Intel MPI Library (the default location is /opt/intel/compilers_and_libraries_<version>.<update>.<package#>/linux/mpi)
- Create a host file that lists all of the cluster nodes involved, one node name per line:

  ```
  node1
  node2
  ...
  node8
  ```
- Save the file as hosts.txt in the build directory.
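Typing each node name by hand is error-prone on larger clusters. If your nodes follow a numeric naming scheme such as node1 through node8 (the names used above; substitute your own pattern and count), the host file can be generated instead:

```shell
# Generate hosts.txt for nodes named node1..node8 (hypothetical naming
# scheme; adjust the prefix and range to match your cluster).
for i in $(seq 1 8); do
  echo "node$i"
done > hosts.txt
```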
- In the build directory, run the application with each of the three combinations. Use the time utility to measure the application elapsed time; the computation time is calculated by the application internally. Note that the run scripts must be made executable before they can be launched:

  ```shell
  # 128/1
  $ cat > run_ppn128_omp1.sh
  export OMP_NUM_THREADS=1
  mpirun -n 1024 -ppn 128 -f hosts.txt ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -t 50
  $ chmod +x run_ppn128_omp1.sh
  $ time ./run_ppn128_omp1.sh

  # 32/4
  $ cat > run_ppn32_omp4.sh
  export OMP_NUM_THREADS=4
  mpirun -n 256 -ppn 32 -f hosts.txt ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -t 50
  $ chmod +x run_ppn32_omp4.sh
  $ time ./run_ppn32_omp4.sh

  # 2/64
  $ cat > run_ppn2_omp64.sh
  export OMP_NUM_THREADS=64
  mpirun -n 16 -ppn 2 -f hosts.txt ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -t 50
  $ chmod +x run_ppn2_omp64.sh
  $ time ./run_ppn2_omp64.sh
  ```
- Review and save the computation and elapsed time values. The values are found in the last lines of the application output (computation time and elapsed time, respectively):

  ```
  ...
  wall time: <value>
  real <value>
  ...
  ```
- The first combination uses only MPI parallelism, so its performance is considerably worse than the configurations that combine MPI and OpenMP. It is not worth investigating further.
- The second combination is a middle ground: the times are significantly better, but still not ideal. This may be due to an unoptimized MPI communication pattern in the application.
- The third combination shows the best performance, so it is reasonable to focus on this one for further optimizations.
- Using only one method of parallelism is inefficient. Using both MPI and OpenMP parallelism at once can give a significant performance boost.
- Test various combinations of MPI processes and OpenMP threads for your hybrid application. Different combinations can produce very different performance results for the same application.
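One way to carry out that experiment systematically is to sweep over OpenMP thread counts and derive the matching MPI process counts, so that every configuration uses the same total core count. A sketch, assuming the 8-node, 128-threads-per-node cluster used above; the mpirun command is only echoed here so the sweep can be inspected first (remove the echo to actually launch the runs):

```shell
# Sweep OMP_NUM_THREADS over powers of two; -ppn and -n are derived so
# that ppn * OMP_NUM_THREADS stays equal to the threads per node.
NODES=8
THREADS_PER_NODE=128
for OMP in 1 2 4 8 16 32 64 128; do
  PPN=$((THREADS_PER_NODE / OMP))
  N=$((NODES * PPN))
  export OMP_NUM_THREADS=$OMP
  echo "time mpirun -n $N -ppn $PPN -f hosts.txt ./heart_demo" \
       "-m ../mesh_mid -s ../setup_mid.txt -t 50"
done
```

Comparing the wall-time and elapsed-time values across the sweep makes the sweet spot for a given mesh and cluster easy to identify.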
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804