# Micro-Cluster Setup with Intel® MPI Library for Windows*

Published: 08/16/2017

Last Updated: 08/15/2017

## Objective

This article demonstrates how to set up the smallest possible Windows* cluster with Intel® MPI Library, for users experimenting in their labs. We have seen some of our customers attempt this kind of setup.

## Specifications

| Component | Details |
| --- | --- |
| OS family | Microsoft Windows* |
| OS name | Microsoft Windows Server 2016 Standard |
| OS version | 10.0.14393 Build 14393 |
| Processor | Intel Core™ i5-6200U CPU @ 2.3 GHz, 2401 MHz, 2 cores, 4 logical processors |
| Network adapter | Ethernet (Realtek* Semiconductor PCIe FE Family Controller) |
| Software suite | Intel Parallel Studio XE 2017 Cluster Edition (update 4) |
| Compiler | Intel C Compiler 2017 update 4 |
| MPI library | Intel MPI Library 2017 update 3 |

## Background

The Message Passing Interface (MPI) has become the de facto standard for programming large machines such as clusters and supercomputers. The standard defines how machines in a distributed-memory system communicate. Intel MPI Library is an implementation of the MPI standard and is fully compliant with MPI 3.1 (the latest MPI standard at the time of publishing this article). Intel MPI Library 2017 update 3 is used for the micro cluster tests in this article. For more details, see the Intel MPI Library documentation.

## Introduction

In this article, a micro cluster refers to two laptops connected by an RJ45 (Ethernet) cable. Although this is far from the hierarchy one would typically encounter in a traditional cluster (nodes, edge switches, core switches, director switch), it exercises a comprehensive set of steps that also need to be performed on bigger machines. This article considers the Windows* operating system.

## Steps

1. Connect two laptops running Windows (Server 2016, in this case) with an RJ45 cable. The laptops may run different versions of Windows.
2. Both laptops must have the same username and password. The domain names may be different.
3. Install the same version of Intel Parallel Studio XE Cluster Edition on both laptops (see the installation guide). To only run MPI codes, the Intel MPI Library runtime (available separately) installed on both machines is sufficient. To also compile MPI codes on these machines, at least Intel Parallel Studio XE Composer Edition and the standalone Intel MPI Library are needed.
4. The default installation folder is "C:\Program Files (x86)\IntelSWTools\mpi\2017.3.210\". Execute mpivars.bat, which is located in the "intel64\bin" subfolder.
5. Use the following command to determine the user name and domain name of each laptop:
> whoami
domain-name\user-name

6. Next, invoke the following commands to set up the Hydra process manager (start the service, register the user credentials, and validate them):
> hydra_service -start
> mpiexec -register
domain-name\user-name
> mpiexec -validate
SUCCESS

7. Select a working directory with the same name and path on both machines (for example "C:\data"). On one of the laptops (say Node1), share "C:\data" and make it accessible to the other laptop (say Node2), using the share with 'Everyone' option in the file sharing dialog on Node1. See the Windows documentation for details on creating a shared folder. Once shared, the folder is visible in the 'Network' folder of Node2 as "\\Node1\data". However, since the directory structure on both laptops needs to be identical, "\\Node1\data" must appear as "C:\data" on Node2, so that the contents of the share are seen under "C:\data" on Node2. To achieve this, create a symbolic link to "\\Node1\data" on Node2 with the following command (this typically requires an elevated command prompt, and "C:\data" must not already exist on Node2 when the link is created):
> mklink /d "C:\data" "\\Node1\data" 

8. Place the executables, the hostfile, and any input files in this directory. The hostfile must contain the IP addresses (or domain-name\user-name) of both laptops. Ensure that static IP addresses are assigned to both laptops. To find the IP address of a machine, run the 'ipconfig' command and look for the IPv4 address of the Ethernet adapter.
9. Step 7 may be skipped if the MPI code performs no file I/O such as inter-node I/O or MPI-IO. A sample code called test_read.f90 has been attached (see the attachments section at the bottom of this page) to demonstrate the importance of step 7: without step 7, test_read.f90 fails.
10. The firewall must be disabled for guest/public networks. Otherwise, the following error may come up:
> ..\hydra\utils\sock\sock.c (270): unable to connect from "Node1" to "Node2" (No error)

The above steps ensure that once a binary built using Intel MPI Library is placed in the shared folder (of step 7) and invoked suitably (as shown in the next section), it would simultaneously execute on both laptops as mandated by the MPI standard.
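
Steps 8 and 10 can be checked before the first MPI run with a small helper. The following is a minimal sketch in Python, not part of Intel MPI Library: it writes a hostfile with one validated IPv4 address per line and probes each host over TCP. The function names are hypothetical, and the port number 8679 is an assumption about the Hydra service default on Windows; confirm the port used by your installation.

```python
import ipaddress
import socket
from pathlib import Path

# Assumption: 8679 is taken here as the Hydra service port on Windows.
# Confirm the port used by your installation before relying on it.
ASSUMED_HYDRA_PORT = 8679


def render_hostfile(addresses):
    """Return hostfile text with one validated IPv4 address per line."""
    seen = set()
    lines = []
    for raw in addresses:
        # IPv4Address raises a ValueError subclass for malformed input.
        addr = str(ipaddress.IPv4Address(raw.strip()))
        if addr in seen:
            raise ValueError(f"duplicate address: {addr}")
        seen.add(addr)
        lines.append(addr)
    if not lines:
        raise ValueError("hostfile needs at least one address")
    return "\n".join(lines) + "\n"


def write_hostfile(directory, addresses, name="hostfile"):
    """Write the hostfile into the working directory and return its path."""
    path = Path(directory) / name
    path.write_text(render_hostfile(addresses))
    return path


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_hosts(addresses, port=ASSUMED_HYDRA_PORT, timeout=2.0):
    """Return the addresses that do not accept a TCP connection on port."""
    return [a for a in addresses if not can_connect(a, port, timeout)]
```

For example, with the hypothetical static addresses 192.168.1.10 and 192.168.1.11, calling `write_hostfile(r"C:\data", ["192.168.1.10", "192.168.1.11"])` creates "C:\data\hostfile", and `unreachable_hosts(["192.168.1.10", "192.168.1.11"])` lists any laptop that cannot be reached. A host reported as unreachable may point to the firewall setting of step 10 or to a Hydra service that is not running.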

## Validation

The following synthetic codes are used to test the micro cluster setup:

1. Hello world (test.c)
This source code is shipped with Intel® MPI Library and is located in the %I_MPI_ROOT%\test folder. C++, Fortran 77 and Fortran 90 versions of this code are available in the same folder. Note that the %I_MPI_ROOT% environment variable is set only after mpivars.bat is executed. The setup, compilation and execution steps are as follows.

Set (and check) the Intel C++ Compiler:
> "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.3.xxx\windows\bin\compilervars.bat" intel64
> icl
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.2.180 Build 20160204

Set (and check) the Intel MPI Library:
> "C:\Program Files (x86)\IntelSWTools\mpi\2017.3.210\intel64\bin\mpivars.bat"
> mpiicc -v

Compile:
> mpiicc test.c -o test

Run:
> mpiexec -hostfile hostfile -n 2 -ppn 1 test
Hello World: Rank 0 of 2 on Node1
Hello World: Rank 1 of 2 on Node2

2. Intel MPI Benchmarks

Intel MPI Benchmarks are a collection of tests that measure the performance of point-to-point and collective communication routines (as defined by the MPI standard) for a range of message sizes. See the Intel MPI Benchmarks documentation for more information. Prebuilt binaries are included in %I_MPI_ROOT%\intel64\bin. They may be invoked as:

> mpiexec -n <x> IMB-<component> [argument/s]

For example,

> mpiexec -n 2 IMB-MPI1
> mpiexec -n 2 IMB-MPI1 pingpong
> mpiexec -n 2 IMB-MPI1 pingpong reduce

When these benchmarks are run across both laptops, performance degradation may be observed relative to a single-node run (results from Intel MPI Benchmarks are not presented here). This is because inter-node communications are inherently slower than intra-node communications. To speed up inter-node communications, high-speed fabrics may be used. In addition, efficient domain decomposition algorithms may be used to localize data, which limits MPI traffic on the slower network. Even so, MPI remains valuable: by realizing parallel execution on several discrete machines, it enables bigger and higher-resolution models to be solved in acceptable lengths of time.
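
The effect of data locality can be illustrated with a rough model. The sketch below, in Python, splits a 1D domain into contiguous blocks, counts how many neighbor exchanges cross the node boundary for a given rank placement, and estimates the cost with the common latency-plus-size-over-bandwidth approximation. All link parameters are hypothetical: the inter-node values assume a 100 Mbit/s Ethernet link with 100 microseconds of latency, and the intra-node values are placeholders for shared memory. None of these numbers are measurements from this setup, and the function names are illustrative only.

```python
def block_range(n_cells, n_ranks, rank):
    """Return the half-open cell range [start, stop) owned by rank."""
    base, extra = divmod(n_cells, n_ranks)
    # The first `extra` ranks each take one additional cell.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def cross_node_exchanges(rank_to_node):
    """Count neighbor pairs (r, r+1) in a 1D chain that sit on different nodes."""
    return sum(1 for a, b in zip(rank_to_node, rank_to_node[1:]) if a != b)


def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: latency plus size over bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s


# Hypothetical link parameters: (latency in seconds, bandwidth in bytes/s).
INTRA_NODE = (1e-6, 5e9)      # placeholder for shared memory
INTER_NODE = (1e-4, 12.5e6)   # assumes 100 Mbit/s Ethernet


def total_halo_cost(rank_to_node, halo_bytes):
    """Sum the per-pair transfer times over all neighbor pairs.

    This is a crude aggregate for comparing placements, not a prediction
    of wall-clock time, since exchanges may overlap in practice.
    """
    total = 0.0
    for a, b in zip(rank_to_node, rank_to_node[1:]):
        latency, bandwidth = INTRA_NODE if a == b else INTER_NODE
        total += transfer_time(halo_bytes, latency, bandwidth)
    return total


if __name__ == "__main__":
    halo_bytes = 8 * 1000  # 1000 double-precision values per exchange
    placements = {
        "blocked": [0, 0, 1, 1],      # neighboring ranks share a node
        "round-robin": [0, 1, 0, 1],  # neighboring ranks alternate nodes
    }
    for rank in range(4):
        print("rank", rank, "owns cells", block_range(10, 4, rank))
    for name, mapping in placements.items():
        print(name, cross_node_exchanges(mapping),
              total_halo_cost(mapping, halo_bytes))
```

Under these assumptions, the blocked placement sends one exchange over the cable while the round-robin placement sends three, which is the kind of difference a locality-aware decomposition aims for.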

## NOTE

This article does not cover the underlying network or the knobs for selecting fabrics and providers within the Intel MPI Library. For information on such topics, see the Intel MPI Library documentation.

* Other names and brands may be claimed as the property of others.
