Distributed Memory Parallel Programs
Programs using distributed-memory parallelism can run across multiple nodes. They consist of independent processes that communicate through a library, usually the Message Passing Interface (MPI).
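As a minimal illustration (a sketch only, not the tutorial's own example file), an MPI program is an ordinary program that calls into the MPI library; each process learns its own rank and the total number of processes:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID (0..size-1) */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                       /* shut down the MPI runtime */
    return 0;
}
```

Each process runs the same executable; only the rank distinguishes them, and all output here goes to the shared standard output.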
Building and Running MPI Programs
MPI is an external library; it must be built with the same compiler with which it will be used. We provide compiled versions of OpenMPI for gcc and NVHPC. For the Intel compilers we recommend the vendor's IntelMPI.
OpenMPI with gcc and NVHPC
module load gcc
Or
module load nvhpc
Follow this with
module load openmpi
Compile and link your code with the wrappers
mpicc for C
mpicxx for C++
mpif90 for Fortran
The wrappers insert the appropriate header/module paths and library paths, as well as the correct libraries to link; you do not need to add them explicitly.
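For example, assuming a source file named hello.c (the filename is just illustrative), a build with the gcc toolchain looks like:

```shell
module load gcc        # or: module load nvhpc
module load openmpi
mpicc -o hello hello.c # the wrapper supplies MPI include paths and link flags
```

Running `mpicc -show` (or `--showme` with OpenMPI) prints the underlying compiler command, which is useful for checking which paths and libraries the wrapper adds.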
IntelMPI with Intel
module load intel
Use the wrappers
mpiicx for C
mpiicpx for C++
mpiifx for Fortran
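The Intel build is analogous; again assuming an illustrative source file hello.c:

```shell
module load intel
mpiicx -o hello hello.c # IntelMPI wrapper around the Intel C compiler icx
```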
Running MPI Programs
MPI programs must be run under the control of an executor program, usually called mpirun or mpiexec. When submitting to Slurm, use mpirun for OpenMPI and srun for IntelMPI. The executor must be told how many processes to start and, when running on more than one host, given the list of hostnames. On a local system this is controlled by command-line options to mpirun, e.g.
mpirun -np 8 ./mympicode
However, under Slurm the provided MPI executors obtain the number of processes and the host list directly from the job's resource assignment. Do not specify them on the command line.
#Intel
srun ./mympicode
#OpenMPI
mpirun ./mympicode
To request resources through Slurm for an MPI program, use the following directives
#SBATCH -N NN
#SBATCH --ntasks-per-node=NT
where NN is the number of nodes and NT is the number of tasks (processes) to run on each node. The total number of processes will be NN*NT.
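Putting these pieces together, a batch script for an IntelMPI build might look like the following sketch; the executable name comes from the examples above, while the partition name and time limit are placeholders to adjust for your site:

```shell
#!/bin/bash
#SBATCH -N 2                  # NN: two nodes
#SBATCH --ntasks-per-node=8   # NT: 8 processes per node, 16 total
#SBATCH -p parallel           # partition (placeholder: use your site's name)
#SBATCH -t 00:10:00           # time limit (placeholder)

module load intel
srun ./mympicode              # srun reads the task count and hosts from Slurm
```

For an OpenMPI build, the equivalent script would load gcc (or nvhpc) and openmpi and replace the srun line with `mpirun ./mympicode`.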
Exercise
Copy the file mpihello.c or mpihello.f90 from /share/resources/tutorials/compilers. C++ programmers please note: the C++ bindings have been removed from the MPI standard, so we just use the C routines.
Load the appropriate module for the compiler you are using.
Build the code with the appropriate wrapper.
You can run a quick test on the frontend with 4 cores. You do not need a hostfile if all processes are on one node.
Write a Slurm script that uses srun to run the program. Submit it to the parallel partition for testing, using at least two nodes.