# Available MPI versions (and comparison)
The cluster has OpenMPI installed.
The recommendation is to use **OpenMPI** ***(unless you really know what you are doing)!***
MPICH does ***not*** support InfiniBand. MVAPICH is ***not*** integrated with SLURM; you have to create the hostfile yourself from the SLURM node list (see the sketch in the MVAPICH section below).
On all nodes:

```bash
module load mpi/openmpi-x86_64
```
OpenMPI chooses the fastest interface automatically. It will first try RDMA over Ethernet (RoCE), which produces _"[qelr_create_qp:683]create qp: failed on ibv_cmd_create_qp"_ messages; these can be ignored, since it fails over to InfiniBand (higher bandwidth anyway) or TCP.
For MPI jobs, prefer the **green-ib** partition (`#SBATCH -p green-ib`) or stay within a single node (`#SBATCH -N 1`). The openib warning about missing device parameters can also be silenced explicitly:

```bash
mpirun --mca btl_openib_warn_no_device_params_found 0 ./hello-mpi
```
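Putting this together, a minimal sketch of a batch script (the node and task counts are placeholders, and `hello-mpi` stands for your own executable):

```bash
#!/bin/bash
#SBATCH -p green-ib              # InfiniBand partition, preferred for MPI jobs
#SBATCH -N 2                     # placeholder node count -- adjust to your job
#SBATCH --ntasks-per-node=4      # placeholder task count -- adjust to your job
#SBATCH -J hello-mpi

module load mpi/openmpi-x86_64

# silence the harmless openib "no device params found" warning
mpirun --mca btl_openib_warn_no_device_params_found 0 ./hello-mpi
```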
## Layers in OpenMPI
---
- PML = Point-to-point Management Layer:
  - UCX
- MTL = Matching Transport Layer:
  - PSM
  - PSM2
  - OFI
- BTL = Byte Transfer Layer:
  - TCP
  - openib
Layers can be selected with the `--mca` option of `mpirun`:
To select TCP transport:

```bash
mpirun --mca btl tcp,self,vader ./hello-mpi
```

To select RDMA transport (verbs):

```bash
mpirun --mca btl openib,self,vader ./hello-mpi
```

To select UCX transport:

```bash
mpirun --mca pml ucx ./hello-mpi
```
***NB!*** _UCX is not supported on QLogic FastLinQ QL41000 Ethernet controllers._
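Which components a given OpenMPI build actually provides can be checked with `ompi_info` (a quick sketch; the exact list depends on how OpenMPI was built):

```bash
# list the point-to-point components (pml, mtl, btl) known to this build
ompi_info | grep -E "MCA (pml|mtl|btl)"
```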
## Different MPI implementations
---
- OpenMPI
- MPICH
- MVAPICH
- IBM Platform MPI (MPICH descendant)
- IBM Spectrum MPI (OpenMPI descendant)
- (at least one for each network and CPU manufacturer)
### OpenMPI
- available in any Linux or BSD distribution
- combines technologies and resources from several other projects (incl. LAM/MPI)
- can use TCP/IP, shared memory, Myrinet, InfiniBand and other low-latency interconnects
- chooses the fastest interconnect automatically (it can also be chosen manually; see the sketch after this list)
- well integrated into many schedulers (e.g. SLURM)
- highly optimized
- FOSS (BSD license)
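As a sketch of overriding the automatic choice, OpenMPI's TCP transport can be pinned to a specific interface; the interface name `eth0` below is only an assumption, check `ip link` on the nodes:

```bash
# force TCP and restrict it to a single interface (eth0 is just an example name)
mpirun --mca btl tcp,self,vader --mca btl_tcp_if_include eth0 ./hello-mpi
```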
### MPICH
- highly optimized
- supports TCP/IP and some low-latency interconnects
- older versions do ***not*** support InfiniBand (newer versions support Mellanox IB)
- available in many Linux distributions
- probably not integrated into schedulers; a hostfile-based launch is sketched after this list
- used to be a PITA to get working smoothly
- FOSS
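Since scheduler integration is doubtful, MPICH jobs are typically launched through a manually supplied hostfile; a sketch assuming MPICH's Hydra `mpiexec` and a file named `hosts` with one hostname per line:

```bash
# hosts: one hostname per line; -n gives the total number of ranks
mpiexec -f hosts -n 8 ./hello-mpi
```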
### MVAPICH
- highly optimized (maybe slightly faster than OpenMPI)
- fork of MPICH to support IB
- comes in many flavors supporting TCP/IP, InfiniBand and other low-latency interconnects, plus OpenSHMEM/PGAS programming models
- several flavors have to be installed, and users have to choose the right one for the interconnect they want to use
- generally not available in Linux distributions
- not integrated with schedulers (integrated with SLURM only after version 18); see the hostfile sketch after this list
- FOSS (BSD license)
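When MVAPICH has to run inside a SLURM allocation without native integration, the required hostfile can be generated from the node list; a sketch using the standard `scontrol` utility (the `mpirun_rsh` invocation is an assumed example, adjust it to the installed flavor):

```bash
# expand SLURM's compact nodelist into a plain hostfile, one host per line
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts
# launch with the MVAPICH launcher (assumed invocation)
mpirun_rsh -np "$SLURM_NTASKS" -hostfile hosts ./hello-mpi
```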
### Recommendation
- default: use OpenMPI on both clusters
- if unsatisfied with performance and running on a single node or over TCP, try MPICH
- if unsatisfied with performance and running over InfiniBand, try MVAPICH