MPI

Software with MPI support can run on multiple nodes; in that case the software is launched as one or more tasks (instances) per node, and these tasks use the network for inter-node communication. The following is an example of a jobscript that can be used for MPI applications:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2000

module purge
module load foss/2023a

# compile our source code; not required if this has been done before
mpicc -o ./my_mpi_app my_mpi_app.c
srun ./my_mpi_app

Here we request two nodes with four tasks on each node, i.e. in total we will be running 8 tasks (on 8 CPU cores). The important aspect for an MPI application is that we launch it using srun, a scheduler command that ensures that the application is started on all allocated resources and with the right number of tasks. MPI's own mpirun can also be used, but we generally recommend using srun.
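
A quick way to see this behaviour is to replace the application with a simple command such as hostname; with the allocation above, srun will run it as 8 tasks and print 8 node names, 4 per node:

# run hostname as 8 tasks (4 per node); each task prints
# the name of the node it runs on
srun hostname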

Currently, two MPI implementations are supported/installed on Hábrók: OpenMPI and Intel MPI.

OpenMPI

OpenMPI is part of the foss toolchains and is used by most of the MPI applications that are available on Hábrók. If you are compiling custom software and want to use the GCC compilers, this is the recommended MPI implementation. The jobscript that we showed earlier should work fine for these applications.
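
As an illustration, the following is a minimal sketch of compiling an MPI program against the OpenMPI from the foss toolchain, assuming the source file is called my_mpi_app.c as in the jobscript above (the -O2 optimization flag is just an example). This can also be done once on a login node, so that the jobscript only has to run srun:

module purge
module load foss/2023a

# mpicc is the MPI compiler wrapper around GCC; it adds the
# MPI include and library paths automatically
mpicc -O2 -o my_mpi_app my_mpi_app.c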

Intel MPI

Intel MPI is available as part of the intel toolchains. Intel MPI does not integrate with the scheduler as well as OpenMPI does, which means that we need to provide an additional setting in the jobscript in order to launch our applications with srun:

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
srun ./my_mpi_app
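
Putting this together, a complete jobscript for an Intel MPI application could look as follows. This is a sketch: the intel/2023a module version, the mpiicc compiler wrapper and the application name are assumptions that should be adjusted to your own situation.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2000

module purge
# assumed toolchain version; pick the one you actually use
module load intel/2023a

# compile our source code with the Intel MPI compiler wrapper;
# not required if this has been done before
mpiicc -o ./my_mpi_app my_mpi_app.c

# tell Intel MPI which PMI library to use, so that srun can launch the tasks
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
srun ./my_mpi_app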

Note: mpirun can also be used for launching Intel MPI applications, but after a recent upgrade of the Slurm scheduler it seems that this sometimes leads to connection issues, in particular when using a larger number of nodes:

[bstrap:0:-1@node15] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on node16 (pid 859263, exit code 49152)
[bstrap:0:-1@node15] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[bstrap:0:-1@node15] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[bstrap:0:-1@node15] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1061): error waiting for event
[bstrap:0:-1@node15] upstream_cb (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:356): error setting up the bstrap proxies
[bstrap:0:-1@node15] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:80): callback returned error status
[bstrap:0:-1@node15] main (../../../../../src/pm/i_hydra/libhydra/bstrap/src/hydra_bstrap_proxy.c:628): error sending proxy ID upstream
...
[mpiexec@node1] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:362): write error (Bad file descriptor)

It is unclear what is causing these errors, but using srun instead of mpirun should resolve them.