How to find nodenames allocated by SLURM for a job

Background

When a distributed job that runs on multiple nodes requests several compute nodes from SLURM, the program that performs the computations needs to know the names of the compute nodes that SLURM allocated to the job in order to use them. If the program is based on the distributed MPI library, then the distribution of the computational processes to the nodes is done by the MPI program launcher, mpirun or mpiexec. Thus, these launchers need to know the list of allocated nodes.
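Inside a running job, SLURM exposes the allocation through environment variables, in particular SLURM_JOB_NODELIST, which holds the node names in a compressed range form (for example, node[101-104]). The scontrol show hostnames command expands that expression into one hostname per line. Below is a minimal sketch of a batch script that prints the allocated node names; the resource requests are placeholders, not ARC-specific recommendations:

<pre>
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00

# Compressed node list as set by SLURM, e.g. "node[101-104]"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"

# Expand it to one hostname per line
scontrol show hostnames "$SLURM_JOB_NODELIST"
</pre>

The expanded list can be redirected to a file if the application expects a host file rather than reading the names from standard output.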

If the computational code (program) was built on ARC and compiled against the OpenMPI library provided on ARC (openmpi/4.1.1-gnu at the time of writing), then the launcher is aware of ARC's SLURM scheduler and can obtain the list of nodes directly from SLURM automatically.
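In that case the job script does not need to pass a host file or node list to the launcher: mpirun detects the allocation from SLURM. A minimal sketch follows; the executable name my_mpi_program is a placeholder, and the module load line assumes the OpenMPI build mentioned above is provided as a module of that name:

<pre>
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

# Load the SLURM-aware OpenMPI build (assumed module name)
module load openmpi/4.1.1-gnu

# No -hostfile or -host options are needed: the launcher
# queries SLURM for the allocated nodes and task layout.
mpirun ./my_mpi_program
</pre>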