How to find nodenames allocated by SLURM for a job

From RCSWiki

Revision as of 20:30, 14 July 2023

Background

When a distributed job that is meant to run on multiple nodes requests several compute nodes from SLURM, the program that performs the computations needs to know the names of the compute nodes that SLURM has allocated to the job in order to use them. If the program is based on the MPI (Message Passing Interface) library for distributed computing, the distribution of the computational processes across the nodes is done by the MPI program launcher, mpirun or mpiexec. Thus, these launchers need to know the list of the allocated nodes.
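
Inside a job, SLURM exposes the allocated nodes through the environment variable SLURM_JOB_NODELIST, typically in a compact hostlist form such as cn[001-003],gpu05 (hypothetical node names). SLURM's own tool for expanding that form is scontrol show hostnames "$SLURM_JOB_NODELIST". The Python sketch below only illustrates what this expansion does. It is a simplified parser written for this page, its function names are hypothetical and not part of any SLURM API, and it does not claim to cover every hostlist syntax SLURM accepts.

```python
import os
import re
import sys


def split_top_level(expr):
    """Split a hostlist on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_range(spec):
    """Expand a bracket body such as '001-003,7' into ['001', '002', '003', '7']."""
    values = []
    for item in spec.split(","):
        if "-" in item:
            start, end = item.split("-", 1)
            width = len(start)  # keep zero padding, e.g. 001-003
            values.extend(str(n).zfill(width) for n in range(int(start), int(end) + 1))
        else:
            values.append(item)
    return values


def expand_hostlist(expr):
    """Expand a compact SLURM-style hostlist into individual node names (simplified)."""
    hosts = []
    for part in split_top_level(expr):
        match = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", part)
        if match is None:
            hosts.append(part)  # plain node name, nothing to expand
            continue
        prefix, spec, rest = match.groups()
        for value in expand_range(spec):
            # 'rest' may contain further bracket groups, so recurse on it.
            hosts.extend(expand_hostlist(prefix + value + rest))
    return hosts


def main():
    # SLURM sets SLURM_JOB_NODELIST inside a job allocation.
    nodelist = os.environ.get("SLURM_JOB_NODELIST")
    if nodelist is None:
        print("SLURM_JOB_NODELIST is not set; run this inside a SLURM job.", file=sys.stderr)
        return
    for host in expand_hostlist(nodelist):
        print(host)


if __name__ == "__main__":
    main()
```

For example, expand_hostlist("cn[001-003],gpu05") gives cn001, cn002, cn003 and gpu05. In a real job script, scontrol show hostnames is the more robust choice, since it uses SLURM's own parser.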

If the computational code (program) was built on ARC and compiled against the OpenMPI library provided on ARC (openmpi/4.1.1-gnu at the time of writing), then the launcher is aware of ARC's SLURM scheduler and can obtain the list of nodes directly from SLURM automatically.
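
One way to check that the launcher really picked up the node list from SLURM is to start a tiny script under mpirun and have every process report where it is running. The sketch below assumes it is started by OpenMPI's mpirun, which sets OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE for each process it launches; outside of mpirun these variables are absent and the script prints placeholders instead. The file name report_hosts.py and the function describe_process are hypothetical.

```python
import os
import socket


def describe_process():
    """Return a one-line report of where this process is running."""
    # OpenMPI's mpirun sets these for every process it starts;
    # fall back to "?" when the script is run without mpirun.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    size = os.environ.get("OMPI_COMM_WORLD_SIZE", "?")
    # SLURM_JOB_ID is set inside a SLURM allocation.
    job = os.environ.get("SLURM_JOB_ID", "none")
    return f"rank {rank} of {size} on {socket.gethostname()} (SLURM job {job})"


if __name__ == "__main__":
    # flush so output from different ranks is not held in buffers
    print(describe_process(), flush=True)
```

In a job that requested, for example, two nodes, running mpirun python3 report_hosts.py with no hostfile should print host names matching the nodes in SLURM_JOB_NODELIST, provided the launcher is SLURM-aware as described above.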