How to use array jobs in SLURM


Array jobs

Job arrays in SLURM allow you to run many jobs using one and the same job script. The job script is submitted with the sbatch command and does not take any parameters, like this:

$ sbatch array_job.slurm

If you have many data files you want to process in multiple jobs, this approach can be a convenient alternative to generating a separate job script for each data file and submitting them one by one.


An array job script example, array_job.slurm:

#!/bin/bash
# ============================================
#SBATCH --job-name=my_array_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4gb
#SBATCH --time=0-01:00:00

#SBATCH --array=1-10
# ============================================

./myapplication $SLURM_ARRAY_TASK_ID

This job script will create an auxiliary job in the queue, which will generate the 10 element jobs of the array, with indices from 1 to 10. The job script for each element job of the array is exactly the same. The difference comes from the value of the environment variable $SLURM_ARRAY_TASK_ID, which is set to the index of that element job in the array.


That is, the first job of the array will execute the command line:

 ./myapplication 1

and the 8th element job will run

 ./myapplication 8

and so on. The auxiliary job will stay in the queue until all the element jobs are done. The resources requested in the job script are allocated to each element job separately; that is, each job gets its own 4 GB of RAM and 1 CPU, as requested in the example.
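
If you want each element job to write its output into a separate file, the SLURM filename patterns %A (the master job ID) and %a (the array task index) can be used in the output directive. A minimal sketch, added to the #SBATCH block of the example above:

#SBATCH --output=my_array_job_%A_%a.out

After submission, squeue lists the element jobs with the array index appended to the master job ID, in the form 123456_1, 123456_2, and so on (the job ID here is illustrative):

$ squeue -u $USER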

Mapping task index to a data file

Simple case of numbered data files

If the data files are data-1.dat, ....., data-11.dat, you can use this mapping:

#!/bin/bash
....
# ============================================
DATA="data-${SLURM_ARRAY_TASK_ID}.dat"

./myapplication $DATA

or, if they are data-001.dat, ....., data-111.dat, you can use the printf command for the desired formatting:

#!/bin/bash
....
# ============================================
# This command produces a number of fixed width, zero-padded to 3 characters.  For example, 001.
printf -v NUM %.3d $SLURM_ARRAY_TASK_ID
DATA="data-${NUM}.dat"

./myapplication $DATA
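
The mapping can be tested quickly without submitting a job, by setting the variable by hand in an interactive shell:

$ SLURM_ARRAY_TASK_ID=7
$ printf -v NUM %.3d $SLURM_ARRAY_TASK_ID
$ echo "data-${NUM}.dat"
data-007.dat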

Mapping job index to a line in a file

The following command extracts a specific line from a text file, in this case line number 10:

sed -n 10p filename.txt

With this, we can put the data file names into a text file, index.idx, one data file name per line. Then, in the job script, the data file name can be extracted from the line specified by the job task index using the command above:

#!/bin/bash
....
# ============================================
MAP_FILE="index.idx"
DATA=`sed -n ${SLURM_ARRAY_TASK_ID}p $MAP_FILE`

./myapplication $DATA
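
The mapping file itself can be generated with a simple directory listing. One possible way, assuming all the data files are in the current directory and the file names contain no spaces:

$ ls data-*.dat > index.idx
$ wc -l < index.idx

The line count reported by wc gives the upper bound for the --array option, since the element job indices should run from 1 to the number of lines.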

Mapping job index to multiple data files in a file

If each element job of the array has to perform several computations, your job script has to loop over several data files from your data mapping file.

For example, you want to process 1000 data files using an array of 20 element jobs, indexed from 0 to 19, processing 50 data files in each element job. The data mapping file, index.idx, will look like this:

data-0001.dat
data-0002.dat
data-0003.dat
....
data-0998.dat
data-0999.dat
data-1000.dat

Then, the array job script, array.slurm, can be:

#!/bin/bash
# ====================================================
....
#SBATCH --array=0-19

# ====================================================
SIZE=50
INDEX_FILE="index.idx"

# sed numbers lines starting from 1, so the blocks are lines 1-50, 51-100, ..., 951-1000.
IDXBEG=$(( SLURM_ARRAY_TASK_ID * SIZE + 1 ))
IDXEND=$(( IDXBEG + SIZE - 1 ))

for IDX in `seq $IDXBEG $IDXEND`; do
        DATA=`sed -n ${IDX}p $INDEX_FILE`

        # add your work here....
        ./myapplication $DATA
done
# ====================================================
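
The array range does not have to be hard-coded: an --array option given to sbatch on the command line overrides the #SBATCH directive in the script. A small sketch, assuming index.idx holds 1000 lines and SIZE is 50 as above:

$ N=$( wc -l < index.idx )
$ sbatch --array=0-$(( N / 50 - 1 )) array.slurm

With N=1000 this submits the array with indices from 0 to 19, matching the script.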