How to use array jobs in SLURM
Latest revision as of 20:36, 21 September 2023
Array jobs
Job arrays in SLURM allow you to run many jobs using one and the same job script.
The job script is submitted with the sbatch command without any parameters, like this:
$ sbatch array_job.slurm
If you have many data files you want to process in multiple jobs, this approach may be a convenient alternative to generating a separate job script for each data file and submitting them one by one.
An array job script example, array_job.slurm:
#!/bin/bash
# ============================================
#SBATCH --job-name=my_array_job
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4gb
#SBATCH --time=0-01:00:00
#SBATCH --array=1-10
# ============================================
./myapplication $SLURM_ARRAY_TASK_ID
This job script will create one auxiliary job in the queue, which will generate 10 element jobs of the array, with indices from 1 to 10.
The job script for each element job of the array is exactly the same.
The difference comes from the value of the environment variable $SLURM_ARRAY_TASK_ID, which is set to the number of this element job in the array (its index).
That is, the first element job of the array will execute the command line:
./myapplication 1
and the 8th element job will run
./myapplication 8
and so on.
The auxiliary job will stay in the queue until all the element jobs are done. The resources requested in the job script are allocated to each element job separately; that is, each element job gets its own 4 GB of RAM and 1 CPU, as requested in the example.
With this approach, it is left to myapplication to interpret what the input parameter, in this case 1 or 8, means in terms of data processing.
Mapping task index to a data file
Simple case of numbered data files
If the data files are data-1.dat, ..., data-11.dat, you can use this mapping:
#!/bin/bash
....
# ============================================
DATA="data-${SLURM_ARRAY_TASK_ID}.dat"
./myapplication $DATA
or, if they are data-001.dat, ..., data-111.dat, you can use the printf command to get the desired formatting:
#!/bin/bash
....
# ============================================
# This command produces numbers of the same width, zero-padded to 3 digits. For example, 001.
printf -v NUM '%03d' "$SLURM_ARRAY_TASK_ID"
DATA="data-${NUM}.dat"
./myapplication $DATA
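Before submitting, it can help to check the mapping locally. The sketch below wraps the padding in a hypothetical helper, data_file_for_task (an illustrative name, not part of SLURM), and prints the file name that a few task indices would map to:

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): map a task index to a
# zero-padded data file name such as data-007.dat.
data_file_for_task() {
    printf 'data-%03d.dat\n' "$1"
}

# Preview the mapping for a few task indices without submitting anything.
for id in 1 8 111; do
    echo "task ${id} -> $(data_file_for_task "$id")"
done
```

Each name printed should match an existing data file; if it does not, adjust the padding width before submitting the array.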
Mapping job index to a line in a file
The following command extracts a specific line from a text file, in this case line number 10:
sed -n 10p filename.txt
With this, we can put the data file names into a text file index.idx, one data file name per line.
Then, in the job script, the data file name can be extracted using the command above
from the line specified by the task index:
#!/bin/bash
....
# ============================================
MAP_FILE="index.idx"
DATA=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$MAP_FILE")
./myapplication $DATA
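One possible way to create the mapping file is sketched below. It assumes that the data files match the pattern data-*.dat and sit in a single directory; make_index is a hypothetical helper name, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper: write the names of all data-*.dat files found in
# directory $1 to the index file $2, one name per line, sorted.
# Note: the sort is lexicographic, so unpadded names order as 1, 10, 11, 2, ...
make_index() {
    find "$1" -maxdepth 1 -type f -name 'data-*.dat' | sort > "$2"
}

make_index . index.idx
# The line count is the highest task index needed when using one line per task.
echo "index.idx lists $(wc -l < index.idx) data files"
```

The order of the lines does not matter for correctness, since each task index simply selects one line, but a sorted file is easier to inspect.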
Mapping job index to multiple data files in a file
If each element job of the array has to process several computations, your job script has to loop over several data files from your data mapping file.
For example, you want to process 1000 data files using an array of 20 element jobs, indexed from 0 to 19, processing 50 data files in each element job.
The data mapping file, index.idx, will look like this:
data-0001.dat
data-0002.dat
data-0003.dat
....
data-0998.dat
data-0999.dat
data-1000.dat
Then, the array job script, array.slurm, can be:
#! /bin/bash
# ====================================================
....
#SBATCH --array=0-19
# ====================================================
SIZE=50
INDEX_FILE="index.idx"
# sed numbers lines from 1: task 0 handles lines 1-50, task 19 lines 951-1000.
IDXBEG=$(( SLURM_ARRAY_TASK_ID * SIZE + 1 ))
IDXEND=$(( IDXBEG + SIZE - 1 ))
for IDX in $(seq "$IDXBEG" "$IDXEND"); do
    DATA=$(sed -n "${IDX}p" "$INDEX_FILE")
    # add your work here....
    ./myapplication "$DATA"
done
# ====================================================
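The chunk arithmetic can be checked locally before submitting. The sketch below assumes that the index file is read with sed, which numbers lines from 1, and it clamps the last chunk for the case where the number of files is not a multiple of the chunk size. chunk_range is a hypothetical helper, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper: print the first and last line numbers (1-based, as
# sed counts them) handled by 0-based task $1 with chunk size $2, never
# going past the total number of lines $3.
chunk_range() {
    beg=$(( $1 * $2 + 1 ))
    end=$(( beg + $2 - 1 ))
    if [ "$end" -gt "$3" ]; then end=$3; fi
    echo "$beg $end"
}

# Number of element jobs needed for 1000 files in chunks of 50 (rounded up).
echo "tasks needed: $(( (1000 + 50 - 1) / 50 ))"
chunk_range 0 50 1000     # first task
chunk_range 19 50 1000    # last task
```

With 1000 files and chunks of 50 this gives 20 tasks, which matches --array=0-19 in the example.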
More Arrays
Array options
You can indicate a stride for the array elements to be used in element jobs:
#SBATCH --array=1-100:4
This option in your job script sets the stride to 4 and will run 25 jobs with SLURM_ARRAY_TASK_ID having the values of {1, 5, 9, ...., 97}.
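The set of indices produced by a stride can be previewed on the command line. This is only a local check with the standard seq utility, whose FIRST STEP LAST arguments happen to mirror the first-last:step form of the array specification; it does not talk to SLURM.

```shell
#!/bin/bash
# Preview the task indices that --array=1-100:4 would generate.
ids=$(seq 1 4 100)
echo "number of element jobs: $(echo "$ids" | wc -l)"
echo "first: $(echo "$ids" | head -n 1), last: $(echo "$ids" | tail -n 1)"
```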
Another option is to indicate the number of element jobs that can run at the same time. By default all the element jobs can be executed at the same time, provided that resources are available. With this option you can limit the number of jobs to, for example, one:
#SBATCH --array=1-100%1
You can combine it with the stride as well, to limit your 25 jobs to run 5 at a time:
#SBATCH --array=1-100:4%5
Limiting a job array to run 1 job at a time may be a practical way to resubmit the same job a fixed number of times. If you want to run a simulation for 1 000 000 steps, it can be done as an array job of, say, 100 elements, each element running 10 000 steps, one after another.
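As a rough sketch of the bookkeeping for such a chain, each element job can derive its own step range from its task index. The variable STEPS_PER_JOB and the helper step_range are illustrative assumptions; how the simulation is actually restarted from the previous element's output depends on your application.

```shell
#!/bin/bash
# Sketch for --array=1-100%1: element job N continues where N-1 stopped.
STEPS_PER_JOB=10000

# Hypothetical helper: print the first and last step for 1-based task index $1.
step_range() {
    echo "$(( ($1 - 1) * STEPS_PER_JOB + 1 )) $(( $1 * STEPS_PER_JOB ))"
}

TASK_ID=${SLURM_ARRAY_TASK_ID:-1}   # falls back to 1 outside of SLURM
echo "element job ${TASK_ID}: steps $(step_range "$TASK_ID")"
# A real script would restart the simulation from its latest checkpoint here.
```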
Array environment variables
For the first element job in an --array=1-100 job array, the following variables will be set:
SLURM_JOB_ID=87654321
SLURM_ARRAY_JOB_ID=87654321
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=100
SLURM_ARRAY_TASK_MAX=100
SLURM_ARRAY_TASK_MIN=1
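One common use of these variables is to give every element job its own result file. The sketch below is a minimal illustration: the result-*.out naming and the helper result_name are assumptions made for the example, and the fallback values only exist so that the script can be tried outside of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: build a per-task result file name from the array
# job ID ($1) and the task index ($2).
result_name() {
    echo "result-${1}_${2}.out"
}

# Inside an array job SLURM sets these variables; the defaults after ":-"
# are placeholders for running the sketch by hand.
JOB_ID=${SLURM_ARRAY_JOB_ID:-87654321}
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
TASK_COUNT=${SLURM_ARRAY_TASK_COUNT:-100}

OUTPUT=$(result_name "$JOB_ID" "$TASK_ID")
echo "element job ${TASK_ID} of ${TASK_COUNT} writes to ${OUTPUT}"
```

Because the name contains both the array job ID and the task index, element jobs of the same array, and of later resubmissions, do not overwrite each other's results.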