Job arrays

Job arrays allow you to submit a large number of very similar jobs with a single job script. All jobs need to have the same resource requirements. The job array allows you to define a range of numbers; the length of this range determines how many jobs will be submitted. Furthermore, each job gets one of the numbers in this range through the environment variable $SLURM_ARRAY_TASK_ID, which you can use, for instance, to pass the right input or parameters to each job.

In order to create a job array, start by creating the job script that you would need to run just one instance of the job. For instance, consider the following job script for running R:

#!/bin/bash
#SBATCH --job-name=R_job
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1gb

module load R/3.4.2-foss-2016a-X11-20160819
Rscript myscript.r

Now suppose you want this to be run 100 times. You can do this by adding the following array definition somewhere near the top of the script:

#SBATCH --array=1-100


Then you can run sbatch <name of job script> just once, and the job will run 100 times. Each of the 100 jobs will get one core, 1 GB of memory and 12 hours of wall clock time. Do note that they will all run the same R script, myscript.r in this case, which in most cases is probably not very useful!

So let us take it a step further: suppose we have 100 different R scripts that have to be run, which are named myscript1.r, myscript2.r, …, myscript100.r. Now we can use the aforementioned environment variable to pick the right R script for each job:

#!/bin/bash
#SBATCH --job-name=R_job
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --array=1-100

module load R/3.4.2-foss-2016a-X11-20160819
Rscript myscript${SLURM_ARRAY_TASK_ID}.r


Here the variable ${SLURM_ARRAY_TASK_ID} will be replaced for each job by a value in the given range.
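
If you want to see how this works before running real jobs, a minimal test script (a sketch; the job name, resources and array range are just illustrative) can simply print the task ID:

#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=100mb
#SBATCH --array=1-5

# Each of the five tasks prints its own value of the variable
echo "This is array task ${SLURM_ARRAY_TASK_ID}"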

The range does not have to be a single interval of integers. You can define multiple intervals and/or use a step size to define more complex ranges:

--array=1,3-5,8,101-103

# Step size 2
--array=1-99:2
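
For example, if only the odd-numbered scripts myscript1.r, myscript3.r, …, myscript99.r need to be run, you could replace the array line in the script above by:

#SBATCH --array=1-99:2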

Suppose that you want to use this to pass input parameters to your program. If the input parameter takes a complex range of values, or if you need more than one parameter, the approach described above would probably not work. In this case you could put all your input parameter combinations in a file, with each combination on a separate line. You can then use the $SLURM_ARRAY_TASK_ID variable to get the n-th line from the file, and pass that to your program. For instance:

INPUTFILE=parameters.in

# get n-th line from $INPUTFILE
ARGS=$(sed "${SLURM_ARRAY_TASK_ID}q;d" $INPUTFILE)

myprogram $ARGS
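
As an illustration, a (hypothetical) parameters.in for a program that takes two arguments could look like this, combined with #SBATCH --array=1-3 so that task n picks up line n:

0.1 100
0.5 100
0.5 200

Task 1 would then run myprogram 0.1 100, task 2 would run myprogram 0.5 100, and so on.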

Alternatively, you can declare bash arrays containing the input parameters, and use $SLURM_ARRAY_TASK_ID to fetch the input parameters from those arrays:

parameter1=(1 2 3)
parameter2=(100 1000 10000)

myprogram ${parameter1[${SLURM_ARRAY_TASK_ID}]} ${parameter2[${SLURM_ARRAY_TASK_ID}]}

Note that with a file your range should start at 1 and go up to the number of lines in the file. In the latter case, with the bash arrays, your range should start at 0 and go up to the number of elements in the array minus one, since bash arrays are zero-indexed.
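
Putting the array-based variant together, a complete job script could look like the following sketch (the job name param_job and the program name myprogram are just placeholders):

#!/bin/bash
#SBATCH --job-name=param_job
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --array=0-2

# Three parameter combinations, indexed 0, 1 and 2
parameter1=(1 2 3)
parameter2=(100 1000 10000)

myprogram ${parameter1[${SLURM_ARRAY_TASK_ID}]} ${parameter2[${SLURM_ARRAY_TASK_ID}]}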

A job array will get just one main job id, just like a regular job. However, the index values of the range will be used as a suffix for the job id: <jobid>_1, <jobid>_2, et cetera. Furthermore, each job will produce its own output file, with a filename like slurm-<jobid>_<index>.out. It is also possible to provide a custom name for the SLURM output file with #SBATCH --output=. Under normal circumstances a name such as R_job.out would be fine; with job arrays, however, every job would write to the same output file, overwriting the output of the others. We can get around this by using %j and %a, which will be replaced with the job ID and the array index, respectively: #SBATCH --output=R_job_%j_%a.out
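
In context, the header of the R job script shown earlier would then look like this (only the --output line is new compared to the version above):

#!/bin/bash
#SBATCH --job-name=R_job
#SBATCH --output=R_job_%j_%a.out
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --array=1-100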

The same kind of job ids will also be used in the output of SLURM tools like squeue and sacct. The squeue command will usually try to combine the jobs in the array into a single line, e.g.:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     12345_[1-100]     nodes    R_job  p123456 PD       0:00      1 (Resources)


If you want each job of the array to appear on a separate line, you can pass the -r or --array option to squeue.
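
With -r the individual array tasks are then listed separately. For the (hypothetical) example above, the output would look something like:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           12345_1     nodes    R_job  p123456 PD       0:00      1 (Resources)
           12345_2     nodes    R_job  p123456 PD       0:00      1 (Resources)
           12345_3     nodes    R_job  p123456 PD       0:00      1 (Resources)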

The scancel command can be used to cancel an entire job array:

scancel 12345

If you want to cancel only specific jobs of the array, you can use the index as suffix to the job id:

scancel 12345_12

Using square brackets, you can also cancel ranges of jobs, where a range can be defined in the same way as described above for creating the job array:

scancel 12345_[1-10,15]