habrok:advanced_job_management:job_arrays [2020/12/22 10:07] – external edit 127.0.0.1
habrok:advanced_job_management:job_arrays [2024/03/14 14:32] (current) – Minor formatting pedro
====== Job arrays ======
  
Job arrays allow you to easily submit a whole bunch of very similar jobs with a single job script. All jobs in an array have the same resource requirements. With a job array you define a range of numbers; the length of this range determines how many jobs will be submitted. Furthermore, each job gets one of the numbers in this range through the environment variable ''$SLURM_ARRAY_TASK_ID'', which you can use, for instance, to pass the right input file or parameter to each job.
  
To create a job array, start by writing the job script that you would need to run just one instance of the job. For instance, consider the following job script for running R:
</code>
\\
Then you only need to run ''sbatch <name of job script>'' once, and the job will run 100 times. Each of the 100 jobs will get one core, 1 GB of memory and 12 hours of wall clock time. Do note that they will all run the same R script, ''myscript.r'' in this case, which in most cases is not very useful!
  
===== Using ${SLURM_ARRAY_TASK_ID} =====
</code>
\\
Here the variable ''${SLURM_ARRAY_TASK_ID}'' is replaced in each job by that job's own value from the given range.
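A minimal sketch of this pattern is given below. It assumes hypothetical input files named ''input_1.txt'' up to ''input_100.txt'', and it only prints which file each task would handle; in a real job you would replace the ''echo'' by your actual program call. The resource requests follow the example described above (100 jobs, one core, 1 GB of memory and 12 hours each). The default value for ''SLURM_ARRAY_TASK_ID'' only matters when you try the script by hand outside of SLURM, for instance with ''bash jobscript.sh''.

<code bash>
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --array=1-100

# SLURM sets SLURM_ARRAY_TASK_ID for every task in the array. When the script
# is run by hand outside of SLURM, fall back to 1 so it can be tried locally.
: "${SLURM_ARRAY_TASK_ID:=1}"

# Hypothetical naming scheme: task N reads input_N.txt
input_file="input_${SLURM_ARRAY_TASK_ID}.txt"

# Placeholder for the real work, e.g. running an R script on this input file
echo "Task ${SLURM_ARRAY_TASK_ID} processes ${input_file}"
</code>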
  
===== More complex ranges =====

===== Even more complex cases =====
  
Suppose that you want to use this to pass input parameters to your program. If the input parameter takes a complex range of values, or if you need more than one parameter, the approach described above is probably not sufficient. In this case you can put all your input parameter combinations in a file, with each combination on a separate line. You can then use the ''$SLURM_ARRAY_TASK_ID'' variable to get the n-th line from the file, and pass that line to your program. For instance:
  
<code>
</code>
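As a minimal sketch of the line-lookup approach, the script below reads line number ''$SLURM_ARRAY_TASK_ID'' with ''sed'' and splits it into two parameters. The parameter file, its contents and the parameter names ''alpha'' and ''n_iter'' are hypothetical. To keep the example self-contained it writes a small parameter file to a temporary location; in a real job array you would create the file once, before running ''sbatch'', and point the script at it.

<code bash>
#!/bin/bash
#SBATCH --array=1-3

# Fall back to task 2 when the script is run by hand outside of SLURM.
: "${SLURM_ARRAY_TASK_ID:=2}"

# Demo only: a parameter file with one combination "alpha n_iter" per line.
params_file=$(mktemp)
printf '%s\n' "0.1 10" "0.2 20" "0.3 30" > "$params_file"

# Print only line number SLURM_ARRAY_TASK_ID of the file.
params=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$params_file")

# Split the line on whitespace into the separate parameters.
read -r alpha n_iter <<EOF
$params
EOF

echo "Task ${SLURM_ARRAY_TASK_ID}: alpha=${alpha} n_iter=${n_iter}"

rm -f "$params_file"
</code>

Because ''sed'' numbers lines from 1, the array range should start at 1 and end at the number of lines in the parameter file.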
  
Alternatively, you could declare arrays holding the input parameters, and use ''$SLURM_ARRAY_TASK_ID'' as an index to fetch the input parameters from those arrays:
  
<code>
</code>
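A minimal sketch of the array-based approach, using Bash arrays with hypothetical parameter values: task ''i'' picks element ''i'' from each array, so every index corresponds to one combination of parameters. Bash arrays are indexed from 0, which is why this sketch uses ''--array=0-2''; with a range starting at 1 you would need to subtract 1 from the index.

<code bash>
#!/bin/bash
#SBATCH --array=0-2

# Fall back to task 1 when the script is run by hand outside of SLURM.
: "${SLURM_ARRAY_TASK_ID:=1}"

# Hypothetical parameters: task i uses alphas[i] together with n_iters[i].
alphas=(0.1 0.2 0.3)
n_iters=(10 20 30)

alpha=${alphas[$SLURM_ARRAY_TASK_ID]}
n_iter=${n_iters[$SLURM_ARRAY_TASK_ID]}

echo "Task ${SLURM_ARRAY_TASK_ID}: alpha=${alpha} n_iter=${n_iter}"
</code>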
===== Output and job information for job arrays =====
  
A job array gets just one main job ID, just like a regular job. However, the index values of the range are used as a suffix for the job ID: <jobid>_1, <jobid>_2, etcetera. Furthermore, each job produces its own output file, with a filename like ''slurm-<jobid>_<index>.out''.

It is also possible to provide a custom name for the SLURM output file with ''#SBATCH --output=''. In normal circumstances a name such as ''R_job.out'' would be fine. With job arrays, however, that would result in every job writing to the same output file, thus overwriting the previous ones. We can get around this by using ''%j'' and ''%a'' in the filename, which are replaced with the job ID and the array index, respectively: ''#SBATCH --output=R_job_%j_%a.out''. Note that for an array, ''%j'' is the job ID of each individual task; use ''%A'' instead if you want the main job ID of the array, which gives the same <jobid>_<index> naming as the default: ''#SBATCH --output=R_job_%A_%a.out''.
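The directive can be combined with the other settings in the job script header. Below is a minimal sketch; the file name prefix ''R_job'' is just an example. Inside a running task SLURM also provides the two numbers used in the file name as the environment variables ''SLURM_ARRAY_JOB_ID'' (the main job ID, the same number as ''%A'') and ''SLURM_ARRAY_TASK_ID'' (the array index, the same as ''%a''), so the script can report its own output file. The dummy values are only used when the script is run by hand outside of SLURM.

<code bash>
#!/bin/bash
#SBATCH --array=1-10
# %A is replaced by the main job ID of the array and %a by the array index,
# so every task gets its own output file.
#SBATCH --output=R_job_%A_%a.out

# Dummy values for running the script by hand outside of SLURM.
: "${SLURM_ARRAY_JOB_ID:=12345}"
: "${SLURM_ARRAY_TASK_ID:=7}"

# The same two numbers that SLURM substitutes for %A and %a
output_file="R_job_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out"
echo "Task ${SLURM_ARRAY_TASK_ID} writes its output to ${output_file}"
</code>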
  
The same kind of job IDs are also used in the output of SLURM tools like ''squeue'' and ''sacct''. The ''squeue'' command will usually try to combine the jobs in the array into a single line, e.g.:
  
<code>
</code>
\\
If you want each job of the array to appear on a separate line, you can pass the ''-r'' or ''--array'' option to ''squeue''.
  
===== Cancelling job arrays =====