Differences

habrok:job_management:scheduling_system [2020/08/26 14:49] – external edit 127.0.0.1habrok:job_management:scheduling_system [2026/06/29 12:40] (current) – [Resource allocation: Jobs and jobscripts] pedro
Line 5: Line 5:
  
 For each job a user has to specify what amount of resources are required for the job to run properly. The following resources can be requested: For each job a user has to specify what amount of resources are required for the job to run properly. The following resources can be requested:
-  * A number of nodes/computers 
   * A number CPU cores per node   * A number CPU cores per node
 +  * A number of nodes/computers
   * An amount of memory per core or node   * An amount of memory per core or node
   * The amount of time the job needs   * The amount of time the job needs
Line 16: Line 16:
 When resources are available jobs will start immediately. If the requested resources are not available jobs will be put in a queue. The ordering of this queue is based on priority. High resource usage will lower your priority for new jobs. A period of low activity will cause your priority for new jobs to increase again.  When resources are available jobs will start immediately. If the requested resources are not available jobs will be put in a queue. The ordering of this queue is based on priority. High resource usage will lower your priority for new jobs. A period of low activity will cause your priority for new jobs to increase again. 
  
-**IMPORTANT** For those that know most of this, or are to lazy to read the whole page there are some [[#guidelines_for_cpu_cores_time_and_memory|important guidelines that you should know]]. +<wrap important>For those that know most of this, or are too lazy to read the whole page, there are some [[#guidelines_for_cpu_cores_time_and_memory|important guidelines that you should know]]. </wrap>
  
 ===== SLURM ===== ===== SLURM =====
  
-On the Peregrine cluster the [[https://slurm.schedmd.com/|SLURM]] resource scheduler is used. This means that jobs need to be specified according to SLURM syntax rules and that the SLURM commands have to be used. These commands are fully documented at the [[https://slurm.schedmd.com/man_index.html|SLURM manual pages]]+On the Hábrók cluster the [[https://slurm.schedmd.com/|SLURM]] resource scheduler is used. This means that jobs need to be specified according to SLURM syntax rules and that the SLURM commands have to be used. These commands are fully documented at the [[https://slurm.schedmd.com/man_index.html|SLURM manual pages]]
  
 ===== Job scripts ===== ===== Job scripts =====
Line 60: Line 60:
  
 Detail: Wall clock time and CPU time. Within computer systems a difference is made between wall clock time and CPU time. Wall clock time is the normal time that passes by and can be measured using for example two readings from a wall clock. CPU time is the fraction of this time spent by a CPU on calculations. Times when the CPU is waiting for the operating system or incoming data from the file system is not counted.  When using a single CPU core this amount of time can therefore never be greater than the wall clock time. But, to make things more complex a program can make use of multiple CPU cores. In that case the CPU time is accumulated on all these CPU cores and will therefore normally progress much faster than on a single CPU, and normally increase much faster than the time that has passed on the wall clock in the same period.   Detail: Wall clock time and CPU time. Within computer systems a difference is made between wall clock time and CPU time. Wall clock time is the normal time that passes by and can be measured using for example two readings from a wall clock. CPU time is the fraction of this time spent by a CPU on calculations. Times when the CPU is waiting for the operating system or incoming data from the file system is not counted.  When using a single CPU core this amount of time can therefore never be greater than the wall clock time. But, to make things more complex a program can make use of multiple CPU cores. In that case the CPU time is accumulated on all these CPU cores and will therefore normally progress much faster than on a single CPU, and normally increase much faster than the time that has passed on the wall clock in the same period.  
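To compare the two figures for a finished job, SLURM's accounting tool can be queried. This is a minimal sketch, assuming job accounting is enabled on the cluster; ''12345'' is a placeholder job id, not a real job.
<code>
# Elapsed is the wall clock time of the job,
# TotalCPU is the CPU time summed over all cores used by the job
sacct -j 12345 --format=JobID,Elapsed,TotalCPU
</code>
For a job that keeps several cores busy, ''TotalCPU'' can be expected to exceed ''Elapsed''; for a single-core job it should not.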
 +
 +
 ==== Nodes and cores ==== ==== Nodes and cores ====
  
-The requirements for nodes (full computers) and cores can be given using the parameters ''%%--%%nodes'', ''%%--%%ntasks'', ''--ntasks-per-node'', +The requirements for nodes (full computers) and cores can be given using the parameters ''%%--%%nodes'', ''%%--%%ntasks'', ''%%--%%ntasks-per-node'', 
 ''%%--%%ntasks-per-core'', ''%%--%%cpus-per-task'' and ''%%--%%ntasks-per-core''. Here is a basic description of what they mean: ''%%--%%ntasks-per-core'', ''%%--%%cpus-per-task'' and ''%%--%%ntasks-per-core''. Here is a basic description of what they mean:
  
Line 71: Line 73:
 |''%%--%%cpus-per-task''  |Number of threads per task (for multithreaded applications)                                |1            | |''%%--%%cpus-per-task''  |Number of threads per task (for multithreaded applications)                                |1            |
  
-**IMPORTANT** The numbers that are given here, depend on the capabilities of the program being run. Only for programs that can use multiple CPU cores the number of tasks and/or cpus per task may be set higher than 1. The number of nodes can only be higher than 1 for software that is capable of running on multiple physical computers, using network communication.+<wrap important>The numbers that are given here, depend on the capabilities of the program being run. Only for programs that can use multiple CPU cores the number of tasks and/or cpus per task may be set higher than 1. The number of nodes can only be higher than 1 for software that is capable of running on multiple physical computers, using network communication.</wrap>
  
-**VERY IMPORTANT! If you don't know if your program is capable of running in parallel, do not request multiple cores, or nodes! In most cases this is useless and a waste of resources.**+<WRAP important round center>**VERY IMPORTANT! If you don't know if your program is capable of running in parallel, do not request multiple cores, or nodes! In most cases this is useless and a waste of resources.**</WRAP>
  
 The precise requirements are determined by both the software and its scalability and by the user who has to decide himself how to balance runtime, waiting time in the queue and the number of jobs that he or she wants to run. The precise requirements are determined by both the software and its scalability and by the user who has to decide himself how to balance runtime, waiting time in the queue and the number of jobs that he or she wants to run.
Line 101: Line 103:
 Note that if you only use ''%%--%%ntasks'' to request N cores, these N cores may be distributed over 1 to N nodes. If your software cannot handle this or if you do not want this, you will have to use the ''%%--%%nodes'' parameter to limit the number of nodes. Note that if you only use ''%%--%%ntasks'' to request N cores, these N cores may be distributed over 1 to N nodes. If your software cannot handle this or if you do not want this, you will have to use the ''%%--%%nodes'' parameter to limit the number of nodes.
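As a minimal sketch of the note above, the following jobscript lines keep all requested cores on a single node. The values 8 and 1 are illustrative assumptions, not recommendations for any particular program.
<code>
# Request 8 tasks (cores) in total
#SBATCH --ntasks=8
# Restrict the job to one node, so the 8 cores are not spread over several nodes
#SBATCH --nodes=1
</code>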
  
 +==== Multinode jobs requiring faster networking ====
 +
 +If your application is using MPI and may benefit from a high bandwidth (the amount of data transferred per second) and/or low latency (the amount of time it takes for the first bit to arrive) you can send the job to the ''omnipath'' partition. The nodes in this partition are equipped with an Omni-Path network adapter which has 100 Gbps bandwidth and a latency of a few microseconds. You can do this by specifying the partition in the jobscript using:
 +<code>
 +#SBATCH --partition=parallel
 +</code>
 +Since there are only limited resources available in this partition, there are two important guidelines:
 +  - When using just a few cores you might as well run your application on a single node
 +  - It would be wise to test the performance difference between a job running on the regular nodes and the omnipath nodes, since there may be more capacity available in the ''regular'' partition.
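One way to make the comparison from the second guideline is to submit the same jobscript to both partitions and compare the run times afterwards. This is only a sketch: ''jobscript.sh'' is a hypothetical file name, and the partition names are the ones used in the text and example above. Options given on the ''sbatch'' command line take precedence over ''#SBATCH'' lines in the script.
<code>
# Same job, once on the regular nodes and once on the nodes with the faster network
sbatch --partition=regular jobscript.sh
sbatch --partition=parallel jobscript.sh
</code>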
 ==== Memory ==== ==== Memory ====
  
Line 160: Line 171:
  
 For the first runs you can then use overestimates for the time and memory requirement to make sure your calculations will not be aborted. Once you have gotten feedback from the scheduler about the actual time and memory consumption you can then use more precise amounts. Some hints about reasonable sizes: For the first runs you can then use overestimates for the time and memory requirement to make sure your calculations will not be aborted. Once you have gotten feedback from the scheduler about the actual time and memory consumption you can then use more precise amounts. Some hints about reasonable sizes:
-  * The memory on a standard Peregrine node is at least 4GB per core. So memory requests around 4GB are no problem at all.+  * The memory on a standard Hábrók node is at least 4GB per core. So memory requests around 4GB are no problem at all.
   * For memory requests above 4GB/core you should check the job output for the actual memory usage and adjust the number for consecutive runs. **VERY IMPORTANT Please don't request more than 10GB/core when you are not sure that your program needs it!** You are wasting valuable resources others may need if you do.   * For memory requests above 4GB/core you should check the job output for the actual memory usage and adjust the number for consecutive runs. **VERY IMPORTANT Please don't request more than 10GB/core when you are not sure that your program needs it!** You are wasting valuable resources others may need if you do.
   * **VERY IMPORTANT Never request more than 1 CPU core if you don't know that your program can actually use multiple cores.** Check the program documentation for information on this.    * **VERY IMPORTANT Never request more than 1 CPU core if you don't know that your program can actually use multiple cores.** Check the program documentation for information on this. 
Line 171: Line 182:
 The following table gives an overview and description of other useful parameters that can be used: The following table gives an overview and description of other useful parameters that can be used:
  
-^Parameter ^Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  +^Parameter ^Description ^ 
-|%%--%%job-name |Specify a name for the job, which will be shown in the job overview                                                                                                                                                                                                                                                                                                                                                                                                                                                          +|%%--%%job-name |Specify a name for the job, which will be shown in the job overview |                                                                                                                                                                                                                                                                                                                                                                                                                                                          
-|%%--%%mail-type|Comma-separated list of events for which an email notification should be sent. Valid event names are:\\ //ALL// - equivalent to: //BEGIN//, //END//, //FAIL// and //REQUEUE//\\ //BEGIN// - job started running\\ //END// - job completed\\ //FAIL// - job failed\\ //REQUEUE// - job is requeued\\ //TIME_LIMIT// - job exceeded time limit\\ //TIME_LIMIT_50// - job reached 50 percent of time limit\\ //TIME_LIMIT_80// - job reached 80 percent of time limit\\ //TIME_LIMIT_90// - job reached 90 percent of time limit| +
-|%%--%%mail-user|Email address to receive notifications of job state changes as requested with the —mail-type option\\ **WARNING: due to bans from Microsoft we put a limit on the number of mails that can be sent to Hotmail/Outlook/Live addresses. Therefore, please use your RUG email address here!**                                                                                                                                                                                                                                   +|%%--%%output   |Name of the job output file (default: slurm-<jobid>.out). Use %j if you want to include a job id in the filename.                                                                                                                                                                                                                                                                                                                                                                                               |
-|%%--%%output   |Name of the job output file (default: slurm-<html><jobid></html>.out). Use %j if you want to include a job id in the filename.                                                                                                                                                                                                                                                                                                                                                                                               |+
 |%%--%%partition|Specify in which partition the job has to run                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | |%%--%%partition|Specify in which partition the job has to run                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
  
Line 182: Line 192:
 <code> <code>
 #SBATCH --job-name=my_first_slurm_job #SBATCH --job-name=my_first_slurm_job
-#SBATCH --mail-type=BEGIN,END 
-#SBATCH --mail-user=some@user.com 
 #SBATCH --output=job-%j.log #SBATCH --output=job-%j.log
 #SBATCH --partition=short #SBATCH --partition=short
Line 228: Line 236:
  
 module purge module purge
-module load GROMACS/4.6.7-ictce-7.2.4-mt+module load GROMACS/2021.5-foss-2021b
  
-srun mdrun+srun gmx_mpi <arguments>
 </code> </code>
-This script will ask for 2 nodes and 4 tasks per node. The maximum runtime is 2 days and 12 hours. The amount of memory available for the job is almost 4 GiB per node. Once the job is executed, it will first load the module for Gromacs 4.6. To start a parallel (MPI) run of mdrun, we use srun (instead of mpirun) to start all mdrun processes on the allocated nodes.+This script will ask for 2 nodes and 4 tasks per node. The maximum runtime is 2 days and 12 hours. The amount of memory available for the job is almost 4 GiB per node. Once the job is executed, it will first load the module for GROMACS 2021. To start a parallel (MPI) run, we use srun (instead of mpirun) to start all GROMACS processes on the allocated nodes.
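
The header of the script is not part of this comparison view. As a sketch only, requests matching the description above could look like the lines below; the exact values, in particular the ''%%--%%mem'' figure, are assumptions and not copied from the page.
<code>
# 2 nodes with 4 tasks on each node
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
# Maximum runtime of 2 days and 12 hours (format: days-hours:minutes:seconds)
#SBATCH --time=2-12:00:00
# Memory per node in megabytes; 4000 MB is slightly less than 4 GiB (assumed value)
#SBATCH --mem=4000
</code>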