  
For each job a user has to specify the amount of resources required for the job to run properly. The following resources can be requested (a minimal example follows the list):
  * A number of CPU cores per node
  * A number of nodes/computers
  * An amount of memory per core or node
  * The amount of time the job needs
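
A minimal sketch of how such a resource request looks in a job script (the values are only placeholders; the exact syntax and the meaning of each parameter are explained in the sections below):
<code>
# number of CPU cores per node
#SBATCH --ntasks-per-node=4
# number of nodes/computers
#SBATCH --nodes=1
# amount of memory per core
#SBATCH --mem-per-cpu=2G
# amount of time the job needs (hh:mm:ss)
#SBATCH --time=01:00:00
</code>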
===== SLURM =====
  
On the Hábrók cluster the [[https://slurm.schedmd.com/|SLURM]] resource scheduler is used. This means that jobs need to be specified according to SLURM syntax rules and that the SLURM commands have to be used. These commands are fully documented at the [[https://slurm.schedmd.com/man_index.html|SLURM manual pages]].
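
A quick sketch of the most commonly used commands (''jobscript.sh'' and ''<jobid>'' are placeholders; all commands are described in detail in the manual pages linked above):
<code>
sbatch jobscript.sh    # submit a job script to the scheduler
squeue -u $USER        # show the status of your own jobs in the queue
scancel <jobid>        # cancel a job
</code>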
  
===== Job scripts =====
  
Detail: Wall clock time and CPU time. Within computer systems a difference is made between wall clock time and CPU time. Wall clock time is the normal time that passes by and can be measured using, for example, two readings from a wall clock. CPU time is the fraction of this time that a CPU spends on calculations. Time during which the CPU is waiting for the operating system or for incoming data from the file system is not counted. When using a single CPU core the CPU time can therefore never be greater than the wall clock time. To make things more complex, however, a program can make use of multiple CPU cores. In that case the CPU time is accumulated over all these CPU cores and will therefore normally increase much faster than the time that passes on the wall clock in the same period.
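
As an illustration (the numbers are made up), running a program that keeps four CPU cores busy under the shell's ''time'' command could report something like:
<code>
$ time ./my_program

real    1m0.231s
user    3m55.103s
sys     0m2.410s
</code>
Here ''real'' is the wall clock time, while ''user'' plus ''sys'' is the CPU time accumulated over all cores; in this example the CPU time is almost four times the wall clock time.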
==== Nodes and cores ====
  
Note that if you only use ''%%--%%ntasks'' to request N cores, these N cores may be distributed over 1 to N nodes. If your software cannot handle this or if you do not want this, you will have to use the ''%%--%%nodes'' parameter to limit the number of nodes.
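
A short sketch of the difference (8 cores is just an example value):
<code>
# 8 cores that may end up spread over up to 8 nodes:
#SBATCH --ntasks=8

# 8 cores that are guaranteed to be on a single node:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
</code>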
  
==== Multinode jobs requiring faster networking ====

If your application is using MPI and may benefit from high bandwidth (the amount of data transferred per second) and/or low latency (the amount of time it takes for the first bit to arrive), you can send the job to the ''parallel'' partition. The nodes in this partition are equipped with an Omni-Path network adapter, which has 100 Gbps bandwidth and a latency of a few microseconds. You can do this by specifying the partition in the job script using:
<code>
#SBATCH --partition=parallel
</code>
Since there are only limited resources available in this partition, there are two important guidelines:
  - When using just a few cores you might as well run your application on a single node.
  - It would be wise to test the performance difference between a job running on the regular nodes and the Omni-Path nodes (see the sketch below), since there may be more capacity available in the ''regular'' partition.
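
One way to do such a comparison (a sketch; ''jobscript.sh'' is a placeholder for your own job script) is to submit the same script to both partitions from the command line, which overrides any partition set inside the script:
<code>
sbatch --partition=regular  jobscript.sh
sbatch --partition=parallel jobscript.sh
</code>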
==== Memory ====
  
  
For the first runs you can use overestimates for the time and memory requirements to make sure your calculations will not be aborted. Once you have gotten feedback from the scheduler about the actual time and memory consumption, you can use more precise amounts. Some hints about reasonable sizes:
  * The memory on a standard Hábrók node is at least 4GB per core. So memory requests around 4GB are no problem at all.
  * For memory requests above 4GB/core you should check the job output for the actual memory usage and adjust the number for consecutive runs (see the example after this list). **VERY IMPORTANT: Please don't request more than 10GB/core when you are not sure that your program needs it!** You are wasting valuable resources others may need if you do.
  * **VERY IMPORTANT: Never request more than 1 CPU core if you don't know that your program can actually use multiple cores.** Check the program documentation for information on this.
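
As an illustration (the job id and the selected fields are just examples), the actual memory use of a finished job can also be looked up with SLURM's ''sacct'' command and used to tune the request for the next run:
<code>
sacct -j <jobid> --format=JobID,Elapsed,ReqMem,MaxRSS
</code>
Here ''ReqMem'' shows the memory that was requested and ''MaxRSS'' the maximum amount that was actually used.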
The following table gives an overview and description of other useful parameters that can be used:
  
^Parameter ^Description ^
|%%--%%job-name |Specify a name for the job, which will be shown in the job overview |
|%%--%%output |Name of the job output file (default: slurm-<jobid>.out). Use %j if you want to include a job id in the filename. |
|%%--%%partition|Specify in which partition the job has to run |
  
<code>
#SBATCH --job-name=my_first_slurm_job
#SBATCH --output=job-%j.log
#SBATCH --partition=short
</code>
  
An example of a job script for a parallel GROMACS run could look like this:
<code>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=2-12:00:00
#SBATCH --mem=4000

module purge
module load GROMACS/2021.5-foss-2021b
  
srun gmx_mpi <arguments>
</code>
This script will ask for 2 nodes and 4 tasks per node. The maximum runtime is 2 days and 12 hours. The amount of memory available for the job is almost 4 GiB per node. Once the job is executed, it will first load the module for GROMACS 2021.5. To start a parallel (MPI) run, we use srun (instead of mpirun) to start all GROMACS processes on the allocated nodes.