For each job a user has to specify the amount of resources required for the job to run properly. The following resources can be requested (a minimal sketch of the corresponding ''#SBATCH'' lines follows the list):
  * A number of CPU cores per node
  * A number of nodes/tasks
  * An amount of memory per core or node
  * The amount of time the job needs
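As a rough sketch only (the values below are arbitrary placeholders, not recommendations), these requests translate into ''#SBATCH'' lines at the top of a job script:
<code>
#!/bin/bash
#SBATCH --nodes=1             # number of nodes
#SBATCH --ntasks-per-node=4   # number of tasks per node
#SBATCH --cpus-per-task=1     # CPU cores per task
#SBATCH --mem-per-cpu=2G      # memory per core
#SBATCH --time=01:00:00       # wall clock time limit (hh:mm:ss)
</code>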
===== SLURM =====
On the Hábrók cluster the scheduling system used is SLURM (Simple Linux Utility for Resource Management).
===== Job scripts =====
Detail: Wall clock time and CPU time. Within computer systems a difference is made between wall clock time and CPU time. Wall clock time is the normal time that passes by and can be measured using, for example, two readings from a wall clock. CPU time is the fraction of this time that a CPU actually spends on calculations. Time during which the CPU is waiting for the operating system or for incoming data from the file system is not counted.
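As an illustration, the shell's ''time'' utility reports both quantities for a command; the numbers below are only indicative. ''real'' is the wall clock time, while ''user'' plus ''sys'' is the CPU time that was actually consumed:
<code>
$ time sleep 5

real    0m5.004s    # wall clock time: about 5 seconds passed
user    0m0.001s    # CPU time spent in the program itself
sys     0m0.002s    # CPU time spent in the operating system on behalf of the program
</code>
A program that keeps, say, 4 cores busy can accumulate up to 4 seconds of CPU time for every second of wall clock time.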
==== Nodes and cores ====
Note that if you only use ''--ntasks'', without specifying ''--nodes'', the scheduler is free to distribute the tasks over any number of nodes.
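As a small sketch of this difference (the task counts are arbitrary examples): fixing the node count keeps the tasks together, whereas specifying only a task count leaves their placement to the scheduler:
<code>
# Variant 1: 8 tasks, all forced onto a single node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

# Variant 2: 8 tasks, which SLURM may spread over one or more nodes
#SBATCH --ntasks=8
</code>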
==== Multinode jobs requiring faster networking ====
If your application is using MPI and may benefit from a high bandwidth (the amount of data transferred per second) and/or low latency (the amount of time it takes for the first bit to arrive), you can send the job to the ''parallel'' partition by adding the following line to your job script:
+ | < | ||
+ | #SBATCH --partition=parallel | ||
+ | </ | ||
Since there are only limited resources available in this partition, there are two important guidelines (a sketch of such a job header follows after the list):
  - When using just a few cores you might as well run your application on a single node
  - It would be wise to test the performance difference between a job running on the regular nodes and the omnipath nodes, since there may be more capacity available in the ''regular'' partition
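As an illustrative sketch only (the node and task counts are placeholders, and ''my_mpi_program'' is a hypothetical binary), a multinode job directed to this partition could start like this:
<code>
#!/bin/bash
#SBATCH --partition=parallel
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00

srun ./my_mpi_program    # hypothetical MPI binary; srun launches one process per task
</code>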
==== Memory ====
For the first runs you can then use overestimates for the time and memory requirements to make sure your calculations will not be aborted. Once you have gotten feedback from the scheduler about the actual time and memory consumption, you can use more precise amounts. Some hints about reasonable sizes (a way to retrieve this feedback from SLURM is sketched after the list):
  * The memory on a standard node comes down to about 4 GB per core.
  * For memory requests above 4GB/core you should check the job output for the actual memory usage and adjust the number for subsequent runs. **VERY IMPORTANT: Please don't request more than 10GB/core when you are not sure that your program needs it!** You are wasting valuable resources others may need if you do.
  * **VERY IMPORTANT: Never request more than 1 CPU core if you don't know that your program can actually use multiple cores.** Check the program documentation for information on this.
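One way to obtain this feedback after a job has finished is SLURM's ''sacct'' accounting tool; the job ID below is only a placeholder:
<code>
# Replace 1234567 with your own job ID
sacct -j 1234567 --format=JobID,Elapsed,ReqMem,MaxRSS,State
</code>
''MaxRSS'' shows the peak memory actually used by the job steps and can be compared against the requested memory (''ReqMem'') to tune the request for the next run.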
The following table gives an overview and description of other useful parameters that can be used:
^Parameter       ^Description ^
|%%--%%job-name  |Specify a name for the job, which will be shown in the job overview |
|%%--%%output    |Name of the file the job output will be written to; ''%j'' in the name is replaced by the job ID |
|%%--%%partition |Specify in which partition the job has to run |
<code>
#SBATCH --job-name=my_first_slurm_job
#SBATCH --output=job-%j.log
#SBATCH --partition=short
</code>
<code>
module purge
module load GROMACS/2021.5-foss-2021b
srun gmx_mpi mdrun
</code>
This script will ask for 2 nodes and 4 tasks per node. The maximum runtime is 2 days and 12 hours. The amount of memory available for the job is almost 4 GiB per node. Once the job is executed, it will first load the module for GROMACS 2021.5. To start a parallel (MPI) run, we use srun (instead of mpirun) to start all GROMACS processes on the allocated nodes.
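Assuming the script is saved as, for example, ''gromacs_job.sh'' (the file name is arbitrary), it can be submitted and monitored with the standard SLURM commands:
<code>
sbatch gromacs_job.sh     # submit the job script; prints the job ID
squeue -u $USER           # list your pending and running jobs
scancel 1234567           # cancel a job by its job ID (example value)
</code>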