GPUs can be requested by adding the ''--gpus-per-node'' option to your job script:

<code>
#SBATCH --gpus-per-node=n
</code>

where ''n'' is the number of GPUs requested per node.

Jobs requesting GPU resources will automatically end up in one of the GPU partitions.
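
If you want to check which partition such a job ended up in, one way is to list your jobs together with their partition, for example:

<code>
squeue -u $USER -o "%.10i %.12P %.20j %.8T"
</code>

The ''%i'', ''%P'', ''%j'' and ''%T'' fields are standard ''squeue'' format specifiers for the job ID, partition, job name and job state.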
==== Available GPU types ====
^ Node ^ GPU type ^ GPUs per node ^ Memory per GPU ^ CPUs per node ^ Memory per node ^ Slurm name ^
| A100 | Nvidia A100 | 4 | 40 GB | 64 | 512 GB | a100 |
| V100 | Nvidia V100 | 1 or 2 | 32 GB | | | v100 |
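
The value in the ''Slurm name'' column is what you pass to ''--gpus-per-node''. For instance, two V100 GPUs on a single node (on a node type that has two of them) could be requested with:

<code>
#SBATCH --gpus-per-node=v100:2
</code>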
==== Example ====
For example, to request a single NVIDIA A100 GPU for your job, use:

<code>
#SBATCH --gpus-per-node=a100:1
</code>
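
Putting this together, a minimal GPU job script might look like the sketch below; the job name and time limit are placeholder choices, not fixed requirements:

<code>
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --time=01:00:00
#SBATCH --gpus-per-node=a100:1

# Print the GPU that Slurm allocated to this job
nvidia-smi
</code>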
==== Interactive GPU node ====

For testing and development, two interactive GPU nodes are available at the following addresses:

<code>
gpu1.hb.hpc.rug.nl
gpu2.hb.hpc.rug.nl
</code>
These machines have an NVIDIA GPU that can be used for testing and developing GPU software.
** Please keep in mind that these are shared machines, so allow everyone to make use of the GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler. **
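
Since others may be using the GPU at the same time, it is good practice to check the current GPU load before starting a test, for example with:

<code>
nvidia-smi
</code>

This shows the memory usage and running processes on the GPU, so you can see whether it is already busy.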
==== Running interactive jobs ====
You can usually request an interactive session by using a command like:
<code>
srun --gpus-per-node=1 --time=01:00:00 --pty /bin/bash
</code>
+ | |||
+ | There is currently an issue with using '' | ||

<code>
srun --gres=gpu:1 --time=01:00:00 --pty /bin/bash
</code>
+ | |||
+ | or: | ||
+ | < | ||
+ | srun --gres=gpu: | ||
+ | </ | ||
+ | |||

When the job starts running, you will be automatically logged in to the allocated node, allowing you to run your commands interactively. When you are done, just type ''exit'' to end the session and release the resources.

**N.B.: interactive jobs currently don't (always) use the software stack built for the allocated nodes; you can work around this by reloading your modules after logging in to the allocated node.**
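
For example, a fresh interactive session could start by resetting and reloading modules; the module name below is only a placeholder for whatever software you actually need:

<code>
module purge
module load CUDA
</code>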