<code>
#SBATCH --gpus-per-node=n
</code>
where ''n'' is the number of GPUs you want to request.
Jobs requesting GPU resources will automatically end up in one of the GPU partitions.
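You can check which partition a submitted job ended up in with ''squeue''; a minimal sketch, where the job ID is a placeholder:
<code>
# Show job ID, partition and state for a given job (123456 is a placeholder)
squeue -j 123456 -o "%i %P %T"
</code>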
==== Available GPU types ====
^ Node ^ GPU type ^ GPUs per node ^ Memory per GPU ^ CPUs per node ^ Memory per node ^ Slurm name ^
| A100 | Nvidia A100 | 4 | 40 GB | 64 | 512 GB | a100 |
| V100 | Nvidia V100 | 1 or 2 | 32 GB | | | v100 |
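To see how Slurm advertises these GPUs, you can for example list the partitions and their generic resources (GRES) with ''sinfo'':
<code>
# List each partition with its node list and GRES (e.g. gpu:a100:4)
sinfo -o "%P %N %G"
</code>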
==== Example ====
To request an A100 GPU, add the following line to your job script:
<code>
#SBATCH --gpus-per-node=a100:1
</code>
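A complete job script using this option could look like the following sketch; the job name, time limit, memory, module name and script are placeholders that you should adapt to your own work:
<code>
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --time=01:00:00
#SBATCH --gpus-per-node=a100:1
#SBATCH --mem=16GB

# Load your software environment (example module name)
module purge
module load Python/3.11.3-GCCcore-12.3.0

# Run the actual GPU workload (placeholder script)
python my_gpu_script.py
</code>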
If you just want one GPU, you can leave out the type, in which case the job will get whichever GPU type happens to be available:
<code>
#SBATCH --gpus-per-node=1
</code>
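Since the GPU type is not fixed in that case, you can check which type was assigned from within the job, for example:
<code>
# Print the model name of the allocated GPU
nvidia-smi --query-gpu=name --format=csv,noheader
</code>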
==== Interactive GPU nodes ====
Besides the compute nodes listed above, there are two GPU nodes that can be used to test and develop your software. These machines are similar to the login and interactive nodes, and you can connect to them using the following hostnames:
<code>
+ | gpu1.hb.hpc.rug.nl | ||
+ | gpu2.hb.hpc.rug.nl | ||
+ | </ | ||
+ | |||
These machines each have an NVIDIA L40S GPU, which can be shared by multiple users. The tool ''nvidia-smi'' can be used to check the current usage of the GPU.
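For example, after logging in with your cluster account (the username below is a placeholder), you can inspect the GPU before starting your tests:
<code>
ssh s1234567@gpu1.hb.hpc.rug.nl   # placeholder username
nvidia-smi                        # show current GPU utilization and memory use
</code>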
** Please keep in mind that these are shared machines, so allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler. **
==== Running interactive jobs ====
You can usually request an interactive session by using a command like:
<code>
srun --gpus-per-node=1 --time=01:00:00 --pty /bin/bash
</code>

There is currently an issue with using ''--gpus-per-node'' in combination with ''srun''. If the command above does not work, you can request the GPU through the ''--gres'' option instead:
+ | < | ||
srun --gres=gpu:1 --time=01:00:00 --pty /bin/bash
+ | </ | ||
+ | |||
or:
+ | < | ||
srun --gres=gpu:a100:1 --time=01:00:00 --pty /bin/bash
+ | </ | ||
+ | |||
When the job starts running, you will be automatically logged in to the allocated node, allowing you to run your commands interactively. When you are done, just type ''exit'' to end the session and release the allocated resources.
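Inside the interactive session you can, for example, verify that a GPU has been allocated to you:
<code>
echo $CUDA_VISIBLE_DEVICES   # IDs of the GPU(s) assigned to this job
nvidia-smi                   # details of the allocated GPU
</code>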
**N.B.: interactive jobs currently don't (always) use the software stack built for the allocated nodes; you can work around this by first running ''module purge'' and then reloading the modules you need after logging in.**
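A minimal sketch of that workaround (the module name is only an example; load whatever your job needs):
<code>
module purge                                # drop modules inherited from the login node
module load Python/3.11.3-GCCcore-12.3.0   # example module; replace with your own
</code>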