==== Available GPU types ====
^ Node ^ GPU type ^ GPUs per node ^ Memory per GPU ^ CPUs per node ^ Memory per node ^ Slurm name ^ Notes ^
| A100 | Nvidia A100 | | | | | | |
| V100 | Nvidia V100 | | | | | | |
| RTX Pro 6000 | Nvidia RTX Pro 6000 | | | | | | |
| L40S | Nvidia L40S | | | | | | |
==== Example ====
| < | < | ||
| #SBATCH --gpus-per-node=a100: | #SBATCH --gpus-per-node=a100: | ||
| - | </ | ||
| - | If you want to request a node with half of an NVIDIA A100, use the following: | ||
| - | |||
| - | < | ||
| - | #SBATCH --gpus-per-node=a100.20gb: | ||
| </ | </ | ||
<code>
#SBATCH --gpus-per-node=1
</code>
| + | |||
| + | Note that this will only sent jobs to the V100 and A100 nodes. This because not all software is compatible with the RTX Pro 6000 GPUs. Furthermore the more capable RTX Pro 6000 nodes should not be swamped with jobs that don't need its capabilities. See [[rtx_pro_6000_gpu_nodes]] for more details. | ||
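Putting these directives together, a minimal GPU job script might look as follows. This is only a sketch: the job name, memory, and time values are examples to adjust to your own job, and the module and program names are placeholders.

<code bash>
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --gpus-per-node=1      # any available GPU; use e.g. a100:1 for a specific type
#SBATCH --time=01:00:00        # example value; set to your expected runtime
#SBATCH --mem=16GB             # example value; set to what your job needs

# Show which GPU was allocated to the job
nvidia-smi

# Load your software and run it (placeholder names)
# module load <your_module>
# python <your_script>.py
</code>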
==== Interactive GPU node ====
| < | < | ||
| - | gpu1.hpc.rug.nl | + | gpu1.hb.hpc.rug.nl |
| - | gpu2.hpc.rug.nl | + | gpu2.hb.hpc.rug.nl |
| </ | </ | ||
These machines have an NVIDIA GPU.
** Please keep in mind that these are shared machines, so allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler. **
==== Running interactive jobs ====
| - | You can request an interactive session by using a command like: | + | You can usually |
| < | < | ||
| srun --gpus-per-node=1 --time=01: | srun --gpus-per-node=1 --time=01: | ||
| </ | </ | ||
| + | |||
| + | There is currently an issue with using '' | ||
| + | < | ||
| + | srun --gres=gpu: | ||
| + | </ | ||
| + | |||
| + | or: | ||
| + | < | ||
| + | srun --gres=gpu: | ||
| + | </ | ||
| + | |||
When the job starts running, you will be automatically logged in to the allocated node, allowing you to run your commands interactively. When you are done, just type ''exit'' to end the session and release the allocated resources.
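For example, a short interactive session might look like this; the script name is a placeholder for your own program:

<code bash>
nvidia-smi               # check which GPU was allocated
# python my_script.py    # run your program (placeholder name)
exit                     # end the session and release the resources
</code>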
**N.B.: interactive jobs currently don't (always) use the software stack built for the allocated nodes; you can work around this by first running ''