habrok:advanced_job_management:running_jobs_on_gpus — last modified 2026/02/27 10:20 by fokke
==== Available GPU types ====
  
^ Node         ^ GPU type            ^  GPUs per node ^  Memory per GPU ^  CPUs per node ^  Memory per node ^ Slurm name ^
| A100         | Nvidia A100         |  4 |  40 GB |  64 |  512 GB | a100 |
| V100         | Nvidia V100         |  |  32 GB |  36 |  128 GB | v100 |
| RTX Pro 6000 | Nvidia RTX Pro 6000 |  |  96 GB |  128 |  192 GB | rtx_pro_6000 |
| L40S         | Nvidia L40S         |  |  48 GB |  56 |  512 GB | l40s |
  
==== Example ====
<code>
#SBATCH --gpus-per-node=a100:2
</code>
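As a fuller sketch, the directive above could sit in a complete batch script like the following. The job name, time limit, and memory request are illustrative assumptions, not values taken from this page:

<code>
#!/bin/bash
#SBATCH --job-name=gpu_example   # illustrative job name
#SBATCH --time=01:00:00          # illustrative time limit
#SBATCH --mem=16G                # illustrative memory request
#SBATCH --gpus-per-node=a100:2   # two A100 GPUs, as in the example above

# List the GPUs that Slurm allocated to this job
nvidia-smi
</code>

Such a script would be submitted with ''sbatch jobscript.sh''.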
  
<code>
#SBATCH --gpus-per-node=1
</code>

Note that this will only send jobs to the V100 and A100 nodes. This is because not all software is compatible with the RTX Pro 6000 GPUs. Furthermore, the more capable RTX Pro 6000 nodes should not be swamped with jobs that do not need their capabilities. See [[rtx_pro_6000_gpu_nodes]] for more details.
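If your software does run on the newer cards and genuinely needs their capabilities, you can request them explicitly using the Slurm name from the table above, for example:

<code>
#SBATCH --gpus-per-node=rtx_pro_6000:1
</code>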
  
==== Interactive GPU node ====
  
<code>
gpu1.hb.hpc.rug.nl
gpu2.hb.hpc.rug.nl
</code>
  
These machines each have an NVIDIA L40S GPU, which can be shared by multiple users. The tool ''nvidia-smi'' will show if the GPU is in use.
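For example, to log in to one of these machines and check the GPU before starting work (the username is a placeholder; use your own account):

<code>
ssh <username>@gpu1.hb.hpc.rug.nl   # replace <username> with your own account
nvidia-smi                          # shows current GPU utilisation and processes
</code>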
  
** Please keep in mind that these are shared machines, so allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler. **
</code>
  
There is currently an issue with using ''srun %%--%%gpus-per-node'', but there is a workaround by using ''%%--%%gres'' instead:
<code>
srun --gres=gpu:1 --time=01:00:00 --pty /bin/bash
</code>
  
or:
<code>
srun --gres=gpu:v100:1 --time=01:00:00 --pty /bin/bash
</code>
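Once the interactive shell starts, a quick way to confirm that a GPU was actually allocated to the session (assuming the node's NVIDIA drivers are available, which this page does not state explicitly):

<code>
echo $CUDA_VISIBLE_DEVICES   # Slurm sets this to the index(es) of the allocated GPU(s)
nvidia-smi                   # should list the GPU requested above
</code>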