Differences

This shows you the differences between two versions of the page.

--- habrok:advanced_job_management:running_jobs_on_gpus [2023/09/11 13:01] – [Interactive GPU node] Created the section aurel
+++ habrok:advanced_job_management:running_jobs_on_gpus [2024/11/22 08:10] (current) – [Available GPU types] fokke
@@ Line 12: / Line 12: @@
 </code>
-where ''type'' is the type of GPU. Note that is is also still possible to use the ''%%--gres%%'' option that was required on Peregrine.
+where ''type'' is the type of GPU. Note that it is also still possible to use the ''%%--gres%%'' option that was required on Peregrine.
 Jobs requesting GPU resources will automatically end up in one of the GPU partitions.
@@ Line 18: / Line 18: @@
 ==== Available GPU types ====
-^ Node ^ GPU type ^  GPUs per node ^  Memory per GPU ^  CPUs per node ^  Memory per node ^ Slurm name ^ Notes ^
+^ Node ^ GPU type ^  GPUs per node ^  Memory per GPU ^  CPUs per node ^  Memory per node ^ Slurm name ^
-| A100_1 | Nvidia A100  |  4 |  40 GB |  64 |  512 GB | a100 | Full A100 cards |
+| A100 | Nvidia A100  |  4 |  40 GB |  64 |  512 GB | a100 |
-| A100_2 | Nvidia A100  |  8 |  20 GB |  64 |  512 GB | a100.20gb | Two virtual GPUs per A100 card |
+| V100 | Nvidia V100  |  1 or 2 |  32 GB |  36 |  128 GB | v100 |
-| V100 | Nvidia V100  |  1 |  32 GB |  8|  128 GB | v100 | |
 ==== Example ====
@@ Line 29: / Line 28: @@
 <code>
 #SBATCH --gpus-per-node=a100:2
-</code>
-If you want to request a node with half of an NVIDIA A100, use the following:
-<code>
-#SBATCH --gpus-per-node=a100.20gb:1
 </code>
@@ Line 47: / Line 41: @@
 <code>
-gpu1.hpc.rug.nl
+gpu1.hb.hpc.rug.nl
-gpu2.hpc.rug.nl
+gpu2.hb.hpc.rug.nl
 </code>
-These machines have an NVIDIA V100 GPU each, which can be shared by multiple users. The tool ''nvidia-smi'' will show if the GPU is in use.
+These machines have an NVIDIA L40S GPU each, which can be shared by multiple users. The tool ''nvidia-smi'' will show if the GPU is in use.
 ** Please keep in mind that these are shared machines, so allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler. **
 ==== Running interactive jobs ====
-You can request an interactive session by using a command like:
+You can usually request an interactive session by using a command like:
 <code>
 srun --gpus-per-node=1 --time=01:00:00 --pty /bin/bash
 </code>
+There is currently an issue with ''srun %%--gpus-per-node%%''. As a workaround, use ''%%--gres%%'' instead:
+<code>
+srun --gres=gpu:1 --time=01:00:00 --pty /bin/bash
+</code>
+or:
+<code>
+srun --gres=gpu:v100:1 --time=01:00:00 --pty /bin/bash
+</code>
 When the job starts running, you will be automatically logged in to the allocated node, allowing you to run your commands interactively. When you are done, just type ''%%exit%%'' to close your interactive job and to release the allocated resources.
 **N.B.: Interactive jobs currently do not always use the software stack built for the allocated nodes. You can work around this by first running ''unset SW_STACK_ARCH && module restore'' after the job has started.**
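Once the interactive session has started, it can be useful to confirm that a GPU was actually allocated before starting real work. The following is a minimal sketch and not part of the original page: the function name ''report_gpus'' is hypothetical, and it assumes that Slurm exports ''CUDA_VISIBLE_DEVICES'' for GPU jobs on this cluster, which may depend on the local configuration.

```shell
#!/bin/bash
# Sketch: report which GPUs the current job can see.
# Assumption: Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs on this cluster.

report_gpus() {
    # Print the device list handed to the job, or a placeholder if it is unset.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"

    # nvidia-smi is only present where the NVIDIA driver is installed.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi || echo "nvidia-smi could not query the GPU"
    else
        echo "nvidia-smi not found; this is probably not a GPU node"
    fi
}

report_gpus
```

The same function could be pasted at the top of a batch script, so that the job log records which devices were visible when the job started.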