Differences

This shows you the differences between two versions of the page.

habrok:connecting_to_the_system:login_nodes [2023/02/08 14:36] – [Login node: habrok.hpc.rug.nl] fokke
habrok:connecting_to_the_system:login_nodes [2024/12/19 09:20] (current) admin
Line 7:
 ===== Login nodes =====
  
-hb-login1.hpc.rug.nl and hb-login2.hpc.rug.nl are the default login nodes that are used by most users. You can use these to connect to the system, copy your files, submit jobs, compile your code, etcetera. You should not use it to test your applications, since this might slow down the system, which will hinder other users who are trying to log in. It is also a smaller system.
+''login1.hb.hpc.rug.nl'' and ''login2.hb.hpc.rug.nl'' are the default login nodes that are used by most users. You can use these to connect to the system, copy your files, submit jobs, compile your code, et cetera. You should not use it to test your applications, since this might slow down the system, which will hinder other users who are trying to log in. It is also a smaller system.
  
 We have set up two of these login nodes to increase the availability of the service.
  
-===== Interactive node: hb-interactive.hpc.rug.nl =====
+===== Interactive nodes =====
  
-The interactive node is similar to a default compute node, and it allows for a bit more testing. If you just want to run your program for a couple of minutes, this is the machine to use. Do keep in mind that this is also a shared machine and other people may also want to do some testing. So, if you need to do more intensive testing, consider submitting them as jobs.
+In Hábrók two interactive nodes have been configured, these are ''interactive1.hb.hpc.rug.nl'' and ''interactive2.hb.hpc.rug.nl''.
  
-===== Interactive GPU node: pg-gpu.hpc.rug.nl =====
+The interactive nodes are about half the size of a default compute node, and they allow for a bit more testing. If you just want to run your program for a couple of minutes, these are the machines to use. Do keep in mind that these are also shared machines and other people may also want to do some testing. So, if you need to do longer and/or more intensive tests, these tasks should be submitted as jobs.
  
-Finally, the interactive GPU node is login node equipped with two GPUs. You can use it to develop and test your GPU applications.
+To prevent a single user from using all capacity, CPU and memory limits are in place.
  
-This machine has an NVIDIA V100 GPU, which has been subdivided into four smaller GPUs. The tool nvidia-smi will show if/which GPUs are in use, and you can select the one you want to use by setting ''CUDA_VISIBLE_DEVICES'' to a number ranging from 0 to 3, to select one of the GPUs. E.g.:
+===== Interactive GPU node =====
  
-<code bash
->export CUDA_VISIBLE_DEVICES=2
-</code>
+Finally, the interactive GPU nodes, ''gpu1.hb.hpc.rug.nl'' and ''gpu2.hb.hpc.rug.nl'', are login nodes equipped with a GPU. You can use them to develop and test your GPU applications.
+
  
-Please keep in mind that this is also a shared machine, and more users want to use the GPUs in this machine. So, allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to scheduler.
+These machines have an NVIDIA L40s GPU each, which can be shared by multiple users. The tool ''nvidia-smi'' will show if the GPU is in use.
 + 
 +Please keep in mind that these are also shared machines, and other users want to use the GPUs in these machines as well. So, allow everyone to make use of these GPUs and do not perform long runs here. Long runs should be submitted as jobs to the scheduler.
 + 
 +===== Periodic reboots ===== 
 + 
 +In order to prevent the login/interactive nodes from being filled up with temporary files and long-running processes, these nodes are rebooted every other week on Monday morning at 6:00 CE(S)T. The odd-numbered nodes (''login1'', ''interactive1'', ''gpu1'') are rebooted in odd weeks, and the even-numbered nodes (''login2'', ''interactive2'', ''gpu2'') are rebooted in even weeks.
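
The odd/even reboot rule in the current revision can be checked with a few lines of shell. This is only a minimal sketch, not something taken from the wiki page: it assumes that "odd weeks" and "even weeks" refer to ISO 8601 week numbers (as printed by ''date +%V''), and the helper name ''reboot_due'' is hypothetical. It expects short node names such as ''login1'', not fully qualified host names.

```shell
#!/bin/sh
# Sketch: is a node due for its biweekly Monday reboot in a given week?
# ASSUMPTION: "odd/even weeks" means ISO 8601 week numbers (date +%V).
# reboot_due is a hypothetical helper, not a tool provided on the cluster.

# reboot_due NODE WEEK
# Succeeds (exit 0) when the node number and the week number have the same parity.
reboot_due() {
    node_num=${1##*[!0-9]}   # trailing digits of the short node name, e.g. "login2" -> "2"
    week_num=${2#0}          # drop a leading zero so "08" is not parsed as octal
    [ -n "$node_num" ] || return 2   # no trailing number: cannot decide
    [ $(( node_num % 2 )) -eq $(( week_num % 2 )) ]
}

week=$(date +%V)
for node in login1 login2 interactive1 interactive2 gpu1 gpu2; do
    if reboot_due "$node" "$week"; then
        echo "$node: reboot expected this week (week $week)"
    else
        echo "$node: no reboot expected this week (week $week)"
    fi
done
```

If the administrators count weeks differently, only the ''week=$(date +%V)'' line would need to change.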