Checking job performance
Sometimes it is useful to take a closer look at the performance of your jobs using the standard Linux toolbox. We will not describe these tools in detail, but just mention a few.
Logging in to compute nodes
Using the ssh command-line tool, you can log in to nodes where one of your jobs is running. Please note that you can only log in to these nodes; a connection to any other node will be refused.
Logging in can be done like this:
ssh node34
After this you should get a command-line prompt on this node (provided you have a job running on that node). You can use squeue, described earlier, to see where your job is running.
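As a minimal sketch of looking up the node names, the snippet below asks squeue for the job ID and node list of your running jobs. It assumes a Slurm cluster, and job_nodes is a hypothetical helper name chosen for this example.

```shell
#!/usr/bin/env bash
# List the nodes on which your running jobs are placed.
# Assumption: Slurm's squeue is available; job_nodes is a hypothetical name.
job_nodes() {
    # -h: no header line, -u: only jobs of this user,
    # -t RUNNING: only running jobs, -o "%i %N": print job ID and node list
    squeue -h -u "$(id -un)" -t RUNNING -o "%i %N"
}

# Only call squeue when it is actually available on this machine.
if command -v squeue >/dev/null 2>&1; then
    job_nodes
fi
```

Any node name in the second column can then be passed to ssh. For multi-node jobs the node list may be printed in a compressed form such as node[34-36], in which case you pick one of the nodes in that range.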
Checking job performance
There are two tools that we will describe on this page. The first is top, the second is ps.
top
The top tool will show an overview of running processes on the system. You can limit it to your own processes using the -u option, followed by your user name.
The tool shows the CPU utilization in the %CPU column. Memory usage is shown in the VIRT column for virtual (claimed) memory and in the RES column for resident memory, that is, memory that is really in use.
The CPU usage should ideally be close to 100% for single-core tasks and close to n*100% for multithreaded tasks, where n is the number of threads.
To see the individual threads of a multithreaded program, press H (Shift+h). The q key will exit top.
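If you only want a one-off snapshot, for example to save it to a file, top can also be run non-interactively. The sketch below assumes the procps version of top that is common on Linux systems; top_snapshot is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print a single, non-interactive snapshot of your own processes.
# Assumption: procps top; top_snapshot is a hypothetical name.
top_snapshot() {
    # -b: batch mode (plain text output), -n 1: one iteration only,
    # -u: limit the list to one user; head keeps the summary and first rows
    top -b -n 1 -u "$(id -un)" | head -n 20
}

# Only run the snapshot when top is installed.
if command -v top >/dev/null 2>&1; then
    top_snapshot
fi
```

Adding the -H option to the top call should list individual threads instead of processes, which corresponds to pressing H in the interactive view.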
ps
Another useful tool is ps, which shows the processes on the node. To see an extensive overview of your own processes you can use:
ps -elf | grep -E "UID|$USER"
Please see the man page of ps for information about the meaning of the fields.
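When the full listing is too wide, ps can be asked for a selection of columns instead. The following is a sketch that assumes the procps version of ps; my_procs is a hypothetical helper name, and the column choice is only a suggestion.

```shell
#!/usr/bin/env bash
# Compact per-process overview for the current user.
# Assumption: procps ps on Linux; my_procs is a hypothetical name.
my_procs() {
    # pid: process ID, nlwp: number of threads, pcpu: CPU usage in percent,
    # rss: resident memory in KiB, vsz: virtual memory in KiB,
    # etime: elapsed time since start, comm: command name
    # --sort=-pcpu: highest CPU usage first
    ps -u "$(id -un)" -o pid,nlwp,pcpu,rss,vsz,etime,comm --sort=-pcpu
}

# Only run the overview when ps is installed.
if command -v ps >/dev/null 2>&1; then
    my_procs
fi
```

Note that the CPU percentage reported by ps is the CPU time used divided by the time the process has been running, so it is an average over the lifetime of the process rather than the recent value that top displays.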