Getting information about jobs, nodes and partitions
Several commands show information about jobs, nodes, partitions, or accounting data. Each of them has many options, so this page only gives a short overview of the most useful commands and options. For more information about a command, see its documentation page on the SLURM website. If you only want information about a particular job, Jobinfo combines these functions and is much easier to use.
squeue
The squeue command shows the entire list of jobs, both running and waiting. The output of the command looks like:
JOBID  PARTITION  NAME      USER  ST  TIME  NODES  NODELIST(REASON)
2782   short      hello_wo  bob   PD  0:00  1      (Resources)
You can limit the output to only show your own jobs by using the --user option:
squeue --user=bob
squeue -u bob
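Once the list is limited to your own jobs, you may want to pick out only those that are still waiting. The following is a minimal sketch, not part of Slurm itself: the helper name pending_jobs is hypothetical, and it assumes the default squeue column order shown above, with the state in the fifth column and no spaces in the job name.

```shell
# Print the IDs of pending jobs (state PD) from squeue output read on stdin.
# Assumes the default column order: JOBID PARTITION NAME USER ST ...
pending_jobs() {
    # NR > 1 skips the header line; $5 is the ST (state) column.
    awk 'NR > 1 && $5 == "PD" { print $1 }'
}

# Sample input copied from the squeue output shown above.
printf '%s\n' \
    'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)' \
    '2782 short hello_wo bob PD 0:00 1 (Resources)' |
    pending_jobs
```

On a cluster the same helper could be fed directly from the command, for example squeue --user=bob | pending_jobs.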
sinfo
The sinfo command can be used to get information about all the available partitions and their properties, including the set of nodes that is attached to a certain partition.
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
short*     up     4:00:00    162    down*  node[001-162]
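To find out quickly which nodes are in a given state, the sinfo output can be filtered on its STATE column. The sketch below is only an illustration: the helper name nodes_in_state is hypothetical, and it assumes the default column order shown above. It strips a trailing asterisk, which sinfo uses to mark the default partition and nodes that are not responding. Other state suffixes are not handled.

```shell
# Print "partition nodelist" for every sinfo line whose state matches $1.
# Assumes the default columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
nodes_in_state() {
    awk -v want="$1" '
        NR > 1 {
            part = $1
            state = $5
            sub(/\*$/, "", part)     # "*" marks the default partition
            sub(/\*$/, "", state)    # "*" marks nodes that are not responding
            if (state == want) print part, $6
        }'
}

# Sample input copied from the sinfo output shown above.
printf '%s\n' \
    'PARTITION AVAIL TIMELIMIT NODES STATE NODELIST' \
    'short* up 4:00:00 162 down* node[001-162]' |
    nodes_in_state down
```

On a cluster this could be used as sinfo | nodes_in_state idle to list the node ranges that are currently idle.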
scontrol
The scontrol command can be used to view and modify the state of jobs, nodes and partitions. As an example, it can be used to obtain more detailed information about a (running) job:
$ scontrol show job 2782
JobId=2782 JobName=hello_world_job
   UserId=bob(502) GroupId=beheer(500)
   Priority=13019 Nice=0 Account=cit QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:30:00 TimeMin=N/A
   SubmitTime=2015-03-05T14:34:17 EligibleTime=2015-03-05T14:34:17
   StartTime=2015-03-06T14:34:36 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=short AllocNode:Sid=login:22147
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=100M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/bob/hello.sh
   WorkDir=/home/bob
   StdErr=/home/bob/slurm-2782.out
   StdIn=/dev/null
   StdOut=/home/bob/slurm-2782.out
You can also request information about a specific node:
$ scontrol show nodes node155
NodeName=node155 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=4 CPUErr=0 CPUTot=24 CPULoad=0.26 Features=(null)
   Gres=(null)
   NodeAddr=node001 NodeHostName=node001 Version=14.11
   OS=Linux RealMemory=128822 AllocMem=100 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-03-04T11:38:58 SlurmdStartTime=2015-03-04T11:39:49
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
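Because scontrol prints its information as Key=Value pairs, a single field can be extracted with standard text tools. The following is a rough sketch under stated assumptions: the helper name scontrol_field is hypothetical, and it assumes that values contain no spaces, which holds for the fields shown above but not, for example, for a Command with arguments.

```shell
# Print the value of one Key=Value field from scontrol output read on stdin.
scontrol_field() {
    # Put every Key=Value pair on its own line, then print the value of
    # the first pair whose key equals $1.
    tr -s ' \t' '\n\n' |
        awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Sample input: a few lines copied from the job output shown above.
printf '%s\n' \
    'JobId=2782 JobName=hello_world_job' \
    '   UserId=bob(502) GroupId=beheer(500)' \
    '   JobState=PENDING Reason=Resources Dependency=(null)' |
    scontrol_field JobState
```

On a cluster this could be combined with the real command, for example scontrol show job 2782 | scontrol_field Reason, and it works the same way on the node output, for example with the State key.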
sacct
The sacct command displays accounting information for jobs. By default, it shows information about all jobs that have run today. The --jobs option allows you to show information for a specific job:
$ sacct --jobs=2781
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2781         hello_wor+ short      cit        4          COMPLETED  0:0
2781.batch   batch                 cit        1          COMPLETED  0:0
The --long option prints much more accounting information for a job, but the many columns make the output hard to read. The --format option lets you pick the columns you are interested in. For instance, if you want to know more about the memory usage of your job, you could use something like:
$ sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize --jobs=2781
JobID        JobName    NTasks   NodeList        MaxRSS     MaxVMSize  AveRSS     AveVMSize
------------ ---------- -------- --------------- ---------- ---------- ---------- ----------
2781         hello_wor+          node001
2781.batch   batch      1        node001         2040K      213184K    2040K      213184K
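Memory values such as 2040K are easier to compare with a requested memory limit once they are expressed in one unit. The sketch below converts such values to MiB. The helper name to_mib is hypothetical, and it assumes that the K, M and G suffixes are 1024-based. Values with any other suffix, or with none, are passed through unchanged.

```shell
# Convert memory values such as "2040K" (one per line on stdin) to MiB.
to_mib() {
    awk '{
        value = $1 + 0                    # numeric prefix, e.g. 2040 from "2040K"
        unit = substr($1, length($1))     # last character, e.g. "K"
        if (unit == "K")      printf "%.2f\n", value / 1024
        else if (unit == "M") printf "%.2f\n", value
        else if (unit == "G") printf "%.2f\n", value * 1024
        else print $1                     # unknown suffix: leave unchanged
    }'
}

# The MaxRSS and MaxVMSize values from the sacct output shown above.
printf '%s\n' 2040K 213184K | to_mib
```

For the sample job this prints 1.99 and 208.19, so the batch step used about 2 MiB of resident memory.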
The --format option also accepts the special keyword ALL, which prints all available information for this job, in even more columns than the --long option shows:
$ sacct --format ALL --jobs=2781
AllocCPUS AllocGRES Account AssocID AveCPU AveCPUFreq AveDiskRead AveDiskWrite AvePages AveRSS AveVMSize BlockID Cluster Comment ConsumedEnergy ConsumedEnergyRaw CPUTime CPUTimeRAW DerivedExitCode Elapsed Eligible End ExitCode GID Group JobID JobIDRaw JobName Layout MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask MaxVMSize MaxVMSizeNode MaxVMSizeTask MinCPU MinCPUNode MinCPUTask NCPUS NNodes NodeList NTasks Priority Partition QOS QOSRAW ReqCPUFreq ReqCPUS ReqGRES ReqMem Reservation Reservat Reserved ResvCPU ResvCPURAW Start State Submit Suspended SystemCPU Timelimit TotalCPU UID User UserCPU WCKey WCKeycit 5 habrok 00:12:04 724 0:0 00:03:01 2015-02-27T15:52:30 2015-02-27T15:55:32 0:0 500 beheer 2781 2781 hello_wor+ 4 1 node001 14638 short normal 1 Unknown 4 100Mn 00:00:01 00:00:04 4 2015-02-27T15:52:31 COMPLETED 2015-02-27T15:52:30 00:00:00 00:30:00 00:00:00 502 bob 0
To see all jobs that you have run since a given date, pass a start date to the sacct command with -S. For example:
sacct -S 2015-01-01
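Instead of typing a fixed date, the start date can be computed relative to today. This is a small sketch that assumes GNU date, as found on most Linux systems; the -d option with a relative date is a GNU extension, and the helper name days_ago is hypothetical.

```shell
# Print the date N days before today as YYYY-MM-DD.
# Assumes GNU date; "-d" with a relative date is not portable to other systems.
days_ago() {
    date -d "$1 days ago" +%Y-%m-%d
}

# Example: the date one week ago, in the format that sacct -S accepts.
days_ago 7
```

On a cluster this could be used as sacct -S "$(days_ago 7)" to list your jobs of the past week.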
sstat
The sstat command can show similar information to sacct, but is intended for getting statistics about jobs that are still running. Use the --jobs option to specify a job ID and the --all-steps option to get information about the entire job:
sstat --jobs=5090 --all-steps
JobID MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU NTasks AveCPUFreq ReqCPUFreq ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------
5090.0 0 node007 0 0 0 node007 0 0 0 node007 0 0 00:00.000 node011 96 00:00.000 11 0 Unknown 0 0 node007 0 0 0 node007 0 0
If the job script does not use the srun command, .batch has to be appended to the job ID in order to display the information for the job:
sstat --jobs=5090.batch