Getting information about jobs, nodes and partitions

Several commands show information about jobs, nodes, partitions or accounting. Each of them has many options; this section gives a short overview of the most useful commands and options. For more information about any of the commands, click on its name to go to the documentation page on the SLURM website. Jobinfo combines these functions and is a lot easier to use if you only want information about a particular job.

The squeue command shows the entire list of jobs, both running and waiting.

The output of the command looks like:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              2782     short hello_wo      bob PD       0:00      1 (Resources)

You can limit the output to only show your own jobs by using the --user option:

squeue --user=bob
squeue -u bob
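
When you have many jobs in the queue, a per-state summary can be easier to read than the full list. The following is a minimal sketch and not part of SLURM: count_job_states is a hypothetical helper name, and it assumes the default squeue column layout shown above, in which the state code (ST) is the fifth column. A job name that contains spaces would shift the columns and break that assumption.

```shell
# Hypothetical helper: count jobs per state code from squeue output on stdin.
# Assumes the default column layout, where ST is the 5th whitespace-separated field.
count_job_states() {
    awk 'NR > 1 { count[$5]++ }                      # skip the header line
         END { for (s in count) print s, count[s] }' | sort
}

# Example use on a cluster:
#   squeue --user=bob | count_job_states
```

Each output line holds a state code (for example PD for pending or R for running) followed by the number of jobs in that state.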

The sinfo command shows all available partitions and their properties, including the set of nodes that belongs to each partition:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
short*       up    4:00:00    162  down* node[001-162]

The scontrol command can show and modify the state of SLURM entities such as jobs, nodes and partitions. For example, it can give more detailed information about a (running) job:

$ scontrol show job 2782
JobId=2782 JobName=hello_world_job
   UserId=bob(502) GroupId=beheer(500)
   Priority=13019 Nice=0 Account=cit QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:30:00 TimeMin=N/A
   SubmitTime=2015-03-05T14:34:17 EligibleTime=2015-03-05T14:34:17
   StartTime=2015-03-06T14:34:36 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=short AllocNode:Sid=login:22147
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=100M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/bob/hello.sh
   WorkDir=/home/bob
   StdErr=/home/bob/slurm-2782.out
   StdIn=/dev/null
   StdOut=/home/bob/slurm-2782.out
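
The scontrol output is a list of Key=Value pairs, so a single value can be picked out with standard text tools. The sketch below is only an illustration: job_field is a hypothetical helper name, and it assumes that the value you ask for contains no spaces, which holds for keys such as JobState and Reason in the output above.

```shell
# Hypothetical helper: print the value of one key from "scontrol show job"
# output read on stdin. Assumes the value itself contains no spaces.
job_field() {
    # Put every Key=Value pair on its own line, then print the value whose key matches.
    tr -s ' ' '\n' | awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2) }'
}

# Example use on a cluster:
#   scontrol show job 2782 | job_field JobState
```

For the job shown above, asking for JobState would give PENDING and asking for Reason would give Resources.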

You can also request information about a specific node:

$ scontrol show nodes node155
NodeName=node155 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=4 CPUErr=0 CPUTot=24 CPULoad=0.26 Features=(null)
   Gres=(null)
   NodeAddr=node001 NodeHostName=node001 Version=14.11
   OS=Linux RealMemory=128822 AllocMem=100 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2015-03-04T11:38:58 SlurmdStartTime=2015-03-04T11:39:49
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

The sacct command can be used for displaying accounting information for jobs. By default, it will show information about all jobs that have been running today. The --jobs option allows you to show information for a specific job:

$ sacct --jobs=2781
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
2781         hello_wor+      short        cit          4  COMPLETED      0:0 
2781.batch        batch                   cit          1  COMPLETED      0:0

The --long option prints more of the available accounting information for a job, but its many columns make the output hard to read. The --format option lets you pick the columns you are interested in. For instance, if you want to know more about the memory usage of your job, you could use something like:

$ sacct --format JobID,jobname,NTasks,nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize --jobs=2781
       JobID    JobName   NTasks        NodeList     MaxRSS  MaxVMSize     AveRSS  AveVMSize 
------------ ---------- -------- --------------- ---------- ---------- ---------- ---------- 
2781         hello_wor+               node001                                             
2781.batch        batch        1      node001      2040K    213184K      2040K    213184K 
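
The memory columns carry a unit suffix, such as K in 2040K, which makes them awkward to compare directly. The sketch below converts such a value to MiB. It is an illustration under stated assumptions: to_mib is a hypothetical helper name, the suffixes K, M, G and T are assumed to be powers of 1024, and any other suffix is rejected rather than guessed.

```shell
# Hypothetical helper: convert a memory value such as 2040K or 2G to MiB.
# Assumes K/M/G/T suffixes are 1024-based; other input is rejected.
to_mib() {
    LC_ALL=C awk -v v="$1" 'BEGIN {
        n = v + 0                        # numeric part before the suffix
        u = substr(v, length(v))         # last character is the unit suffix
        if      (u == "K") n /= 1024
        else if (u == "M") n *= 1
        else if (u == "G") n *= 1024
        else if (u == "T") n *= 1024 * 1024
        else exit 1                      # unknown or missing suffix
        printf "%.1f\n", n
    }'
}

# Example use on a cluster (pipe-separated output without a header line):
#   sacct --jobs=2781 --format=JobID,MaxRSS --noheader --parsable2
```

With the values from the output above, to_mib 2040K gives 2.0 and to_mib 213184K gives 208.2.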

The --format option also accepts the special keyword ALL, which prints all available information for the job, with even more columns than the --long option shows:

$ sacct --format ALL --jobs=2781
 AllocCPUS    AllocGRES    Account AssocID     AveCPU AveCPUFreq    AveDiskRead   AveDiskWrite   AvePages     AveRSS  AveVMSize          BlockID    Cluster        Comment ConsumedEnergy ConsumedEnergyRaw    CPUTime CPUTimeRAW DerivedExitCode    Elapsed            Eligible                 End ExitCode    GID     Group        JobID     JobIDRaw    JobName    Layout  MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode   MaxPagesTask     MaxRSS MaxRSSNode MaxRSSTask  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask     MinCPU MinCPUNode MinCPUTask      NCPUS   NNodes        NodeList   NTasks   Priority  Partition        QOS QOSRAW ReqCPUFreq  ReqCPUS      ReqGRES     ReqMem          Reservation Reservat   Reserved    ResvCPU ResvCPURAW               Start      State              Submit  Suspended  SystemCPU  Timelimit   TotalCPU    UID      User    UserCPU      WCKey    WCKeyID 
---------- ------------ ---------- ------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------------- ---------- -------------- -------------- ----------------- ---------- ---------- --------------- ---------- ------------------- ------------------- -------- ------ --------- ------------ ------------ ---------- --------- ------------ --------------- --------------- ------------ ---------------- ---------------- -------- ------------ -------------- ---------- ---------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- -------- --------------- -------- ---------- ---------- ---------- ------ ---------- -------- ------------ ---------- -------------------- -------- ---------- ---------- ---------- ------------------- ---------- ------------------- ---------- ---------- ---------- ---------- ------ --------- ---------- ---------- ---------- 
         4                     cit       5                                                                                                        habrok                                                   00:12:04        724             0:0   00:03:01 2015-02-27T15:52:30 2015-02-27T15:55:32      0:0    500    beheer 2781         2781         hello_wor+                                                                                                                                                                                                                                                                4        1      node001               14638      short     normal      1    Unknown        4                   100Mn                                 00:00:01   00:00:04          4 2015-02-27T15:52:31  COMPLETED 2015-02-27T15:52:30   00:00:00              00:30:00   00:00:00    502       bob                                0 

To see all jobs you have run since a given date, pass a start date to the sacct command with -S. For example:

sacct -S 2015-01-01

The sstat command shows information similar to sacct, but is intended for statistics about jobs that are still running. Use the --jobs option to specify a job ID and the --all-steps option to get information about the entire job:

sstat --jobs=5090 --all-steps
       JobID  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreq ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite 
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ 
5090.0                0     node007              0          0          0 node007          0          0        0   node007              0          0  00:00.000 node011         96  00:00.000       11          0    Unknown              0            0      node007               0            0            0       node007                0            0 

If the job script does not use the srun command, .batch has to be appended to the job ID in order to display the information for the job:

sstat --jobs=5090.batch
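
If you often query jobs that do not use srun, the .batch rule above can be wrapped in a small function. This is only a sketch: sstat_target is a hypothetical helper name, and it simply assumes that a bare job ID should refer to the batch step, while an ID that already names a step (such as 5090.0) is left unchanged.

```shell
# Hypothetical helper: turn a bare job ID into its .batch step,
# and leave IDs that already contain a step suffix untouched.
sstat_target() {
    case $1 in
        *.*) printf '%s\n' "$1" ;;        # already names a step, e.g. 5090.0
        *)   printf '%s.batch\n' "$1" ;;  # bare job ID, e.g. 5090
    esac
}

# Example use on a cluster, limited to a few of the columns shown above:
#   sstat --jobs="$(sstat_target 5090)" --format=JobID,MaxRSS,AveCPU
```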