Job prioritization

The SLURM scheduler uses a priority-based scheduling method. A priority is calculated for each submitted job; the calculation is described below. In principle, the waiting job with the highest priority starts first, unless a smaller or shorter job can start without delaying a higher-priority job. Several commands, also described on this page, give insight into why your job is waiting and where it stands in the queue.

SLURM uses the following formula to calculate a priority for each job:

Job_priority =
    (PriorityWeightAge)       * (age_factor)        +
    (PriorityWeightFairshare) * (fair-share_factor) +
    (PriorityWeightJobSize)   * (job_size_factor)   +
    (PriorityWeightPartition) * (partition_factor)  +
    (PriorityWeightQOS)       * (QOS_factor)        +
    (possibly some more advanced factors that are not relevant for Habrok)

All factors in this formula are floating point numbers between 0.0 and 1.0, while the weights are integer values that determine how heavily each factor counts towards the priority.

On Habrok we use the following weights in our SLURM configuration:

PriorityWeightAge=2500000
PriorityWeightFairshare=10000000
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=1000000

This means that the priority of a job is mainly determined by its fairshare component and, to a lesser extent, by its age and QOS; job size and partition do not contribute at all.
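
To make the effect of these weights concrete, here is a minimal Python sketch of the formula. The function name job_priority and the two example jobs are made up for illustration; only the weights are taken from the configuration on this page, and the rounding to an integer is an assumption.

```python
# Weights copied from the Habrok configuration listed on this page.
WEIGHTS = {
    "age": 2_500_000,
    "fairshare": 10_000_000,
    "job_size": 0,
    "partition": 0,
    "qos": 1_000_000,
}


def job_priority(factors, weights=WEIGHTS):
    """Weighted sum of the priority factors; missing factors count as 0.0."""
    for name, value in factors.items():
        if name not in weights:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    # Rounding to an integer is an assumption made for readability.
    return round(sum(w * factors.get(name, 0.0) for name, w in weights.items()))


# Hypothetical job A: just submitted, fairshare factor 0.5.
job_a = job_priority({"age": 0.0, "fairshare": 0.5})
# Hypothetical job B: age factor at its maximum, fairshare factor 0.125.
job_b = job_priority({"age": 1.0, "fairshare": 0.125})
print(job_a, job_b)
```

With these made-up factors, job A gets priority 5000000 and job B gets 3750000: even a job with the maximum age factor stays behind a freshly submitted job whose owner has a clearly better fairshare factor.
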
The age of a job refers to how long the job has been waiting in the queue, on a time scale from 0 to 100 days: if it was just queued, the age factor is 0.0. After 50 days of waiting the age factor is 0.5, and after 100 days or more it stays at the maximum value of 1.0.
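
The age factor described above can be sketched as follows. This assumes that the factor grows linearly between 0 and 100 days, which the three values in the text suggest; the function name age_factor and the parameter max_age_days are chosen here for illustration.

```python
def age_factor(days_waiting, max_age_days=100):
    """Age factor, assuming linear growth that is capped at 1.0."""
    if days_waiting < 0:
        raise ValueError("days_waiting must not be negative")
    return min(days_waiting / max_age_days, 1.0)


AGE_WEIGHT = 2_500_000  # PriorityWeightAge from the Habrok configuration

for days in (0, 50, 100, 250):
    factor = age_factor(days)
    # Second column: age factor; third column: its weighted contribution.
    print(days, factor, factor * AGE_WEIGHT)
```

Under this assumption a job that has waited 50 days gets 1250000 priority points from its age, half of the maximum age contribution.
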
Finally, the most important factor is the fairshare factor: it indicates how much a user has recently been using the system compared to the share of the system that was allocated to this user. Usage decays over time, so recent usage counts most: if a user stops using the cluster, their recorded usage drops to half of its original value after the configured half-life period of one week.
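
The decay of recorded usage can be illustrated with a short sketch. It assumes plain exponential decay with the one-week half-life mentioned above and a user who submits no new jobs; the function name decayed_usage and the usage value of 100 (in arbitrary units) are made up for illustration.

```python
def decayed_usage(usage, days_idle, half_life_days=7):
    """Remaining recorded usage after days_idle days without new jobs."""
    if days_idle < 0:
        raise ValueError("days_idle must not be negative")
    return usage * 0.5 ** (days_idle / half_life_days)


# A hypothetical user with 100 units of recorded usage stops submitting jobs.
for days in (0, 7, 14, 28):
    print(days, decayed_usage(100.0, days))
```

After one week 50 units remain, after two weeks 25, and after four weeks 6.25, so heavy usage from a month ago hardly affects the fairshare factor any more.
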

The sprio command shows the priority per job, including the individual components for the job age and fairshare (both already multiplied by their corresponding weights). This can be useful for comparing jobs to other waiting jobs and finding out why a job is still waiting.

The sshare command shows a user's current fairshare number in more detail, together with the two main components that determine it: the (normalized) share of the system that was assigned to the user and the user's effective usage of the system.

The squeue command shows all jobs on the system and sorts them, by default, by status and priority: first the waiting jobs are shown by descending priority, then the running ones. The position of your job in this list gives a rough indication of when it will start. To list the priorities as well, run squeue with some additional flags:

squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %.18R %p"

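This will include all the default columns and, additionally, a column with the priority of each job. Note that %p prints the priority normalized to a floating point number between 0.0 and 1.0; use %Q instead to print the integer priority.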
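
If you want to process the priorities in a script, a machine-readable variant of the squeue call is easier to parse than the column layout above. The following Python sketch is one possible approach, not an official tool: the function names and the sample output are made up, and the squeue call uses only the -h (no header), -t (state filter) and -o (output format) options.

```python
import subprocess

# Machine-readable variant of the squeue call: no header (-h), pending jobs
# only (-t PENDING), and three "|"-separated fields: job ID, user, priority.
SQUEUE_CMD = ["squeue", "-h", "-t", "PENDING", "-o", "%i|%u|%p"]


def rank_pending(squeue_output):
    """Return (job_id, user, priority) tuples, highest priority first."""
    jobs = []
    for line in squeue_output.splitlines():
        if not line.strip():
            continue
        job_id, user, priority = line.strip().split("|")
        jobs.append((job_id, user, float(priority)))
    return sorted(jobs, key=lambda job: job[2], reverse=True)


def pending_jobs():
    """Run squeue (only works on a system where SLURM is installed)."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return rank_pending(result.stdout)


# Made-up sample output, for illustration only.
sample = "101|alice|0.00002\n102|bob|0.00005\n"
for position, (job_id, user, priority) in enumerate(rank_pending(sample), start=1):
    print(position, job_id, user, priority)
```

On the cluster itself, pending_jobs() would return the same kind of list for the real queue. Keep in mind that a position in this list is only a rough indication, since smaller or shorter jobs may start earlier.
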

Another useful option for squeue is --start: it shows the estimated start time for (some) waiting jobs, provided SLURM can already calculate one. Note that these are very rough estimates: they change when, for instance, running jobs finish before their time limit or new jobs with a higher priority are submitted.