Partitions and Limits

The cluster is divided into several partitions. Partitions divide up the resources in the cluster based on either physical attributes of the machines or based on the job types that are allowed to run on certain resources.

Cluster layout

The node types of the cluster are described in Cluster description.

Partitions

Partitions have been set up in the scheduling system to divide the nodes by hardware type and by the length of the jobs allowed to run on them.

Only the first-level partitions can be selected by the user; the sub-partitions based on job length are assigned automatically.

The partitions that have been made are described in the following table.

| Partition name | Description | Time limit | Remarks |
|---|---|---|---|
| regular | Standard 128-core, 512 GB memory nodes | 10 days | |
| parallel | Standard 128-core, 512 GB memory nodes with a fast Omni-Path network connection | 5 days | |
| gpu | GPU nodes | 3 days | See this page for more information |
| himem | Big memory nodes with 4 TB of memory and 80 cores | 10 days | |
| gelifes | Nodes purchased by the GELIFES institute | 10 days | See this page for more information |
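A partition is selected with the `--partition` option of `sbatch`. As a minimal sketch of a job script (the program name and the resource numbers are placeholders; choose values within the limits of the partition you pick):

```shell
#!/bin/bash
#SBATCH --partition=regular     # one of: regular, parallel, gpu, himem, gelifes
#SBATCH --time=2-00:00:00       # 2 days; must stay within the partition's time limit
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

# Placeholder for the actual workload.
srun ./my_program
```

The `#SBATCH` lines are a job-script configuration fragment read by `sbatch`; the scheduler places the job in the matching sub-partition based on the requested time.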

Details on time limits

The Hábrók cluster allows very long jobs of up to 10 days. Running such long jobs has disadvantages, however: if the job or the node fails near the end, all the work done so far is lost, and long jobs are harder to fit into the schedule.

We therefore urge you to use any save-and-restart (checkpointing) options your program offers if you need to run this long.
If the program does not have such options, they should be added.
We can also help you optimize your code so that it runs faster. Please contact us if you need help.
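As a sketch of the save-and-restart idea (the step loop and the state file name are hypothetical stand-ins for a real computation): the script records the last completed step, so a resubmitted job continues from where the previous run stopped instead of starting over.

```shell
#!/bin/bash
# Minimal save-and-restart sketch: progress is written to a state file
# after every step, so a resubmitted job resumes from the last checkpoint.

STATE_FILE="checkpoint.txt"
TOTAL_STEPS=10

# Resume from the last saved step, or start from scratch.
if [ -f "$STATE_FILE" ]; then
    start=$(cat "$STATE_FILE")
else
    start=0
fi

for (( step = start + 1; step <= TOTAL_STEPS; step++ )); do
    echo "working on step $step"       # stand-in for the real computation
    echo "$step" > "$STATE_FILE"       # save progress after each step
done
```

If this job is killed partway through, submitting it again picks up at the first unfinished step, because the loop starts from the value stored in the state file.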

To alleviate these scheduling problems, we limit long-running jobs to only part of the cluster. For example, on the standard nodes at most 80% of the machines may be used for jobs taking more than 3 days. The precise settings depend on the partition and may be changed to improve job scheduling.

Limits on number of jobs

To prevent the scheduling system from being flooded with jobs, there are some limits in place that define how many jobs you are allowed to have in the cluster at any time.

Note that these limits count only the current number of waiting and running jobs, so once you reach the limit you can submit new jobs again after some of the existing ones have finished.
If you try to submit more jobs than allowed, sbatch will reject the submission and show the following error:

sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
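To see how close you are to the limit, you can count your own queued and running jobs. `squeue` and its `-u` and `--noheader` options are standard Slurm; the exact limit value depends on the cluster configuration, so this is only a sketch, simulated below so it runs without a cluster:

```shell
# On the cluster you would run:
#   squeue -u "$USER" --noheader | wc -l
# which prints one line per job (running or pending) and counts them.
# Simulated here with three fake queue lines instead of real squeue output:
printf '123 regular job1 R\n124 regular job2 PD\n125 gpu job3 PD\n' | wc -l
# -> 3
```

If the count is at the limit, wait for some jobs to finish before submitting more.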