Group specific partitions

Some groups and institutes have bought their own extensions of Peregrine and Hábrók. These nodes have been put into special partitions. They share the same storage and software environment as all the other nodes.

Since these nodes have been paid for by the group or institute, their users are often the only ones who can access them. To separate the usage of the group specific nodes from that of the rest of the cluster, a separate account has been created in the scheduler for each of these groups; the regular account for all other usage is ‘users’.

Only users who have been added to this group managed account are able to submit to their partition. These users will also have an account in ‘users’, like any other Hábrók user. Jobs submitted to the special partition are accounted for in the special account, and jobs in the other partitions in the regular ‘users’ account.

This sounds a bit complicated, but the main thing to take home is:

- Submit to the dedicated partition to get to the group specific nodes (see the example job script header below).
- Usage and priorities of these nodes are handled separately from those of the rest of the cluster.
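
For example, a minimal job script header for submitting to such a group specific partition (using the gelifes partition described below; the requested resources are just placeholders) could look like this:

#!/bin/bash
#SBATCH --partition=gelifes
#SBATCH --job-name=example_job
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

# the actual commands of the job follow here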

Users with the coordinator role in one of the special accounts can add users to the account using the following command, which should be run on one of the login/interactive nodes of the cluster:

sacctmgr add user <username> account=<account> fairshare=1

Here <username> should be replaced by the user ID that is to be added, and <account> by the name of the account, e.g. gelifes, digitallab or caos. The fairshare should by default be set to 1.
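
As an illustration, adding a (hypothetical) user p123456 to the gelifes account would be done with:

sacctmgr add user p123456 account=gelifes fairshare=1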

To verify whether a user has already been added to the account, check the “Account” column in the output of the following command: it should show a row with “users” and one with the special account:

sacctmgr show -s user <username>
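
To limit the output to the relevant columns, the standard format option of sacctmgr can be used; the username below is again hypothetical:

sacctmgr show -s user p123456 format=user,account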

Coordinators can be added to the special accounts using:

sacctmgr add coordinator name=<username> account=<account>

To modify an existing user, the following can be used:

sacctmgr modify user name=<username> account=<account> set fairshare=10

This sets the fairshare value for the user (to 10 in this example).
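
The change can be verified afterwards with the show command given above; as an illustration (with a hypothetical username), the fairshare column can be displayed explicitly:

sacctmgr show -s user p123456 format=user,account,fairshare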

A user can be removed from the account using:

sacctmgr -i delete user <username> account=<account>

The -i option makes sacctmgr apply the change immediately, without asking for confirmation.

The nodes themselves are 64-core AMD EPYC 7601 nodes, running at 2.2 GHz, with 512 GB of memory. These should be suitable for most of the GELIFES workloads. There is also quite a lot (16 TB) of temporary local scratch space per node, available to jobs through the use of $TMPDIR in the job scripts.
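
A typical way to use this local scratch space, sketched here with placeholder file and directory names, is to copy the input data to $TMPDIR at the start of the job, run the computation there, and copy the results back before the job ends:

# copy the input data to the fast local scratch space of the node
cp /scratch/$USER/input_data.tar $TMPDIR/
cd $TMPDIR
tar xf input_data.tar

# run the computation on the local copy
./my_program input_data/

# copy the results back to shared storage before the job finishes
cp -r results /scratch/$USER/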

The gelifes partition has two types of limits: one on the number of jobs per user, and one on the number of cores allocated to different job lengths.

Job type | Time limit          | Max. submitted jobs per user | Max. number of cores
short    | ≤ 1 day             | 1000                         | 960
medium   | > 1 day, ≤ 3 days   | 1500                         | 640
long     | > 3 days, ≤ 10 days | 2000                         | 320

Note that jobs from all users contribute to the maximum number of cores that can be allocated to each job type. This prevents the partition from being filled with long jobs, which would lead to higher waiting times for other jobs. If this limit is reached, waiting jobs will show the reason QOSGrpCpuLimit.
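
The reason for a waiting job can be checked with squeue; the format string below selects the job ID, name, state and reason, and the partition name is given as an example:

squeue -u $USER -p gelifes -o "%.10i %.20j %.8T %r"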

Modules

Since the instruction set of the AMD CPUs in the gelifes partition is compatible with that of the standard Intel based nodes, the same software stack is used on the gelifes partition. There is, however, an issue with software built with the Intel compilers. Whereas the software based on the GNU compilers (foss toolchains) works fine, the software based on the intel toolchain does not work. This is because the Intel compiler introduces a CPU check in the code, which fails on the AMD nodes.
Our advice is therefore to make use of the foss toolchains, as all relevant software should be available in the foss toolchains.
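
For example, when loading software, pick a module version built with a foss toolchain; the package name and version below are only an illustration:

module avail GROMACS
module load GROMACS/2021.3-foss-2021a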

You can request access to the gelifes account, which gives access to the GELIFES nodes, by contacting Joke Bakker from GELIFES.