GELIFES

The GELIFES institute has purchased its own extension to the Peregrine cluster. These nodes have been put in a separate partition, called ‘gelifes’. They share the same storage and software environment as all the other nodes.

Since these nodes have been paid for by GELIFES, users from GELIFES are the only ones who can access them. To separate the usage of the GELIFES nodes from that of the rest of the cluster, a separate ‘gelifes’ account has been created in the scheduler, the other account being ‘users’.

Only users who have an account in ‘gelifes’ are able to submit to the gelifes partition. These users also have an account in ‘users’, like any other Peregrine user. Jobs submitted to the gelifes partition are accounted for in the gelifes account, and jobs in the other partitions are accounted for in the users account.

This sounds a bit complicated, but the main things to take home are:

- Submit to the partition ‘gelifes’ to get to the GELIFES nodes.
- Usage and priorities of the GELIFES nodes are handled separately from those of the rest of the cluster.
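A job script targeting the GELIFES nodes therefore only needs the right partition directive. A minimal sketch (job name, wall time, resource values, and the application are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=gelifes   # send the job to the GELIFES nodes
#SBATCH --job-name=example    # placeholder job name
#SBATCH --time=12:00:00       # placeholder wall time (a "short" job, <= 1 day)
#SBATCH --cpus-per-task=16    # placeholder core count
#SBATCH --mem=32G             # placeholder memory request

srun ./my_program             # placeholder for the actual application
```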

The nodes themselves are 64-core AMD EPYC 7601 nodes, running at 2.2 GHz, with 512 GB of memory. These should be suitable for most of the GELIFES workloads. There is also quite a lot (16 TB) of temporary local scratch space per node, available to jobs through the use of $TMPDIR in job scripts.
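A common pattern is to stage data through $TMPDIR to benefit from the fast local scratch. A minimal sketch of such a job-script fragment (the input file and the tr command are stand-ins for real data and a real computation; outside a running job $TMPDIR may be unset, hence the mktemp fallback):

```shell
echo "example data" > input.dat        # stand-in for the real input data
SCRATCH="${TMPDIR:-$(mktemp -d)}"      # $TMPDIR points to local scratch inside a job
cp input.dat "$SCRATCH/"               # stage input to the node-local disk
cd "$SCRATCH"
tr 'a-z' 'A-Z' < input.dat > output.dat  # stand-in for the real computation
cp output.dat "$OLDPWD/"               # copy results back before the job ends
cd "$OLDPWD"
```

Note that $TMPDIR is cleaned up when the job finishes, so results must be copied back to shared storage before the job ends.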

The gelifes partition has two types of limits: one on the number of jobs per user, and one on the number of cores allocated to different job lengths.

| Job type | Time limit          | Maximum number of submitted jobs per user | Maximum number of cores |
|----------|---------------------|-------------------------------------------|-------------------------|
| short    | ≤ 1 day             | 1000                                      | 960                     |
| medium   | > 1 day, ≤ 3 days   | 1500                                      | 640                     |
| long     | > 3 days, ≤ 10 days | 2000                                      | 320                     |

Note that jobs from all users count towards the maximum number of cores that can be allocated to each job type. This prevents the partition from being filled with long jobs, which would lead to longer waiting times. If this limit is reached, waiting jobs will show the reason QOSGrpCpuLimit.
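You can check why your jobs are still pending using squeue, for example (the format string is just one possible choice):

```shell
# Show job id, partition, state, and pending reason for your own jobs;
# jobs blocked by the per-type core limit show QOSGrpCpuLimit as the reason.
squeue -u "$USER" --format="%.10i %.9P %.8T %.20r"
```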

Since the instruction set of the AMD CPUs in the gelifes partition is compatible with that of the standard Intel-based nodes, the same software is used on the gelifes partition. There is an issue with the Intel-compiler-based software, however. Whereas software built with the GNU compilers (the foss toolchains) works fine, software built with the intel toolchain does not. This is because the Intel compiler introduces a CPU check in the code, which fails on the AMD nodes.
Our advice is therefore to make use of the foss toolchains, as all relevant software should be available in the foss toolchains.
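In practice this means picking the foss-toolchain build of a package when loading modules. A sketch (the package name and version are placeholders; use module avail to see what is actually installed):

```shell
module avail GROMACS                  # placeholder package: list available builds
module load GROMACS/2019-foss-2018b   # pick a foss-toolchain build (placeholder version)
```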

If you want to make use of the Intel compiler with full optimizations for the AMD CPUs, you should make sure that you compile on the gelifes nodes directly. Since there is also a bug in the intel/2018a compiler, the intel/2018b compiler has to be used.
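One way to do this is via an interactive session on a gelifes node. A sketch (the resource values and source file are placeholders; the -march flag is an assumption, chosen because, unlike the -x flags, it does not emit the Intel-only CPU check):

```shell
# Start an interactive shell on a GELIFES node (placeholder time/core values)
srun --partition=gelifes --time=01:00:00 --cpus-per-task=4 --pty bash

# On the node: load the working Intel compiler and build
module load intel/2018b
icc -O2 -march=core-avx2 -o myprog myprog.c   # myprog.c is a placeholder source file
```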

You can request access to the gelifes account, which gives access to the GELIFES nodes, by contacting Joke Bakker from GELIFES.

Users with the coordinator role in the gelifes account can add users to the account using the following command:

sacctmgr add user USERNAME account=gelifes fairshare=1

Here USERNAME should be replaced by the user ID that is to be added to the account gelifes. The fairshare should by default be set to 1.

To verify whether a user has already been added to the gelifes account, check the “Account” column in the output of the following command: it should show a row with “users” and one with “gelifes”:

sacctmgr show -s user USERNAME

Coordinators can be added to the gelifes account using:

sacctmgr add coordinator name=USERNAME account=gelifes

To modify an existing user, the following can be used:

sacctmgr modify user name=USERNAME account=gelifes set fairshare=1