Migration to Hábrók

You can find the slides for the presentation on March 27th here.

Because of increased security requirements, we will only allow RUG accounts on Hábrók. Since most people are already using their p- or s-number to log in, this should not cause issues. Existing f-accounts can still be used, but note that the RUG is working on reducing the number of f-accounts, and external collaborators may need a p-number in the future. Given the lack of provenance of the existing umcg- accounts, we will no longer support these. UMCG staff members will need an account based on a p-number to log in.

See the FAQ for information about the transition procedure for the accounts.

For logins on Hábrók, multi-factor authentication (MFA) will be used, just like for all other university services. To make life easier for users, the token will only be requested once per 8 hours, as long as you connect to the same login node, from the same computer, as the same user.

Because there are quite a number of inactive accounts on Peregrine, we have decided not to automatically migrate the accounts to Hábrók, so your account on Peregrine will not automatically work on Hábrók.

If you want to use the new cluster, you need to request access to it by using the Self-Service Portal IRIS.

Please go to: Research and Innovation Support → Computing and Research Support Facilities → High Performance Computing Cluster → Request Hábrók Account.

The existing groups on Peregrine will be recreated on Hábrók with an hb- prefix. When group members get a new account on Hábrók, we will add these accounts to the corresponding new group on Hábrók. After three months we will check for groups without active members and remove those groups from Hábrók. See the data migration section for more details.

For Hábrók we will have two login nodes and two interactive nodes. This is to increase availability: when one of the nodes is down, you can use another one. One or two interactive GPU nodes are planned for the near future. The host names are:

  • login1.hb.hpc.rug.nl
  • login2.hb.hpc.rug.nl
  • interactive1.hb.hpc.rug.nl
  • interactive2.hb.hpc.rug.nl

On all these nodes limits are in place for CPU and memory usage. The limits are higher for the interactive nodes.
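
As an illustration, connecting to one of these nodes from a terminal could look like the sketch below; p123456 is a placeholder for your own account name.

  ssh p123456@login1.hb.hpc.rug.nl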

On Hábrók a clear separation is made between the home directories, long-term, medium-term, and short-term storage.

The home directories are available on all nodes and are meant for storing personal software and settings. Each user has 50 GiB of storage space available. This amount is fixed. We do make backups of the home directories.

The long- and medium-term storage areas are only available on the login nodes. For long-term storage the RDMS system can be used. For medium-term a /projects storage area is mounted on the login nodes. This /projects area will not be available on the compute nodes, as this storage is not optimized for data processing.

Data that needs to be processed, or the data resulting from processing, can be stored in the /scratch area. Because the 30-day retention time for /scratch was circumvented by many users, we will no longer remove data from /scratch automatically. As a consequence we will now apply smaller limits to /scratch by default. This limit can be increased on request, in which case you will need to explain how and where you are going to store important data for the long term.

We will not make a backup of /scratch! And in case of file system issues we can decide to wipe and reformat /scratch.

In Hábrók all nodes have been equipped with fast local storage, which is available during the runtime of the jobs. This storage will perform better than any shared storage that we currently have. This storage can be accessed using the environment variable $TMPDIR in your job scripts. Using this storage area is especially important for use cases with many small files, as most shared file systems (at least those within the available budget) are based on spinning disks and centralized file metadata.
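
As a sketch of how this can be used, the job script below copies input data to the local disk, processes it there, and copies the result back to /scratch; the file names, the program name my_program and the requested resources are arbitrary placeholders.

  #!/bin/bash
  #SBATCH --time=01:00:00
  #SBATCH --mem=4GB

  # Copy the input data from shared storage to the fast local disk of the node
  cp /scratch/$USER/input.dat $TMPDIR/

  # Process the data on the local disk (my_program is a placeholder)
  my_program $TMPDIR/input.dat > $TMPDIR/output.dat

  # Copy the result back to shared storage before the job ends,
  # because the local disk is only available during the runtime of the job
  cp $TMPDIR/output.dat /scratch/$USER/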

On all storage areas quotas are applied to prevent single users from taking up too much space, thereby limiting what is available for others.

For the home directories a fixed quota of 50 GiB is set for each user. For the /projects and /scratch areas the default quota is 250 GiB per user. For the latter (/scratch) there is also a limit of 200,000 files. This limit is much lower than on Peregrine, as /data and /scratch on Peregrine were overloaded by the number of files.
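
To get an idea of how close you are to the file limit, you can count your files with standard tools; the example below assumes your personal /scratch directory is /scratch/$USER and may take a while for large directory trees.

  # Count the number of files below your personal /scratch directory
  find /scratch/$USER -type f | wc -l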

Please store huge collections of small files in archives (tar, zip) and extract these to the fast local disk of the nodes before processing.
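
A minimal sketch of this pattern inside a job script, assuming the archive is stored in your personal /scratch directory and that the processing step writes its output to a directory called results:

  # Extract an archive with many small files onto the fast local disk of the node
  tar xzf /scratch/$USER/dataset.tar.gz -C $TMPDIR

  # ... run the processing on the extracted files in $TMPDIR ...

  # Pack the output into a single archive again before copying it back
  tar czf /scratch/$USER/results.tar.gz -C $TMPDIR results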

The quotas on /projects are handled by the “data handling” project.

For /scratch the quota can only be increased if you can guarantee that you can safely store important data elsewhere. The quotas are based upon a fair-share principle: requests must be reasonable compared to the available space, and quotas may be reduced when other users need additional space and there is no longer sufficient storage space available.

The data in the home directories and /data of Peregrine was available read-only on the login nodes of Hábrók for a period of three months (until July 1st 2023), and is no longer available.

The data on Peregrine /scratch was not migrated, since it is temporary space only.

In Hábrók several hardware classes are available. Here is a short overview:

Type     | Nodes | Cores | Memory (GiB) | GPU               | Partition | Local storage (TiB)      | Notes
Regular  | 117   | 128   | 512          | -                 | regular   | 3.5                      |
Omnipath | 24    | 128   | 512          | -                 | parallel  | 3.5                      | High-bandwidth, low-latency network connection
Memory   | 4     | 80    | 4096         | -                 | himem     | 14                       |
GPU1     | 6     | 64    | 512          | 4 x A100 (40 GiB) | gpu       | 12                       | Some GPUs have been divided into smaller 20 GiB units.
GPU2     | 36    | 12    | 128          | 1 x V100 (32 GiB) | gpu       | 1                        |
Gelifes  | 15    | 64    | 512          | -                 | gelifes   | 15 (spinning disk based) | Nodes owned by the GELIFES institute.

In Hábrók four main partitions are available: regular, himem, parallel and gpu. The partitions correspond to the hardware classes in the table above. Besides this the gelifes partition is accessible to the members of the GELIFES research institute.

All these partitions are subdivided into a short, medium and long sub-partition. As a user you do not have to select these yourself; this is done automatically based on the length of the job. All nodes in a class are available in the short sub-partition, a large part in the medium sub-partition and a limited fraction in the long sub-partition.

This setup prevents long waiting times for shorter jobs and makes sure that long-running jobs are not spread out over all the nodes.

When no partition is specified the job will be sent to regular or himem nodes depending on the CPU and memory requirements for the job.

Here is a short description of the partitions:

Partition | Description
regular   | Partition for the standard CPU nodes
himem     | Partition for the nodes with a large amount of memory
parallel  | Partition for nodes with a fast, low-latency interconnect. This partition is meant for jobs that use multiple nodes and require high bandwidth or low latency.
gpu       | Partition with the GPU nodes. More details in the GPU section below.
gelifes   | Partition with the nodes owned by the GELIFES institute.
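
As an illustration, requesting a specific partition in a job script could look like the sketch below; the resource values are arbitrary examples and my_mpi_program is a placeholder for your own application.

  #!/bin/bash
  # Request the nodes with the fast Omnipath interconnect
  #SBATCH --partition=parallel
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=128
  #SBATCH --time=12:00:00

  srun ./my_mpi_program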

More details about the GPU nodes: coming soon.

Please see the Known issues page.