Cluster description

The Hábrók cluster was delivered by Dell and consists of the following parts:

[Photo: Standard Hábrók compute node]

  • 119 standard nodes with the following components:
    • 128 cores @ 2.45 GHz (two AMD EPYC 7763 CPUs)
    • 512 GB memory
    • 3.5 TB internal SSD disk space
  • 24 nodes for multi-node jobs with the following components:
    • 128 cores @ 2.45 GHz (two AMD EPYC 7763 CPUs)
    • 512 GB memory
    • 3.5 TB internal SSD disk space
    • 100 Gbps Omni-Path link

[Photo: Rack of Hábrók nodes]

  • 4 big memory nodes with the following components:
    • 80 cores @ 2.3 GHz (two Intel Xeon Platinum 8380 CPUs)
    • 4096 GB memory
    • 14 TB internal SSD disk space
  • 6 GPU nodes with the following components:
    • 64 cores @ 2.6 GHz (two Intel Xeon Platinum 8358 CPUs)
    • 512 GB memory
    • 4 Nvidia A100 GPU accelerator cards
    • 12 TB internal SSD NVMe disk space
    • 100 Gbps Omni-Path link

[Photo: Hábrók nodes with cables for power and network]

  • 36 GPU nodes with the following components:
    • 8 cores @ 2.7 GHz (Intel Xeon Gold 6150)
    • 128 GB memory
    • 1 Nvidia V100 GPU accelerator card (see the GPU check after this list)
  • 15 nodes with the following components:
    • 64 cores @ 2.2 GHz (two AMD EPYC 7601 CPUs)
    • 512 GB memory
    • 16 TB internal disk space
    • Only accessible by GELIFES users, see GELIFES Partition
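
Since jobs may land on either A100 or V100 GPU nodes, it can be useful for a job to check which card it actually received. A minimal sketch, assuming Python and the nvidia-smi tool are available on the GPU node:

```python
import subprocess

# Ask nvidia-smi for the model and memory of each GPU visible to this job.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    print(line)  # e.g. an A100 or V100 entry with its memory size
```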

[Photo: Hábrók network switches]

  • A 100 Gbps low-latency, non-blocking Omni-Path network connecting 24 compute and 6 GPU nodes
    • High bandwidth (100 gigabit per second)
    • Low latency (a delay of only a few microseconds before the receiver starts getting a message)
    • Useful for parallel processing across multiple computers (see the MPI sketch after this list)
  • Two 25 Gbps Ethernet networks
    • Used for accessing the storage areas and for job communication
    • Can also be used to access remote data more quickly
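
The difference latency makes is easiest to see with a small ping-pong test between two MPI ranks. A minimal sketch using mpi4py (assumed to be available via the module system), to be started with two ranks on different nodes:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
msg = bytearray(8)   # tiny message, so the timing is dominated by latency
reps = 1000

comm.Barrier()
start = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # Each repetition is one round trip, i.e. two one-way messages.
    print(f"one-way latency: {elapsed / (2 * reps) * 1e6:.1f} microseconds")
```

Run with, for example, mpirun -n 2 python pingpong.py; over the Omni-Path fabric the result should be in the range of a few microseconds.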

[Photo: Standard Hábrók storage rack]

  • The cluster has 2.5 PB (2562 TB) of formatted storage available. This scratch storage is set up using the Lustre parallel file system.
  • 50 GB of home directory storage per user

See Storage areas for more information. A quick way to check free space from within a job is sketched below.
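
A sketch for checking how full a file system is; the /scratch mount point used here is an assumption and may be named differently:

```python
import os
import shutil

# Report the free space of the scratch and home file systems.
# The /scratch path is an assumption; adjust it to the actual mount point.
for path in ("/scratch", os.path.expanduser("~")):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1e9:.0f} GB free of {usage.total / 1e9:.0f} GB")
```

Note that this reports the size of the whole file system; the 50 GB home quota is enforced separately from the file system size.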

Our standard systems have two sockets, each with an AMD EPYC 7763 processor. Each processor has 64 CPU cores running at 2.45 GHz. When not all cores of a processor are in use, the active cores can run at a higher clock speed (at most 3.5 GHz).
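
As an illustration, the core count and the current clock speed of a core can be inspected from within a job. A sketch for Linux (the sysfs path is standard, but may be absent depending on the frequency driver):

```python
import os

# On a standard node this prints 128: 2 sockets x 64 cores, no hyperthreading.
print("logical CPUs:", os.cpu_count())

# Current clock speed of core 0, reported in kHz; lightly loaded cores can
# boost above the 2.45 GHz base clock, up to about 3.5 GHz.
with open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq") as f:
    print("cpu0 clock:", int(f.read()) / 1e6, "GHz")
```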

In principle, each CPU core of a modern processor can run multiple threads (programs) simultaneously. This is called hyperthreading. This feature has been disabled on most nodes of the Hábrók cluster, as the performance benefits are minimal and it introduces additional complexity for both the scheduling system and the user.
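
Whether hyperthreading is active can be verified from Linux sysfs, where each physical core lists the logical CPUs that share it. A small sketch:

```python
# The logical CPUs sharing core 0: a single entry (e.g. "0") means
# hyperthreading/SMT is disabled, two entries (e.g. "0,128") means enabled.
with open("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list") as f:
    print("siblings of cpu0:", f.read().strip())
```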

Each processor has its own memory controllers and is connected to its own set of memory. For our standard systems this means that each processor has direct access to 256 GB of the 512 GB in the system.
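
This layout is visible to the operating system as two NUMA nodes. A sketch that lists the memory attached to each processor on Linux:

```python
import glob

# One node directory per processor; on a standard node this prints two
# entries of roughly 256 GB each.
for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
    with open(path) as f:
        print(f.readline().strip())  # first line: "Node N MemTotal: ... kB"
```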

When a processor needs to access memory attached to the other processor, it has to use the Infinity Fabric links between the processors. This connection is much slower than the connection to local memory, so it is important that processes running on one of the processors use the memory local to that processor!
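
On Linux, memory is placed on the NUMA node of the core that first writes to it (first touch). Pinning a process to one processor's cores therefore also keeps its memory local, as in this sketch (the numbering of cores 0-63 for the first socket is an assumption):

```python
import os

# Pin this process to the cores of the first processor (assumed to be 0-63).
os.sched_setaffinity(0, range(64))

# Touch every page of the buffer so the pages are allocated in the memory
# local to the first processor (first-touch placement).
data = bytearray(1 << 30)              # 1 GiB buffer
for i in range(0, len(data), 4096):
    data[i] = 1

print("running on CPUs:", sorted(os.sched_getaffinity(0))[:4], "...")
```

In practice the scheduler's CPU binding or tools such as numactl usually take care of this pinning for you.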

NOTE: You can still request all the memory of a machine, even for a single-core job. This is extremely inefficient, however, as it leaves most of the node's cores idle; you should look into parallelizing your workload instead, for example as in the sketch below.
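
A simple way to make use of the cores a job was actually given, rather than claiming a whole node's memory for one core, is to size a worker pool to the allocated CPUs. A sketch:

```python
import os
from multiprocessing import Pool

def work(item):
    return item * item   # placeholder for the real per-item computation

if __name__ == "__main__":
    # sched_getaffinity reports the CPUs allocated to this job, which on a
    # shared node is usually fewer than os.cpu_count().
    ncpus = len(os.sched_getaffinity(0))
    with Pool(processes=ncpus) as pool:
        print(pool.map(work, range(16)))
```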