===== Compute nodes =====
  * 119 standard nodes with the following components:
    * 128 cores @ 2.45 GHz (two AMD 7763 CPUs)
    * 512 GB memory
    * 3.5 TB internal SSD disk space
  * 24 nodes for multi-node jobs with the following components:
    * 128 cores @ 2.45 GHz (two AMD 7763 CPUs)
    * 512 GB memory
    * 3.5 TB internal SSD disk space
    * 100 Gbps Omni-Path link
  * 4 big memory nodes with the following components:
    * 80 cores @ 2.3 GHz (two Intel Xeon Platinum 8380 CPUs)
    * 4096 GB memory
    * 14 TB internal SSD disk space

  * 2 interactive GPU nodes (delivered by Fujitsu in an earlier purchase) with the following components:
    * 24 cores @ 2.4 GHz (two Intel Xeon Gold 6240R CPUs)
    * 768 GB memory
    * 1 Nvidia L40s GPU accelerator card with 48 GB RAM
  * 6 GPU nodes with the following components:
    * 64 cores @ 2.6 GHz (two Intel Xeon Platinum 8358 CPUs)
    * 512 GB memory
    * 4 Nvidia A100 GPU accelerator cards with 40 GB RAM
    * 12 TB internal SSD NVMe disk space
    * 100 Gbps Omni-Path link
  * 18 GPU nodes (delivered by Fujitsu in an earlier purchase) with the following components:
    * 18 cores @ 2.7 GHz (two Intel Xeon Gold 6150 CPUs)
    * 768 GB memory
    * 1 Nvidia V100 GPU accelerator card with 32 GB RAM
    * 621 GB RAM disk
  * 15 nodes with the following components:
    * 512 GB memory
    * 16 TB internal disk space
    * Only accessible by GELIFES users, see [[habrok:

  * 1 node with the following components:
    * 64 cores @ 2.1 GHz (two Intel Xeon Gold 6448Y CPUs)
    * 1 TB memory
    * 440 GB internal disk space
    * 4 Nvidia H100 GPU accelerator cards with 80 GB RAM (a sketch for inspecting a node's GPUs follows this list)
    * Only accessible for education purposes in the scope of the [[https://
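As an illustration, the GPUs visible on a node can be listed from a small C program using Nvidia's NVML library. This is only a sketch: it assumes the Nvidia driver, which ships NVML, is installed on the GPU nodes, something this page does not state explicitly.

<code c>
/* List the Nvidia GPUs visible on the current node using NVML.
 * Compile with: gcc gpus.c -o gpus -lnvidia-ml
 * Assumption: NVML (part of the Nvidia driver) is present on the node. */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialise NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        nvmlMemory_t mem;

        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
            continue;
        nvmlDeviceGetName(dev, name, (unsigned int)sizeof name);
        nvmlDeviceGetMemoryInfo(dev, &mem);
        /* mem.total is in bytes; report it in GB */
        printf("GPU %u: %s, %.0f GB memory\n", i, name, mem.total / 1e9);
    }

    nvmlShutdown();
    return 0;
}
</code>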
===== Network =====
  * A 100 Gbps low-latency Omni-Path network for 24 compute and 6 GPU nodes
    * High bandwidth (100 gigabit per second)
    * Low latency (a delay of only a few microseconds before a client starts receiving a message)
    * Useful for parallel processing over multiple computers (see the sketch below)
  * Two 25 Gbps Ethernet networks
    * Used for accessing the storage areas and for job communication
    * Can also be useful to access remote data more quickly
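A minimal sketch of the kind of program that uses this network: an MPI ping-pong between two ranks, which is also the usual way the latency mentioned above is measured. It assumes an MPI implementation is available on the cluster (for example through the module system); this page does not name one.

<code c>
/* Minimal MPI ping-pong between two ranks. Compile with an MPI wrapper
 * (e.g. mpicc pingpong.c -o pingpong) and run with at least two
 * processes, e.g. mpirun -n 2 ./pingpong.
 * Assumption: an MPI implementation is available on the cluster. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int token = 42;
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
    } else if (rank == 0) {
        /* Send a small message to rank 1 and wait for it to come back;
         * the round-trip time is a simple measure of network latency. */
        double t0 = MPI_Wtime();
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        double t1 = MPI_Wtime();
        printf("round trip took %.1f microseconds\n", (t1 - t0) * 1e6);
    } else if (rank == 1) {
        /* Bounce the message straight back to rank 0. */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
</code>

With the two ranks placed on different nodes, the printed round-trip time reflects the latency of the interconnect between them.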
- | |||
===== Storage =====
  * The cluster has 2.5 PB (2562 TB) of formatted storage available. This scratch storage is set up using the Lustre parallel file system (a sketch for checking the space on a storage area follows below).
  * 50 GB of home directory storage per user
See [[habrok:
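As a rough sketch, the total and available space of a mounted storage area can be queried from a program with the POSIX statvfs() call. The mount point /scratch used below is an assumption based on common cluster layouts; this page does not give the actual paths.

<code c>
/* Report total and available space on a file system with POSIX statvfs().
 * Assumption: /scratch is the mount point of the scratch storage area;
 * replace it with the actual path. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(void) {
    struct statvfs fs;
    if (statvfs("/scratch", &fs) != 0) {
        perror("statvfs");
        return 1;
    }
    /* f_frsize is the fragment size in bytes; f_blocks and f_bavail are
     * counts of those fragments (total, and available to ordinary users). */
    double total_tb = (double)fs.f_blocks * fs.f_frsize / 1e12;
    double avail_tb = (double)fs.f_bavail * fs.f_frsize / 1e12;
    printf("total: %.1f TB, available: %.1f TB\n", total_tb, avail_tb);
    return 0;
}
</code>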
==== Clock speed and turbo mode ====
Our standard systems have two sockets, each with an AMD 7763 processor. Each processor has 64 CPU cores running at 2.45 GHz. When not all cores of a processor are in use, the active cores can run at a higher clock speed (at most 3.5 GHz).
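On Linux the clock frequency a core is currently running at can usually be read from the cpufreq sysfs files. The sketch below assumes the nodes expose this interface; that is common but not guaranteed.

<code c>
/* Print the current clock frequency of CPU core 0, read from the Linux
 * cpufreq sysfs interface.
 * Assumption: the nodes expose this interface; it can be restricted. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    if (f == NULL) {
        perror("cpufreq interface not available");
        return 1;
    }
    long khz;
    if (fscanf(f, "%ld", &khz) == 1)   /* the file contains a value in kHz */
        printf("core 0 currently runs at %.2f GHz\n", khz / 1e6);
    fclose(f);
    return 0;
}
</code>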
==== Hyperthreading ====
In principle each CPU core of a modern processor can run multiple threads (programs) simultaneously. This is called hyperthreading. This feature has been disabled on the cluster nodes.
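Whether SMT (hyperthreading) is enabled on a node can be verified through the kernel's SMT control file, available on recent Linux kernels. A small sketch, assuming this interface is present on the nodes:

<code c>
/* Check the SMT (hyperthreading) state via the kernel's SMT control file.
 * Assumption: the nodes run a kernel recent enough to provide it.
 * The file contains "on", "off", "forceoff" or "notsupported". */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/sys/devices/system/cpu/smt/control", "r");
    if (f == NULL) {
        perror("SMT control interface not available");
        return 1;
    }
    char state[32];
    if (fscanf(f, "%31s", state) == 1)
        printf("SMT is: %s\n", state);
    fclose(f);
    return 0;
}
</code>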
==== Memory access ====
Each processor has its own memory attached to it. When a processor wants to access the memory of the other processor, it has to use the Infinity Fabric connection between the processors. This connection is much slower than the connection to the local memory. This means that it is important that processes running on one of the processors use the memory local to that processor!
**NOTE**
You can still request all the memory on a machine, even with a single core. This is extremely inefficient, however, as most cores are then idling.
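A common way to keep memory accesses local is the "first touch" technique: Linux places a memory page on the NUMA node of the core that first writes to it, so initialising an array with the same thread layout that later processes it keeps most accesses in socket-local memory. A minimal OpenMP sketch of this general technique (nothing here is specific to this cluster):

<code c>
/* "First touch" NUMA initialisation.
 * Compile with: gcc -fopenmp firsttouch.c -o firsttouch */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 26)   /* 64 Mi doubles = 512 MB, spread over both sockets */

int main(void) {
    double *a = malloc(N * sizeof(double));
    if (a == NULL) return 1;

    /* First touch: every thread initialises the chunk it will use later,
     * so those pages are allocated in its socket's local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 1.0;

    /* The same static schedule now mostly hits local memory instead of
     * crossing the inter-socket link. */
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f\n", sum);
    free(a);
    return 0;
}
</code>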