
Nvidia RTX Pro 6000 GPU Nodes

On 2026-02-27 Hábrók was upgraded with three new GPU nodes equipped with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs with 96GB of VRAM. Each node has 8 GPUs, two Zen 5 AMD EPYC 9575F 64-core CPUs (128 cores in total), and approximately 1.5TB of DDR5 memory. This makes the nodes particularly suitable for AI workloads, such as inference with large language models.

With this upgrade, Hábrók gains a total of 24 new GPUs, each with 96GB of VRAM, which is significantly more video memory than the existing A100 (40GB) and V100 (32GB) GPUs offer. One key limitation of these new GPUs, however, is that they do not support native FP64 arithmetic (double precision, roughly 16 significant decimal digits). This makes the RTX PRO 6000 GPUs less suitable for traditional scientific simulations that require high numerical accuracy, as FP64 operations fall back to software emulation, resulting in significantly reduced performance. These nodes are instead best suited for AI and machine learning workloads, which typically rely on lower-precision formats that the RTX PRO 6000 handles efficiently.
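The practical difference between these precision formats can be illustrated in plain Python, whose floats are FP64: round-tripping a value through 32-bit storage (here via the standard struct module, as a sketch) shows how much resolution is lost.

```python
import struct

def roundtrip_fp32(x: float) -> float:
    """Store a Python float (FP64) as FP32 and read it back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A perturbation of 1e-8 is representable in FP64 (~16 significant
# decimal digits) but lies below FP32 resolution (~7 digits).
x = 1.0 + 1e-8
print(x == 1.0)                  # False: FP64 keeps the difference
print(roundtrip_fp32(x) == 1.0)  # True: FP32 rounds it away
```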

To request a job on the new nodes, you must specify them explicitly in the scheduler requirements. This is done through a GRES request: for example, #SBATCH --gpus-per-node=rtx_pro_6000:1 requests one RTX PRO 6000 GPU for the job. Keep in mind that the V100 and A100 GPU nodes are the default in the gpu partition; if you simply request a job to run in the gpu partition without specifying the GPU type through --gpus-per-node=GPUTYPE:N, the job will be allocated wherever there is capacity on the A100 or V100 nodes only. To ensure you get RTX PRO 6000 GPUs, always specify them explicitly in your job script.

An example job script that requests two GPUs and 32 CPU cores could be:

job_rtx.sh
#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=8GB
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=rtx_pro_6000:2
 
echo "Running job..."
# Execute my code...
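Inside a running job you can check that Slurm actually allocated the requested number of GPUs: Slurm exposes the allocated devices through the CUDA_VISIBLE_DEVICES environment variable. A minimal check (a sketch; the helper name is our own) could look like this:

```python
import os

def visible_gpu_count(env=os.environ) -> int:
    """Count the GPUs exposed to this job via CUDA_VISIBLE_DEVICES."""
    ids = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([i for i in ids.split(",") if i])

if __name__ == "__main__":
    # For the job script above this should report 2 GPU(s)
    print(f"This job sees {visible_gpu_count()} GPU(s)")
```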

Because the existing Hábrók GPU-enabled software stack was built for the V100 and A100 GPUs, most of the system-installed software is incompatible with the new nodes: most installed application versions are not forward compatible with NVIDIA GPUs of Compute Capability 12.0. You should therefore not assume that any software you have previously used on Hábrók will work, whether it is a centrally installed module such as PyTorch/2.1.2-foss-2023a-CUDA-12.1.1 or GROMACS/2024.4-foss-2023b-CUDA-12.4.0, a custom stack you built yourself with tools like EasyBuild, or a Python/Mamba environment in your user directories. Attempting to load such software will likely result in runtime errors, unless the installed versions are recent enough.

To ensure compatibility, software should instead be freshly installed to your user space, as this guarantees that up-to-date versions built for the new hardware are used. As an example, installing PyTorch can be done as follows, using one of the interactive (GPU) nodes:

# Create directory if it doesn't exist in homedir
mkdir -p ~/venvs
# Load recent Python version
module load Python/3.13.1-GCCcore-14.2.0
# Verify we are using the right Python binary
python3 --version
# Should return Python 3.13.1
 
# Create and load virtual environment
python3 -m venv ~/venvs/rtx6000_venv
source ~/venvs/rtx6000_venv/bin/activate
 
# Upgrade pip and wheel to make sure latest packages can be installed
pip install --upgrade pip wheel
 
# Install PyTorch and torchvision
pip install torch
pip install torchvision
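After the installation finishes, a quick sanity check confirms that the packages are importable and, when run on a GPU node, that PyTorch can see the GPU. This is a sketch, guarded so it also runs in environments where torch is absent:

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if the named module can be imported here."""
    return importlib.util.find_spec(name) is not None

if has_module("torch"):
    import torch
    # On an RTX PRO 6000 node this should report CUDA as available
    print(torch.__version__, torch.cuda.is_available())
else:
    print("torch is not installed in this environment")
```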

Then, use the newly created environment in a job using an RTX PRO 6000 GPU:

job_rtx.sh
#!/bin/bash
#SBATCH --job-name=rtx_example
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=8GB
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=rtx_pro_6000:1
#SBATCH --output=rtx_example.out
 
# Load virtual Python and venv
module load Python/3.13.1-GCCcore-14.2.0
source ~/venvs/rtx6000_venv/bin/activate
 
# Create a directory for the torchvision script
mkdir -p ~/torchvision_example
cd ~/torchvision_example
 
# Copy the script from the examples folder
cp /scratch/public/gpu_examples/main.py .
 
# Add execute permissions
chmod +x main.py
 
# Run the example
python3 main.py

If we have the job script job_rtx.sh in our current directory, we can submit it like any other job with sbatch job_rtx.sh.

Note that because we installed torch and torchvision through pip on an RTX node, the most recent compatible sources and binaries were automatically downloaded during installation. This ensures that our code in main.py, which requires torch and torchvision, can run without issues on the new nodes.
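One way to check that a CUDA-enabled PyTorch build was installed (rather than a CPU-only one) is to inspect torch.__version__: wheels from PyTorch's own package index typically carry a local version tag such as +cu128 for CUDA builds or +cpu for CPU-only builds, although the default PyPI wheels may omit the tag entirely. A small helper to parse it (our own, for illustration; the version strings below are hypothetical):

```python
def cuda_tag(version: str):
    """Return the CUDA local-version tag of a PyTorch version string, if any."""
    if "+" in version:
        tag = version.split("+", 1)[1]
        if tag.startswith("cu"):
            return tag
    return None  # CPU-only build, or no local tag present

print(cuda_tag("2.7.0+cu128"))  # cu128
print(cuda_tag("2.7.0+cpu"))    # None
print(cuda_tag("2.7.0"))        # None
```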