AlphaFold

AlphaFold 3 is not available as a module yet, and because of the complex installation this may still take a while.

In the meantime, it should be possible to run AlphaFold 3 using an Apptainer container. You can either build your own container using the instructions at https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md (which requires building the Docker image first and then converting it to a Singularity/Apptainer image), or you can use a prebuilt container from Docker Hub, e.g. from https://hub.docker.com/r/bockpl/alphafold/tags. The examples below use the latter.
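If you want to build the container yourself, the procedure roughly looks as follows on a machine where Docker is available. This is only a sketch: the Dockerfile location and image tag below are assumptions, so check the linked installation instructions for the exact, up-to-date commands.

# On a machine with Docker: build the AlphaFold 3 Docker image from the repository
git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3
docker build -t alphafold3 -f docker/Dockerfile .
# Convert the local Docker image into an Apptainer image file
apptainer build alphafold3.sif docker-daemon://alphafold3:latest

To use the prebuilt image instead, pull it into your scratch directory: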

cd /scratch/$USER
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer_cache
apptainer pull docker://bockpl/alphafold:v3.0.0-22.04-1.0

This will result in a container image file named alphafold_v3.0.0-22.04-1.0.sif. Now clone the AlphaFold 3 repository in the same directory using:

git clone https://github.com/google-deepmind/alphafold3.git

You should now be able to run the code from the cloned GitHub repository in the container (which provides all the dependencies) by doing something like:

apptainer exec ./alphafold_v3.0.0-22.04-1.0.sif python3 alphafold3/run_alphafold.py

When running on a GPU node, the GPU can be made available in the container by adding the --nv flag:

apptainer exec --nv ./alphafold_v3.0.0-22.04-1.0.sif python3 alphafold3/run_alphafold.py

More examples can be found at https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md#build-the-singularity-container-from-the-docker-image, and more examples/information about Apptainer at https://wiki.hpc.rug.nl/habrok/examples/apptainer.

The genetic database files required by AlphaFold 3 can be found at /scratch/public/AlphaFold/3.0. Due to license restrictions, the model parameters are not available (yet); you can obtain these yourself using the instructions provided at https://github.com/google-deepmind/alphafold3?tab=readme-ov-file#obtaining-model-parameters.
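Putting the pieces together, a full run could look roughly like the example below. This is a sketch only: the input file af_input.json, the model parameter directory /scratch/$USER/af3_models and the working directory are placeholders, and the run_alphafold.py flags may differ between AlphaFold 3 releases, so check python3 run_alphafold.py --help inside the container first.

# Bind the databases, your model parameters and a working directory into the container
apptainer exec --nv \
  -B /scratch/public/AlphaFold/3.0:/databases \
  -B /scratch/$USER/af3_models:/models \
  -B /scratch/$USER/af3_run:/work \
  ./alphafold_v3.0.0-22.04-1.0.sif \
  python3 alphafold3/run_alphafold.py \
    --json_path=/work/af_input.json \
    --model_dir=/models \
    --db_dir=/databases \
    --output_dir=/work/output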

Besides the AlphaFold 3 container described above, GPU-enabled versions of AlphaFold 2 are available as modules. You can find the available versions using module avail AlphaFold, and you can load the latest version using module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0.

The module provides an alphafold symlink that points to the run_alphafold.py script, which means you can simply run alphafold with all required options (run alphafold --help for more information).

Note that run_alphafold.py has been tweaked slightly, so that it knows where to find required commands such as hhblits, hhsearch, jackhmmer, and kalign. This means that you do not have to provide the paths to these executables with options like --hhblits_binary_path.

Running on a CPU node

By default, AlphaFold will try to use a GPU, and it will even fail on nodes without one. To instruct AlphaFold to run without a GPU, add the following to your job script:

export OPENMM_RELAX=CPU

The module allows you to control the number of cores used by the hhblits (default: 4 cores) and jackhmmer (default: 8 cores) tools through the environment variables $ALPHAFOLD_HHBLITS_N_CPU and $ALPHAFOLD_JACKHMMER_N_CPU. You can override the defaults using, for instance, export ALPHAFOLD_HHBLITS_N_CPU=8. Do note that these tools seem to run slower when given more cores than these defaults, but this may depend on your workload.

The large database files for the different AlphaFold versions are available in version-specific subdirectories at /scratch/public/AlphaFold/.

If you want to use different databases, you can override the default data directory by using export ALPHAFOLD_DATA_DIR=/path/to/data.

Because the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading these files directly from the /scratch file system is very time-consuming. To alleviate this issue, the database files have also been packed into a smaller Zstandard (zstd) compressed SquashFS file system image; using this image instead of the files on /scratch directly is faster. These database images (which are also specific to the AlphaFold version you want to use) can be found at:

/scratch/public/AlphaFold/2.3.1.zstd.sqsh

The image can be mounted on a directory of your choice using the squashfuse tool, for which a module is loaded that should give slightly better performance:

mkdir $TMPDIR/alphafold_data
squashfuse /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

Now the AlphaFold databases are accessible at $TMPDIR/alphafold_data. The image can be unmounted using:

fusermount -u $TMPDIR/alphafold_data
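If you want to package a different set of database files in the same way (see the note on ALPHAFOLD_DATA_DIR above), such an image can be created with the mksquashfs tool from squashfs-tools. A minimal sketch, assuming zstd support is available and that your database files live in /scratch/$USER/my_alphafold_data:

# Pack a database directory into a zstd-compressed SquashFS image
mksquashfs /scratch/$USER/my_alphafold_data /scratch/$USER/my_alphafold_data.zstd.sqsh -comp zstd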

Using fast local storage

The I/O performance can be increased even further by copying the squashfs image file to fast local node storage first. All nodes have at least 1 TB of fast solid state storage available.

The local disk can be reached via the environment variable $TMPDIR within the job, and the image can be copied to it using the command:

cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR

The $TMPDIR directory will be removed automatically when the job has finished. The mount command then looks as follows:

mkdir $TMPDIR/alphafold_data
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or a V100 GPU node.

alphafold-cpu.sh
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=regular
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB
 
# Clean the module environment and load the squashfuse and AlphaFold module
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0
 
# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8
 
# Use the CPU instead of a GPU
export OPENMM_RELAX=CPU
 
# Copy the squashfs image to $TMPDIR
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
 
# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data
# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data
 
# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output
 
# Unmount the database image
fusermount -u $TMPDIR/alphafold_data

alphafold-gpu.sh
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=120GB
#SBATCH --gres=gpu:1
 
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0
 
# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8
 
# Uncomment the following line if you are not running on a GPU node
#export OPENMM_RELAX=CPU
 
# Copy the squashfs image with the AlphaFold database to fast local storage
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
 
# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data
# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data
 
# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output
 
# Unmount the database image
fusermount -u $TMPDIR/alphafold_data
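
Both scripts assume a query.fasta file in the directory you submit from; adjust --fasta_paths, --max_template_date and the requested Slurm resources to your own input. The scripts can then be submitted with sbatch:

sbatch alphafold-cpu.sh
sbatch alphafold-gpu.sh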