AlphaFold
AlphaFold 3
AlphaFold 3 is not available as a module yet, and due to the complex installation procedure this may still take a while.
Meanwhile, it should be possible to run AlphaFold 3 with an Apptainer container. You can either try to build your own container using the instructions at https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md (which requires you to first build it with Docker, then convert it to Singularity/Apptainer), or you can use a prebuilt container from Docker Hub, e.g. from https://hub.docker.com/r/bockpl/alphafold/tags. We will use the latter in the following examples.
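If you do want to build your own image, the rough outline is sketched below. Note that this has to be done on a machine where you have Docker available (not on the cluster itself), that the Dockerfile location and image tag are assumptions based on the linked installation instructions, and that the exact commands may differ per AlphaFold 3 release:

# On a machine with Docker access (not on the cluster):
git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3
docker build -t alphafold3 -f docker/Dockerfile .
# Convert the local Docker image into an Apptainer image file
apptainer build alphafold3.sif docker-daemon:alphafold3:latest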
Pulling in the container image and AlphaFold 3 code
cd /scratch/$USER
export APPTAINER_CACHEDIR=/scratch/$USER/apptainer_cache
apptainer pull docker://bockpl/alphafold:v3.0.0-22.04-1.0
This will result in a container image file named alphafold_v3.0.0-22.04-1.0.sif. Now clone the AlphaFold 3 repository in the same directory using:
git clone https://github.com/google-deepmind/alphafold3.git
Running the container
You should now be able to run the code from the cloned GitHub repository in the container (which provides all the dependencies) by doing something like:
apptainer exec ./alphafold_v3.0.0-22.04-1.0.sif python3 alphafold3/run_alphafold.py
When running on a GPU node, the GPU can be made available in the container by adding the --nv flag:
apptainer exec --nv ./alphafold_v3.0.0-22.04-1.0.sif python3 alphafold3/run_alphafold.py
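Depending on the Apptainer configuration, file systems like /scratch may not be available inside the container by default. If the databases or your input files cannot be found, they can be bind-mounted explicitly; the following is just a sketch, adjust the paths to your own situation:

apptainer exec --nv --bind /scratch ./alphafold_v3.0.0-22.04-1.0.sif python3 alphafold3/run_alphafold.py --help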
More examples can be found at https://github.com/google-deepmind/alphafold3/blob/main/docs/installation.md#build-the-singularity-container-from-the-docker-image, and more examples/information about Apptainer at https://wiki.hpc.rug.nl/habrok/examples/apptainer.
Data files
The genetic database files that are required for AlphaFold 3 can be found at /scratch/public/AlphaFold/3.0. Due to license restrictions, the model parameters are not available (yet). You can obtain these yourself using the instructions provided at https://github.com/google-deepmind/alphafold3?tab=readme-ov-file#obtaining-model-parameters.
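Putting these pieces together, a full AlphaFold 3 run inside the container could look roughly like the sketch below. The flag names are taken from the upstream run_alphafold.py and may change between releases (check python3 alphafold3/run_alphafold.py --help); input.json and the model parameter directory are placeholders for your own input file and the weights you obtained yourself:

apptainer exec --nv --bind /scratch ./alphafold_v3.0.0-22.04-1.0.sif \
    python3 alphafold3/run_alphafold.py \
    --json_path=input.json \
    --db_dir=/scratch/public/AlphaFold/3.0 \
    --model_dir=/path/to/your/model_parameters \
    --output_dir=output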
AlphaFold 2
GPU versions of AlphaFold are available as modules on the cluster. You can find the available versions using module avail AlphaFold, and you can load the latest version using module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0.
Running AlphaFold
The module provides a simple alphafold symlink that points to the run_alphafold.py script, which means that you can simply run alphafold with all required options (run alphafold --help to get more information).
Note that the run_alphafold.py script was tweaked slightly, so that it knows where to find required tools like hhblits, hhsearch, jackhmmer and kalign. This means that you do not have to provide the paths to these executables with options like --hhblits_binary_path.
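A minimal run, assuming a FASTA file named query.fasta in the current working directory, would then look like (--max_template_date limits which PDB templates may be used):

alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output

The full job script examples at the bottom of this page show this command in context.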
Running on a CPU node
By default, AlphaFold will try to use a GPU, and it will fail on nodes that do not have one. In order to instruct AlphaFold to run without a GPU, add the following line to your job script:
export OPENMM_RELAX=CPU
Controlling the number of CPU cores for HHblits and jackhmmer
The module allows you to control the number of cores used by the hhblits (default: 4 cores) and jackhmmer (default: 8 cores) tools by setting the environment variables $ALPHAFOLD_HHBLITS_N_CPU and/or $ALPHAFOLD_JACKHMMER_N_CPU. You can override the default number of cores using, for instance, export ALPHAFOLD_HHBLITS_N_CPU=8. Do note that these tools seem to run slower when using more than 4 or 8 cores, respectively, but this may depend on your workload.
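If you do change these values, it usually makes sense to keep them in line with the number of cores requested from Slurm. The following is only a sketch of the relevant job script lines, not a required setting:

#SBATCH --cpus-per-task=8

# Let hhblits use all cores allocated to the job by Slurm
export ALPHAFOLD_HHBLITS_N_CPU=${SLURM_CPUS_PER_TASK}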
Database files
The large database files for the different AlphaFold versions are available in version-specific subdirectories at /scratch/public/AlphaFold/.
If you want to use different databases, you can override the default data directory by using export ALPHAFOLD_DATA_DIR=/path/to/data.
Since the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading the files directly from the /scratch file system is very time-consuming. In order to alleviate this issue, the database files have also been stored in a smaller, Zstandard (zstd) compressed SquashFS file system image. Using this image instead of the individual files on /scratch is faster. These database images (which are also specific to the version of AlphaFold that you want to use) can be found at:
/scratch/public/AlphaFold/2.3.1.zstd.sqsh
The image can be mounted on a given directory using the squashfuse tool, for which a module is loaded that should give slightly better performance:
mkdir $TMPDIR/alphafold_data
squashfuse /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
Now the AlphaFold databases are accessible at $TMPDIR/alphafold_data. The image can be unmounted using:
fusermount -u $TMPDIR/alphafold_data
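If you want to prepare a similar image for your own databases, a sketch using the mksquashfs tool (from squashfs-tools; the directory and file names below are placeholders, and the compression settings can be tuned further) would be:

# Pack a directory with database files into a zstd-compressed SquashFS image
mksquashfs /scratch/$USER/my_databases my_databases.zstd.sqsh -comp zstd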
Using fast local storage
The I/O performance can be increased even further by copying the squashfs image file to fast local node storage first. All nodes have at least 1 TB of fast solid state storage available.
The local disk can be reached through the environment variable $TMPDIR within the job, and the image can be copied there using:
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
The directory will be automatically removed when the job has finished. The mount command then looks as follows:
mkdir $TMPDIR/alphafold_data
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
Example of job script
The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or a V100 GPU node.
- alphafold-cpu.sh
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=regular
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB

# Clean the module environment and load the squashfuse and AlphaFold module
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0

# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8   # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8

# Use the CPU instead of a GPU
export OPENMM_RELAX=CPU

# Copy the squashfs image to $TMPDIR
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR

# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data

# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data

# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output

# Unmount the database image
fusermount -u $TMPDIR/alphafold_data
- alphafold-gpu.sh
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=120GB
#SBATCH --gres=gpu:1

# Clean the module environment and load the AlphaFold module
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0

# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8   # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8

# Uncomment the following line if you are not running on a GPU node
#export OPENMM_RELAX=CPU

# Copy the squashfs image with the AlphaFold database to fast local storage
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR

# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data

# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data

# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output

# Unmount the database image
fusermount -u $TMPDIR/alphafold_data
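Either script can be submitted with sbatch, after which the status of the job can be followed with squeue:

sbatch alphafold-gpu.sh
squeue -u $USER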