GPU versions of AlphaFold are now available on Peregrine. You can find the available versions using module avail AlphaFold, and load the latest version using module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0.
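For example:
module avail AlphaFold
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0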
The module provides a simple alphafold symlink that points to the run_alphafold.py script, which means you can simply run alphafold with all required options (run alphafold --help for more information).
Note that run_alphafold.py has been tweaked slightly, so that it knows where to find required tools like hhblits, hhsearch, jackhmmer, and kalign. This means that you do not have to provide the paths to these executables with options like --hhblits_binary_path.
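Assuming the default databases described below are used, a minimal run can therefore look like this (query.fasta is a placeholder for your own input file):
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output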
By default, AlphaFold will try to use a GPU, and it will even fail on nodes without one. To instruct AlphaFold to run without a GPU, add the following to your job script:
export OPENMM_RELAX=CPU
The module allows you to control the number of cores used by the hhblits (default: 4 cores) and jackhmmer (default: 8 cores) tools by setting the environment variables $ALPHAFOLD_HHBLITS_N_CPU and/or $ALPHAFOLD_JACKHMMER_N_CPU. You can override the default number of cores using, for instance, export ALPHAFOLD_HHBLITS_N_CPU=8. Do note that these tools seem to run slower on more than 4/8 cores respectively, but this may depend on your workload.
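For example, to change both values in your job script:
export ALPHAFOLD_HHBLITS_N_CPU=8   # default: 4
export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8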
The large database files for the different AlphaFold versions are available in version-specific subdirectories at /scratch/public/AlphaFold/. If you want to use different databases, you can override the default data directory by using export ALPHAFOLD_DATA_DIR=/path/to/data.
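For instance, to explicitly select the 2.3.1 databases (assuming the subdirectory is named after the version, as above):
export ALPHAFOLD_DATA_DIR=/scratch/public/AlphaFold/2.3.1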
Given that the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading the files directly from the /scratch file system is very time-consuming. To alleviate this issue, the database files have also been stored in a smaller Zstandard (zstd) compressed SquashFS file system image; using this image instead of the files on /scratch directly is faster. These database images (which are also specific to the version of AlphaFold that you want to use) can be found at:
/scratch/public/AlphaFold/2.3.1.zstd.sqsh
The image can be mounted on a given directory using the squashfuse tool, for which a module is available that should give slightly better performance:
mkdir $TMPDIR/alphafold_data
squashfuse /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
Now the AlphaFold databases are accessible at $TMPDIR/alphafold_data. The image can be unmounted using:
fusermount -u $TMPDIR/alphafold_data
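To check that the mount succeeded, you can simply list the mounted directory:
ls $TMPDIR/alphafold_data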
The I/O performance can be increased even further by first copying the SquashFS image file to fast local node storage; all nodes have at least 1 TB of fast solid-state storage available. Within a job, this local disk can be reached using the environment variable $TMPDIR, and the image can be copied with the following command:
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
The directory will be automatically removed when the job has finished. The mount command then looks as follows:
mkdir $TMPDIR/alphafold_data
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data
The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or a V100 GPU node.
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=regular
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16GB

# Clean the module environment and load the squashfuse and AlphaFold module
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0

# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8   # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8

# Use the CPU instead of a GPU
export OPENMM_RELAX=CPU

# Copy the squashfs image to $TMPDIR
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR

# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data

# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data

# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output

# Unmount the database image
fusermount -u $TMPDIR/alphafold_data
#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --time=04:00:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=120GB
#SBATCH --gres=gpu:1

module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0

# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/jackhmmer
#export ALPHAFOLD_HHBLITS_N_CPU=8   # default: 4
#export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8

# Uncomment the following line if you are not running on a GPU node
#export OPENMM_RELAX=CPU

# Copy the squashfs image with the AlphaFold database to fast local storage
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR

# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/alphafold_data

# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/alphafold_data

# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/alphafold_data

# Run AlphaFold
alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output

# Unmount the database image
fusermount -u $TMPDIR/alphafold_data
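Either script can be saved to a file (the name alphafold_job.sh here is just an example) and submitted with:
sbatch alphafold_job.sh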