habrok:examples:alphafold (revision of 2024/11/22 14:41 by admin; previous revision 2022/06/24 12:51, external edit)
====== AlphaFold ======
===== AlphaFold 3 =====
AlphaFold 3 is not yet available as a module, and due to its complex installation this may still take a while.

Meanwhile, it should be possible to run AlphaFold 3 in an Apptainer container. You can either try to build your own container using the instructions at https://

==== Pulling in the container image and AlphaFold 3 code ====

<code>
cd /
export APPTAINER_CACHEDIR=/
apptainer pull docker://
</code>

This will result in a container image file named ''
<code>
git clone https://
</code>

==== Running ====

You should now be able to run the code from the cloned GitHub repository inside the container, which provides all the dependencies, with a command like:
<code>
apptainer exec ./
</code>

When running on a GPU node, you can make the GPU available inside the container by adding the ''--nv'' option:
<code>
apptainer exec --nv ./
</code>

More examples can be found at https://
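
The two ''apptainer exec'' variants above differ only in the ''--nv'' option, so a job script can choose between them itself. Below is a minimal sketch, assuming that SLURM sets ''CUDA_VISIBLE_DEVICES'' only for jobs that were allocated a GPU, and that the repository was cloned into ''./alphafold3''. The function name ''af3_apptainer_cmd'' is hypothetical; the ''run_alphafold.py'' options ''--json_path'', ''--model_dir'' and ''--output_dir'' are the ones shown in the AlphaFold 3 README (''--model_dir'' must point to a directory with the AlphaFold 3 model parameters). The function only prints the command, one argument per line, so it can be inspected before it is run.

<code bash>
# Hypothetical helper: print the apptainer command for an AlphaFold 3 run,
# one argument per line.
# Usage: af3_apptainer_cmd IMAGE INPUT_JSON MODEL_DIR OUTPUT_DIR
af3_apptainer_cmd() {
    local image="$1" json="$2" models="$3" outdir="$4"
    local cmd=(apptainer exec)
    # Assumption: SLURM sets CUDA_VISIBLE_DEVICES only when a GPU was allocated,
    # so --nv (NVIDIA GPU support) is only added in that case.
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        cmd+=(--nv)
    fi
    # Assumption: the AlphaFold 3 repository was cloned into ./alphafold3
    cmd+=("$image" python ./alphafold3/run_alphafold.py
          "--json_path=$json" "--model_dir=$models" "--output_dir=$outdir")
    printf '%s\n' "${cmd[@]}"
}
</code>

In a job script the printed command could then be run with ''mapfile -t cmd < <(af3_apptainer_cmd image.sif input.json models output) && "${cmd[@]}"'', where the file and directory names are placeholders.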

==== Data files ====

The genetic database files that are required for AlphaFold

===== AlphaFold 2 =====

GPU versions of AlphaFold are now available on Habrok. You can find the available versions using ''

==== Running AlphaFold ====
The module provides a simple ''
Note that the ''
=== Running on a CPU node ===
By default, AlphaFold tries to use a GPU and fails on nodes that do not have one. To run AlphaFold without a GPU, add the following to your job script:
<code>
export OPENMM_RELAX=CPU
</code>
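
If the same job script is used on both CPU and GPU nodes, this setting can also be made conditional. A minimal sketch, assuming that ''nvidia-smi -L'' only succeeds on nodes where a GPU is visible to the job (and that the command is missing or fails elsewhere); the function name ''af_relax_device'' is hypothetical.

<code bash>
# Hypothetical helper: print "GPU" if an NVIDIA GPU is visible, otherwise "CPU".
# Assumption: 'nvidia-smi -L' fails (or is not installed) when no GPU is available.
af_relax_device() {
    if nvidia-smi -L >/dev/null 2>&1; then
        echo GPU
    else
        echo CPU
    fi
}

# Only set OPENMM_RELAX=CPU when no GPU was found
if [ "$(af_relax_device)" = CPU ]; then
    export OPENMM_RELAX=CPU
fi
</code>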
==== Controlling the number of CPU cores for HHblits and jackhmmer ====
The module allows you to control the number of cores used by the ''
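
A minimal sketch for giving both tools all CPU cores requested with ''--cpus-per-task''. It assumes that the module reads the variables ''ALPHAFOLD_HHBLITS_N_CPU'' and ''ALPHAFOLD_JACKHMMER_N_CPU'' (the names used by EasyBuild's AlphaFold installation); check the module's documentation for the names your module version actually uses.

<code bash>
# Sketch: use all cores allocated by SLURM for HHblits and jackhmmer.
# Assumption: these variable names are the ones read by the AlphaFold module.
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested; fall back to 1.
n_cores="${SLURM_CPUS_PER_TASK:-1}"
export ALPHAFOLD_HHBLITS_N_CPU="$n_cores"
export ALPHAFOLD_JACKHMMER_N_CPU="$n_cores"
</code>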
==== Database files ====
The large database files for the different AlphaFold versions are available in version-specific subdirectories at ''/
If you want to use different databases, you can override the default data directory by using ''
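
When pointing AlphaFold 2 to your own databases, a quick check can catch a wrong directory level before a long job starts. A minimal sketch, assuming the data directory is passed via ''ALPHAFOLD_DATA_DIR'' (the variable set in the job scripts on this page) and that it follows the standard AlphaFold 2 layout with a ''params'' subdirectory holding the model parameters; the function name ''use_af_data_dir'' is hypothetical.

<code bash>
# Hypothetical helper: export ALPHAFOLD_DATA_DIR only if the directory looks
# like an AlphaFold 2 data directory.
# Assumption: a complete data directory has a "params" subdirectory.
use_af_data_dir() {
    local dir="$1"
    if [ ! -d "$dir/params" ]; then
        echo "use_af_data_dir: no params/ subdirectory in $dir" >&2
        return 1
    fi
    export ALPHAFOLD_DATA_DIR="$dir"
}
</code>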
Because the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading the files from the ''
<code>
/scratch/public/AlphaFold/2.3.1.zstd.sqsh
</code>
The image can be mounted to a given directory using the ''
<code>
mkdir $TMPDIR/
squashfuse /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR/
</code>
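
The ''mkdir''/''squashfuse'' steps above can be wrapped in a small function that fails early if the image is missing and unmounts the database again when the job script exits. A minimal sketch, assuming ''squashfuse'' and FUSE's ''fusermount'' are available on the node (systems with only FUSE 3 provide ''fusermount3'' instead); the function name ''mount_af_db'' is hypothetical. Note that the ''trap'' replaces any ''EXIT'' trap the script has already set.

<code bash>
# Hypothetical helper: mount a squashfs database image at a mountpoint.
# Usage: mount_af_db IMAGE MOUNTPOINT
mount_af_db() {
    local image="$1" mountpoint="$2"
    if [ ! -f "$image" ]; then
        echo "mount_af_db: image not found: $image" >&2
        return 1
    fi
    mkdir -p "$mountpoint" || return 1
    squashfuse "$image" "$mountpoint" || return 1
    # Unmount when the job script exits (assumption: fusermount is installed).
    # printf %q quotes the path safely for the trap command string.
    trap "fusermount -u $(printf '%q' "$mountpoint")" EXIT
}
</code>

For example, ''mount_af_db /scratch/public/AlphaFold/2.3.1.zstd.sqsh "$TMPDIR/af_db"'', where the mountpoint name is only an example and ''ALPHAFOLD_DATA_DIR'' has to point to the same directory.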
=== Using fast local storage ===
The I/O performance can be increased even further by first copying the squashfs image file to fast local node storage.
The local disk can be reached using the environment variable ''
<code>
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
</code>
This directory is removed automatically when the job has finished. The mount command then looks as follows:
<code>
mkdir $TMPDIR/
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/
</code>
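
Because the database image is large, it may be worth checking that the local disk has enough free space before copying. A minimal sketch using GNU ''stat'' and ''df'' (as found on Linux); the function name ''stage_image'' is hypothetical. It prints the path of the local copy, which can then be passed to ''squashfuse''.

<code bash>
# Hypothetical helper: copy an image into a (node-local) directory if it fits.
# Usage: stage_image SOURCE_IMAGE DEST_DIR   (prints the path of the copy)
stage_image() {
    local src="$1" dest_dir="$2" need avail
    need=$(stat -c %s "$src") || return 1    # image size in bytes
    # Free space in bytes on the file system holding DEST_DIR
    avail=$(df --output=avail -B1 "$dest_dir" | tail -n 1 | tr -d ' ')
    if [ -z "$avail" ] || [ "$need" -gt "$avail" ]; then
        echo "stage_image: insufficient or unknown free space in $dest_dir" >&2
        return 1
    fi
    cp "$src" "$dest_dir/" || return 1
    printf '%s\n' "$dest_dir/$(basename "$src")"
}
</code>

In a job script this could be used as ''image=$(stage_image /scratch/public/AlphaFold/2.3.1.zstd.sqsh "$TMPDIR")''.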
==== Example job scripts ====
The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or to a V100 GPU node.
<code>
# ...
# Clean the module environment and load the AlphaFold module
module purge
module load AlphaFold/
# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/
# ...
# Use the CPU instead of a GPU
export OPENMM_RELAX=CPU

# Copy the squashfs image to $TMPDIR
cp /
# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/
# Mount the AlphaFold database squashfs image
squashfuse
# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/
# ...
</code>

<code>
# ...
#SBATCH --cpus-per-task=12
#SBATCH --mem=120GB
#SBATCH --gres=gpu:
module purge
module load AlphaFold/
# Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/
# ...
# Copy the squashfs image with the AlphaFold database to fast local storage
cp /scratch/public/AlphaFold/2.3.1.zstd.sqsh $TMPDIR
# Create a mountpoint for the AlphaFold database in squashfs format
mkdir $TMPDIR/
# Mount the AlphaFold database squashfs image
squashfuse $TMPDIR/2.3.1.zstd.sqsh $TMPDIR/
# Set the path to the AlphaFold database
export ALPHAFOLD_DATA_DIR=$TMPDIR/