| Both sides previous revision Previous revision Next revision | Previous revision | ||
| habrok:examples:alphafold [2023/03/22 13:03] – removed fokke | habrok:examples:alphafold [2024/11/22 14:41] (current) – admin | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== AlphaFold ====== | ||
| + | |||
| + | ===== AlphaFold 3 ===== | ||
| + | |||
| + | AlphaFold 3 is not yet available as a module, and because the installation is complex it may take a while before it is. | ||
| + | |||
| + | Meanwhile, it should be possible to run AlphaFold 3 with an Apptainer container. You can either try to build your own container using the instructions at https:// | ||
| + | |||
| + | ==== Pulling in the container image and AlphaFold 3 code ==== | ||
| + | |||
| + | < | ||
| + | cd / | ||
| + | export APPTAINER_CACHEDIR=/ | ||
| + | apptainer pull docker:// | ||
| + | </ | ||
| + | |||
| + | This will result in a container image file named '' | ||
| + | < | ||
| + | git clone https:// | ||
| + | </ | ||
| + | |||
| + | ==== Running the container ==== | ||
| + | |||
| + | You should now be able to run the code from the cloned GitHub repository inside the container, which provides all the dependencies, with a command like: | ||
| + | < | ||
| + | apptainer exec ./ | ||
| + | </ | ||
| + | |||
| + | When running on a GPU node, the GPU can be made available in the container by adding a '' | ||
| + | < | ||
| + | apptainer exec --nv ./ | ||
| + | </ | ||
| + | |||
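The choice between the two invocations above can be scripted. The following is a minimal sketch, not an official recipe: it assumes that Slurm sets `CUDA_VISIBLE_DEVICES` when a GPU has been allocated to the job, and `IMAGE`, `CONTAINER_CMD` and `apptainer_gpu_flag` are hypothetical names introduced only for illustration. It prints the resulting command instead of running it.

```shell
#!/bin/bash
# Sketch: add --nv to "apptainer exec" only when a GPU appears to be allocated.
# Assumption: Slurm exports CUDA_VISIBLE_DEVICES for jobs that requested a GPU.

# Print "--nv" if a GPU seems to be available, and nothing otherwise.
apptainer_gpu_flag() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "--nv"
    fi
}

# Hypothetical placeholders: substitute your own image file and command.
IMAGE="${IMAGE:-./my_alphafold3_image.sif}"
CONTAINER_CMD="${CONTAINER_CMD:-python --version}"

# Dry run: show the command instead of executing it.
# $(apptainer_gpu_flag) is left unquoted on purpose, so that an empty
# result adds no argument at all.
echo apptainer exec $(apptainer_gpu_flag) "$IMAGE" $CONTAINER_CMD
```

Remove the leading `echo` once the printed command looks right for your image and input.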
| + | More examples can be found at https:// | ||
| + | |||
| + | ==== Data files ==== | ||
| + | |||
| + | The genetic database files that are required for AlphaFold 3 can be found at ''/ | ||
| + | |||
| + | ===== AlphaFold 2 ===== | ||
| + | |||
| + | GPU versions of AlphaFold are now available on Peregrine. You can find the available versions using '' | ||
| + | |||
| + | ==== Running AlphaFold ==== | ||
| + | The module provides a simple '' | ||
| + | |||
| + | Note that the '' | ||
| + | |||
| + | === Running on a CPU node === | ||
| + | By default, AlphaFold tries to use a GPU, and it fails on nodes that do not have one. To instruct AlphaFold to run without a GPU, add the following to your job script: | ||
| + | < | ||
| + | export OPENMM_RELAX=CPU | ||
| + | </ | ||
| + | |||
| + | ==== Controlling the number of CPU cores for HHblits and jackhmmer ==== | ||
| + | The module allows you to control the number of cores used by the '' | ||
| + | |||
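Rather than hard-coding the core counts, they can be derived from the Slurm allocation. A minimal sketch, assuming the variables `ALPHAFOLD_HHBLITS_N_CPU` and `ALPHAFOLD_JACKHMMER_N_CPU` that appear in the job scripts on this page, and assuming that Slurm sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is requested. The fallback values 4 and 8 are the defaults quoted in those job scripts; `set_alphafold_cpus` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: let hhblits and jackhmmer use the cores allocated by Slurm.
# Assumption: the two tools do not run at the same time, so each may use
# the full allocation.

set_alphafold_cpus() {
    # Use the allocated core count when known, otherwise the quoted defaults.
    export ALPHAFOLD_HHBLITS_N_CPU="${SLURM_CPUS_PER_TASK:-4}"
    export ALPHAFOLD_JACKHMMER_N_CPU="${SLURM_CPUS_PER_TASK:-8}"
}

set_alphafold_cpus
echo "hhblits cores:   $ALPHAFOLD_HHBLITS_N_CPU"
echo "jackhmmer cores: $ALPHAFOLD_JACKHMMER_N_CPU"
```

Tying the values to the allocation keeps the job script consistent when `--cpus-per-task` is changed later.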
| + | ==== Database files ==== | ||
| + | |||
| + | The large database files for the different AlphaFold versions are available in version-specific subdirectories at ''/ | ||
| + | |||
| + | If you want to use different databases, you can override the default data directory by using '' | ||
| + | |||
| + | Because the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading the files from the ''/ | ||
| + | < | ||
| + | / | ||
| + | </ | ||
| + | The image can be mounted to a given directory using the '' | ||
| + | < | ||
| + | mkdir $TMPDIR/ | ||
| + | squashfuse / | ||
| + | </ | ||
| + | |||
| + | Now the AlphaFold databases are accessible at '' | ||
| + | < | ||
| + | fusermount -u $TMPDIR/ | ||
| + | </ | ||
| + | |||
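If a job script stops early, the unmount command at the end is never reached. The following is a minimal sketch that unmounts automatically when the script exits, using a shell `trap`. The image path, the `ALPHAFOLD_SQUASHFS_IMAGE` variable and the mountpoint name are hypothetical placeholders (use the paths given above), and `mount_db` and `unmount_db` are helper names introduced here.

```shell
#!/bin/bash
# Sketch: mount a squashfs image and make sure it is unmounted on exit.

# Mount IMAGE on MOUNTPOINT; fail early if the image cannot be read.
mount_db() {
    local image="$1" mountpoint="$2"
    if [ ! -r "$image" ]; then
        echo "Cannot read squashfs image: $image" >&2
        return 1
    fi
    mkdir -p "$mountpoint" || return 1
    squashfuse "$image" "$mountpoint"
}

# Unmount MOUNTPOINT, but only if it is listed as an active mount (Linux).
unmount_db() {
    local mountpoint="$1"
    if grep -qsF " ${mountpoint} " /proc/mounts; then
        fusermount -u "$mountpoint"
    fi
}

# Hypothetical placeholders: use the image path and directory name from above.
IMAGE="${ALPHAFOLD_SQUASHFS_IMAGE:-/path/to/alphafold_databases.sqsh}"
MOUNTPOINT="${TMPDIR:-/tmp}/alphafold_db_mount"

if mount_db "$IMAGE" "$MOUNTPOINT"; then
    # Run the unmount whenever the script exits, also after an error.
    trap 'unmount_db "$MOUNTPOINT"' EXIT
    export ALPHAFOLD_DATA_DIR="$MOUNTPOINT"
    echo "Databases mounted at $ALPHAFOLD_DATA_DIR"
else
    echo "Image not mounted; skipping." >&2
fi
```

With the trap in place, the explicit `fusermount -u` line at the end of a job script becomes optional.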
| + | === Using fast local storage === | ||
| + | |||
| + | I/O performance can be improved further by first copying the squashfs image file to fast local node storage. All nodes have at least 1 TB of fast solid-state storage available. | ||
| + | |||
| + | The local disk can be reached using the environment variable '' | ||
| + | < | ||
| + | cp / | ||
| + | </ | ||
| + | The directory will be automatically removed when the job has finished. The mount command then looks as follows: | ||
| + | < | ||
| + | mkdir $TMPDIR/ | ||
| + | squashfuse $TMPDIR/ | ||
| + | </ | ||
| + | |||
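Before copying a large image, it can help to check that the local disk has room for it. A minimal sketch using the standard `du` and `df` tools; `has_space_for` is a hypothetical helper, and the image path and the `ALPHAFOLD_SQUASHFS_IMAGE` variable are placeholders for the path shown above.

```shell
#!/bin/bash
# Sketch: copy a file to local storage only if enough space is free.

# Succeed if DIR has more free space than FILE occupies (both in KiB).
has_space_for() {
    local file="$1" dir="$2" need_kb free_kb
    # Disk usage of the file in KiB; empty if the file does not exist.
    need_kb=$(du -k "$file" 2>/dev/null | awk '{print $1}')
    # Available space in KiB: column 4 of the POSIX-format df output.
    free_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$need_kb" ] && [ -n "$free_kb" ] && [ "$free_kb" -gt "$need_kb" ]
}

# Hypothetical placeholders: use the image path from above.
IMAGE="${ALPHAFOLD_SQUASHFS_IMAGE:-/path/to/alphafold_databases.sqsh}"
LOCAL_DIR="${TMPDIR:-/tmp}"

if has_space_for "$IMAGE" "$LOCAL_DIR"; then
    cp "$IMAGE" "$LOCAL_DIR/"
else
    echo "Not copying $IMAGE: file missing or not enough space in $LOCAL_DIR" >&2
fi
```

If the check fails, the job can still fall back to mounting the image from its original location.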
| + | ==== Example job scripts ==== | ||
| + | |||
| + | The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or a V100 GPU node. | ||
| + | |||
| + | <file bash alphafold-cpu.sh> | ||
| + | #!/bin/bash | ||
| + | #SBATCH --job-name=alphafold | ||
| + | #SBATCH --time=04: | ||
| + | #SBATCH --partition=regular | ||
| + | #SBATCH --nodes=1 | ||
| + | #SBATCH --cpus-per-task=8 | ||
| + | #SBATCH --mem=16GB | ||
| + | |||
| + | # Clean the module environment and load the squashfuse and AlphaFold modules | ||
| + | module purge | ||
| + | module load AlphaFold/ | ||
| + | |||
| + | # Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/ | ||
| + | #export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4 | ||
| + | #export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8 | ||
| + | |||
| + | # Use the CPU instead of a GPU | ||
| + | export OPENMM_RELAX=CPU | ||
| + | |||
| + | # Copy the squashfs image to $TMPDIR | ||
| + | cp / | ||
| + | |||
| + | # Create a mountpoint for the AlphaFold database in squashfs format | ||
| + | mkdir $TMPDIR/ | ||
| + | # Mount the AlphaFold database squashfs image | ||
| + | squashfuse $TMPDIR/ | ||
| + | # Set the path to the AlphaFold database | ||
| + | export ALPHAFOLD_DATA_DIR=$TMPDIR/ | ||
| + | |||
| + | # Run AlphaFold | ||
| + | alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output | ||
| + | |||
| + | # Unmount the database image | ||
| + | fusermount -u $TMPDIR/ | ||
| + | </ | ||
| + | |||
| + | <file bash alphafold-gpu.sh> | ||
| + | #!/bin/bash | ||
| + | #SBATCH --job-name=alphafold | ||
| + | #SBATCH --time=04: | ||
| + | #SBATCH --partition=gpu | ||
| + | #SBATCH --nodes=1 | ||
| + | #SBATCH --cpus-per-task=12 | ||
| + | #SBATCH --mem=120GB | ||
| + | #SBATCH --gres=gpu: | ||
| + | |||
| + | module purge | ||
| + | module load AlphaFold/ | ||
| + | |||
| + | # Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/ | ||
| + | #export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4 | ||
| + | #export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8 | ||
| + | |||
| + | # Uncomment the following line if you are not running on a GPU node | ||
| + | #export OPENMM_RELAX=CPU | ||
| + | |||
| + | # Copy the squashfs image with the AlphaFold database to fast local storage | ||
| + | cp / | ||
| + | |||
| + | # Create a mountpoint for the AlphaFold database in squashfs format | ||
| + | mkdir $TMPDIR/ | ||
| + | # Mount the AlphaFold database squashfs image | ||
| + | squashfuse $TMPDIR/ | ||
| + | # Set the path to the AlphaFold database | ||
| + | export ALPHAFOLD_DATA_DIR=$TMPDIR/ | ||
| + | |||
| + | # Run AlphaFold | ||
| + | alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output | ||
| + | |||
| + | # Unmount the database image | ||
| + | fusermount -u $TMPDIR/ | ||
| + | </ | ||