Differences
This shows you the differences between two versions of the page.
habrok:examples:alphafold [2023/03/22 13:03] – removed fokke | habrok:examples:alphafold [2024/11/22 14:41] (current) – admin | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== AlphaFold ====== | ||
+ | |||
+ | ===== AlphaFold 3 ===== | ||
+ | |||
+ | AlphaFold 3 is not yet available as a module, and because the installation is complex this may still take a while. | ||
+ | |||
+ | Meanwhile, it should be possible to run AlphaFold 3 with an Apptainer container. You can either try to build your own container using the instructions at https:// | ||
+ | |||
+ | ==== Pulling in the container image and AlphaFold 3 code ==== | ||
+ | |||
+ | < | ||
+ | cd / | ||
+ | export APPTAINER_CACHEDIR=/ | ||
+ | apptainer pull docker:// | ||
+ | </ | ||
+ | |||
+ | This will result in a container image file named '' | ||
+ | < | ||
+ | git clone https:// | ||
+ | </ | ||
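Because the image can be large, it may be worth skipping the pull when the image file is already present. The following is only a minimal sketch: `pull_once` is a hypothetical helper that is not part of the instructions above, and the image file name and URI passed to it are placeholders for the values used in the pull command.

```shell
# Hypothetical helper: pull a container image only if the file is not there yet.
# $1 = output image file, $2 = image URI (both are placeholders).
pull_once() {
    image_file="$1"
    image_uri="$2"
    if [ -f "$image_file" ]; then
        echo "reusing existing image: $image_file"
    else
        apptainer pull "$image_file" "$image_uri"
    fi
}
```

Calling the function a second time with the same file name then returns immediately instead of downloading the image again.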
+ | |||
+ | ==== Running the container ==== | ||
+ | |||
+ | You should now be able to run the code from the cloned GitHub repository inside the container (which provides all the dependencies) with a command like: | ||
+ | < | ||
+ | apptainer exec ./ | ||
+ | </ | ||
+ | |||
+ | When running on a GPU node, the GPU can be made available in the container by adding a '' | ||
+ | < | ||
+ | apptainer exec --nv ./ | ||
+ | </ | ||
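If the same job script has to work on both CPU and GPU nodes, the GPU flag can be chosen at run time. The sketch below assumes that `nvidia-smi` is only usable on nodes with an NVIDIA GPU; `gpu_flag` is a hypothetical helper name, not something provided by Apptainer.

```shell
# Hypothetical helper: print "--nv" only when an NVIDIA GPU is visible on this node.
# Assumption: nvidia-smi exists and succeeds only on GPU nodes.
gpu_flag() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "--nv"
    fi
}
```

The result could then be inserted into the command line as `apptainer exec $(gpu_flag)`, followed by the image and command shown above.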
+ | |||
+ | More examples can be found at https:// | ||
+ | |||
+ | ==== Data files ==== | ||
+ | |||
+ | The genetic database files that are required for AlphaFold 3 can be found at ''/ | ||
+ | |||
+ | ===== AlphaFold 2 ===== | ||
+ | |||
+ | GPU versions of AlphaFold are now available on Hábrók. You can find the available versions using '' | ||
+ | |||
+ | ==== Running AlphaFold ==== | ||
+ | The module provides a simple '' | ||
+ | |||
+ | Note that the '' | ||
+ | |||
+ | === Running on a CPU node === | ||
+ | By default, AlphaFold tries to use a GPU and fails on nodes without one. To instruct AlphaFold to run without a GPU, add the following to your job script: | ||
+ | < | ||
+ | export OPENMM_RELAX=CPU | ||
+ | </ | ||
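To avoid editing the job script when switching between partitions, the variable can be set conditionally. This is a sketch under the assumption that the scheduler exports `CUDA_VISIBLE_DEVICES` for jobs that were given a GPU and leaves it unset or empty otherwise; `select_relax_device` is a hypothetical helper name.

```shell
# Hypothetical helper: fall back to the CPU when the job has no GPU assigned.
# Assumption: CUDA_VISIBLE_DEVICES is non-empty only for GPU jobs.
select_relax_device() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        export OPENMM_RELAX=CPU
    fi
}

select_relax_device
```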
+ | |||
+ | ==== Controlling the number of CPU cores for HHblits and jackhmmer ==== | ||
+ | The module allows you to control the number of cores used by the '' | ||
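As an illustration, the core counts can be derived from the Slurm allocation instead of being hard-coded. The sketch below uses the variable names and default values that appear in the example job scripts on this page. `SLURM_CPUS_PER_TASK` is set by Slurm only when `--cpus-per-task` is requested, and `set_alphafold_cores` is a hypothetical helper name. Whether using all allocated cores for both tools is the best choice for your workload is an assumption you should verify.

```shell
# Hypothetical helper: match the tool core counts to the Slurm allocation.
# Falls back to the defaults listed in the example job scripts (4 and 8)
# when SLURM_CPUS_PER_TASK is not set.
set_alphafold_cores() {
    export ALPHAFOLD_HHBLITS_N_CPU="${SLURM_CPUS_PER_TASK:-4}"
    export ALPHAFOLD_JACKHMMER_N_CPU="${SLURM_CPUS_PER_TASK:-8}"
}

set_alphafold_cores
```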
+ | |||
+ | ==== Database files ==== | ||
+ | |||
+ | The large database files for the different AlphaFold versions are available in version-specific subdirectories at ''/ | ||
+ | |||
+ | If you want to use different databases, you can override the default data directory by using '' | ||
+ | |||
+ | Because the initialization phase of AlphaFold is very I/O intensive while the database files are being read, reading the files from the ''/ | ||
+ | < | ||
+ | / | ||
+ | </ | ||
+ | The image can be mounted to a given directory using the '' | ||
+ | < | ||
+ | mkdir $TMPDIR/ | ||
+ | squashfuse / | ||
+ | </ | ||
+ | |||
+ | Now the AlphaFold databases are accessible at '' | ||
+ | < | ||
+ | fusermount -u $TMPDIR/ | ||
+ | </ | ||
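If AlphaFold exits with an error, an unmount command at the end of the script may never be reached. One possible way to handle this is a shell `trap`, sketched below. `MOUNT_DIR` is a hypothetical placeholder for the directory that was passed to `squashfuse`, and the `mountpoint` utility is assumed to be installed on the node.

```shell
# Hypothetical placeholder: replace with the directory used as squashfuse mount point.
MOUNT_DIR="${TMPDIR:-/tmp}/alphafold_db_example"

cleanup() {
    # Unmount only if the mountpoint utility confirms something is mounted there.
    if command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$MOUNT_DIR"; then
        fusermount -u "$MOUNT_DIR"
    fi
}

# Run cleanup whenever the script exits, whether it succeeded or failed.
trap cleanup EXIT
```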
+ | |||
+ | === Using fast local storage === | ||
+ | |||
+ | I/O performance can be improved further by first copying the squashfs image file to fast local node storage. All nodes have at least 1 TB of fast solid-state storage available. | ||
+ | |||
+ | The local disk can be reached using the environment variable '' | ||
+ | < | ||
+ | cp / | ||
+ | </ | ||
+ | The directory will be automatically removed when the job has finished. The mount command then looks as follows: | ||
+ | < | ||
+ | mkdir $TMPDIR/ | ||
+ | squashfuse $TMPDIR/ | ||
+ | </ | ||
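Since the local disk is shared with other jobs on the node, it may be useful to check the free space before copying. The following is a minimal sketch. `fits_in` is a hypothetical helper, and the comparison uses the disk usage reported by `du`, which is only an approximation of the space the copy will need.

```shell
# Hypothetical helper: succeed when FILE ($1) fits in the free space of DIR ($2).
fits_in() {
    # Size of the file in KiB as reported by du.
    file_kb=$(du -k "$1" | cut -f1)
    # Available space in KiB; -P keeps each filesystem on a single line.
    free_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$file_kb" -le "$free_kb" ]
}
```

A job script could call `fits_in` with the image file and `$TMPDIR` and fall back to mounting the image from its original location when the check fails.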
+ | |||
+ | ==== Example job scripts ==== | ||
+ | |||
+ | The following minimal examples can be used to submit an AlphaFold job to a regular (CPU) node or a V100 GPU node. | ||
+ | |||
+ | <file bash alphafold-cpu.sh> | ||
+ | #!/bin/bash | ||
+ | #SBATCH --job-name=alphafold | ||
+ | #SBATCH --time=04: | ||
+ | #SBATCH --partition=regular | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --cpus-per-task=8 | ||
+ | #SBATCH --mem=16GB | ||
+ | |||
+ | # Clean the module environment and load the squashfuse and AlphaFold modules | ||
+ | module purge | ||
+ | module load AlphaFold/ | ||
+ | |||
+ | # Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/ | ||
+ | #export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4 | ||
+ | #export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8 | ||
+ | |||
+ | # Use the CPU instead of a GPU | ||
+ | export OPENMM_RELAX=CPU | ||
+ | |||
+ | # Copy the squashfs image to $TMPDIR | ||
+ | cp / | ||
+ | |||
+ | # Create a mountpoint for the AlphaFold database in squashfs format | ||
+ | mkdir $TMPDIR/ | ||
+ | # Mount the AlphaFold database squashfs image | ||
+ | squashfuse $TMPDIR/ | ||
+ | # Set the path to the AlphaFold database | ||
+ | export ALPHAFOLD_DATA_DIR=$TMPDIR/ | ||
+ | |||
+ | # Run AlphaFold | ||
+ | alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output | ||
+ | |||
+ | # Unmount the database image | ||
+ | fusermount -u $TMPDIR/ | ||
+ | </ | ||
+ | |||
+ | <file bash alphafold-gpu.sh> | ||
+ | #!/bin/bash | ||
+ | #SBATCH --job-name=alphafold | ||
+ | #SBATCH --time=04: | ||
+ | #SBATCH --partition=gpu | ||
+ | #SBATCH --nodes=1 | ||
+ | #SBATCH --cpus-per-task=12 | ||
+ | #SBATCH --mem=120GB | ||
+ | #SBATCH --gres=gpu: | ||
+ | |||
+ | module purge | ||
+ | module load AlphaFold/ | ||
+ | |||
+ | # Uncomment the following line(s) if you want to use different values for the number of cores used by hhblits/ | ||
+ | #export ALPHAFOLD_HHBLITS_N_CPU=8 # default: 4 | ||
+ | #export ALPHAFOLD_JACKHMMER_N_CPU=4 # default: 8 | ||
+ | |||
+ | # Uncomment the following line if you are not running on a GPU node | ||
+ | #export OPENMM_RELAX=CPU | ||
+ | |||
+ | # Copy the squashfs image with the AlphaFold database to fast local storage | ||
+ | cp / | ||
+ | |||
+ | # Create a mountpoint for the AlphaFold database in squashfs format | ||
+ | mkdir $TMPDIR/ | ||
+ | # Mount the AlphaFold database squashfs image | ||
+ | squashfuse $TMPDIR/ | ||
+ | # Set the path to the AlphaFold database | ||
+ | export ALPHAFOLD_DATA_DIR=$TMPDIR/ | ||
+ | |||
+ | # Run AlphaFold | ||
+ | alphafold --fasta_paths=query.fasta --max_template_date=2020-05-14 --output_dir=output | ||
+ | |||
+ | # Unmount the database image | ||
+ | fusermount -u $TMPDIR/ | ||
+ | </ | ||
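Because a job may wait in the queue for some time, a quick check of the input file before submitting can save a failed run. The sketch below only checks that the file exists, is non-empty, and starts with a FASTA header line. `check_fasta` is a hypothetical helper, and `query.fasta` is the file name used in the job scripts above.

```shell
# Hypothetical helper: basic sanity check of a FASTA input file.
check_fasta() {
    # The file must exist, be non-empty, and begin with a ">" header line.
    [ -s "$1" ] && head -n 1 "$1" | grep -q '^>'
}

# Example use before submitting:
# check_fasta query.fasta && sbatch alphafold-gpu.sh
```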