Python
This page describes the recommended way to use Python on the cluster. This is a long page, so please check the table of contents menu on the right to find the information most relevant for you.
The main take-home message for Python is to load the Python module for the version you want to start with, possibly by loading SciPy-bundle, which includes, among others, optimized numpy, scipy and pandas libraries, and then to create a Python virtual environment for installing any packages that are not already provided through the module system. The details are given below.
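As a quick reference, the whole workflow looks roughly as follows (using the module name, environment path and commands from the examples further down; the package list at the end is just a placeholder for your own packages):
module load Python/3.9.6-GCCcore-11.2.0
python3 -m venv $HOME/venvs/first_env
source $HOME/venvs/first_env/bin/activate
pip install --upgrade pip wheel
pip install <the packages you need>
Each of these steps is explained in detail below.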
Caveat
Unfortunately, packages installed outside of the main Python installation and reached through the environment variable PYTHONPATH cannot be upgraded inside a virtual environment. This issue is often encountered with the packages from SciPy-bundle.
In these cases it may be better to load only the Python module itself in the version you need and to install the additional packages yourself in the virtual environment. Another trick, if you do want to use a module which pulls in SciPy-bundle, is to unload SciPy-bundle afterwards and to install numpy, scipy, pandas and any other required packages in your virtual environment instead.
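A rough sketch of this second approach could look as follows (assuming the module that depends on SciPy-bundle has already been loaded, and using the environment created in the examples below):
module unload SciPy-bundle
source $HOME/venvs/first_env/bin/activate
pip install numpy scipy pandas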
Python Environments
Python Virtual Environments
On Hábrók, we have several versions of Python installed (Six versions for Python 3 only!). In addition to these bare-bones Python installations, we also have optimized versions of a handful of common Python packages (scipy
, matplotlib
, etc.). However, the Python ecosystem is so large and varied, that we have no hope of installing cluster-optimized versions of even the most common Python packages.
As a regular user on Hábrók, you have the power to build your own Python Virtual Environment, where you can install any and all Python packages you need. A Python Virtual Environment simply consists of a folder saved somewhere you have access to, which contains your own copy of Python as well as all the packages you install in the Virtual Environment. You can build several Virtual Environments (for example one for each project you're working on), each residing in its own folder and not interfering with the others. To use any of them, you simply tell the system which folder to use. You can therefore easily switch between these Virtual Environments.
Below, we show a short and hopefully simple guide to setting up and using a Python Virtual Environment, using the venv
Python package.
Building the Python Virtual Environment
Before setting up a Python Virtual Environment, you need to first load a specific version of Python. In this example we will use the latest version of Python available on Hábrók, 3.9.6, but this should work for older versions as well. If you cannot follow these instructions for a specific version of Python, please let us know, and we will add special instructions for that version.
We load the Python module:
module load Python/3.9.6-GCCcore-11.2.0
and check that we have the right version:
python3 --version
Python 3.9.6
which is what we wanted.
Now, we need to decide where to save the folder that contains the Python Virtual Environment we're going to build. There is no restriction on this, as long as you have the permissions, but we suggest saving it in your home directory, since this storage works best for directories containing many files, and each Python Virtual Environment can contain several hundred files (or more), depending on how many packages you install. Therefore, we will place all environments in $HOME/venvs
.
It is easy to build a Python Virtual Environment:
python3 -m venv $HOME/venvs/first_env
where first_env
is the name of the environment, as well as of the folder it resides in. Give it a good descriptive name, otherwise you'll be sorry when you have 10-20 different environments.
Using the Python Virtual Environment
The Python Virtual Environment is now built, but we can't use it yet: first we need to activate it. We do this with the following command:
source $HOME/venvs/first_env/bin/activate
and this will change the prompt of the command line from something like [p123456@login1 ~]$
to something like (first_env) [p123456@login1 ~]$
. This is a really useful feature, allowing you to see, at a glance, which Python Virtual Environment you are working with.
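When you are done working in the environment, you can leave it again with the standard venv command:
deactivate
which restores your original prompt and makes the module's Python the default again.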
The environment we just built and activated is a pristine one, and it only contains the Python packages that were available in the Python/3.9.6-GCCcore-11.2.0
module. However, we can now populate the environment with whatever packages we want to use in this particular project, by installing them. Before installing any additional package in the Python Virtual Environment, it might be a good idea to update pip
, the Python Package Installer, and wheel, which is used to install binary packages:
pip install --upgrade pip
pip install --upgrade wheel
This is not strictly necessary, but it is recommended, especially for older versions of Python, which also come with older versions of pip. Having up-to-date versions makes sure that pip and wheel work with the latest package formats.
We are now ready to install additional Python packages into our Python Virtual Environment. This is as simple as
pip install package_name
where package_name
is the name of the Python package you want to install. This will install the package into the Python Virtual Environment folder $HOME/venvs/first_env
, and the package will be available every time we activate this particular environment in the future.
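If you want to check which packages (and versions) are installed in the currently activated environment, you can ask pip:
pip list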
It is considered good practice to save the names of all the packages you wish to install in a text file (usually called requirements.txt
) and use that file to install the packages all at once with the command:
pip install -r requirements.txt
A typical requirements.txt
file would look something like
- requirements.txt
keras
tqdm==4.59.0
where you can also specify a particular version of a package, as is done here for tqdm.
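Conversely, if you already have a working environment and want to record its exact contents in such a file, pip can generate one for you:
pip freeze > requirements.txt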
How do we use the Python Virtual Environment we just built in a job script? Here's an example of such a jobscript:
- jobscript.sh
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --partition=regular

module purge
module load Python/3.9.6-GCCcore-11.2.0

source $HOME/venvs/first_env/bin/activate

python3 --version
which python3

deactivate
which you can submit with
sbatch jobscript.sh
This jobscript first purges your module environment, then loads the correct version of Python (you always have to load the Python module before activating your Python Virtual Environment), and finally activates your environment. Once the environment is activated, we check the version of Python and the location of the Python executable, which should be $HOME/venvs/first_env/bin/python3, i.e. inside your environment. In place of these commands, which only print some information, you can of course run your own Python scripts.
Deactivating the Python Virtual Environment isn't strictly necessary, since the job ends after that in any case.
Associated Python Modules
TLDR
If you need to use a specific Python library on Hábrók, don't just pip install
it, as what you will get will not be an optimized version. First, check whether the library is already available from the specific version of Python you loaded.
If it is not, check whether the library is installed on Hábrók as a module with module avail library_name
. When using multiple libraries via the module system, pay attention to the Python and toolchain versions. Only if you've not been able to find the library should you consider installing it via pip
and a virtual environment.
The Python ecosystem is extremely varied, with a lot of libraries for all sorts of purposes, from web servers to numerical computing and everything in between.
As a Python user, you would usually install these libraries with pip
, the Python Package Installer. You can still do that on Hábrók, as we have detailed above, but this is not always the best way, because pip
doesn't optimize the libraries for the particular machines they would be running on. In an HPC environment, performance is key, especially for numerical libraries.
Libraries within the Python module
The Python module itself comes with a host of libraries already installed (in optimized form), so that is the first place to look for a specific library. You can do this with:
module whatis Python/3.7.4-GCCcore-8.3.0
which gives the following output:
Python/3.7.4-GCCcore-8.3.0 : Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
Python/3.7.4-GCCcore-8.3.0 : Homepage: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : URL: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : Extensions: alabaster-0.7.12, asn1crypto-0.24.0, atomicwrites-1.3.0, attrs-19.1.0, Babel-2.7.0, bcrypt-3.1.7, bitstring-3.1.6, blist-1.3.6, certifi-2019.9.11, cffi-1.12.3, chardet-3.0.4, Click-7.0, cryptography-2.7, Cython-0.29.13, deap-1.3.0, decorator-4.4.0, docopt-0.6.2, docutils-0.15.2, ecdsa-0.13.2, future-0.17.1, idna-2.8, imagesize-1.1.0, importlib_metadata-0.22, ipaddress-1.0.22, Jinja2-2.10.1, joblib-0.13.2, liac-arff-2.4.0, MarkupSafe-1.1.1, mock-3.0.5, more-itertools-7.2.0, netaddr-0.7.19, netifaces-0.10.9, nose-1.3.7, packaging-19.1, paramiko-2.6.0, pathlib2-2.3.4, paycheck-1.0.2, pbr-5.4.3, pip-19.2.3, pluggy-0.13.0, psutil-5.6.3, py-1.8.0, py_expression_eval-0.3.9, pyasn1-0.4.7, pycparser-2.19, pycrypto-2.6.1, Pygments-2.4.2, PyNaCl-1.3.0, pyparsing-2.4.2, pytest-5.1.2, python-dateutil-2.8.0, pytz-2019.2, requests-2.22.0, scandir-1.10.0, setuptools-41.2.0, setuptools_scm-3.3.3, six-1.12.0, snowballstemmer-1.9.1, Sphinx-2.2.0, sphinxcontrib-applehelp-1.0.1, sphinxcontrib-devhelp-1.0.1, sphinxcontrib-htmlhelp-1.0.2, sphinxcontrib-jsmath-1.0.1, sphinxcontrib-qthelp-1.0.2, sphinxcontrib-serializinghtml-1.1.3, sphinxcontrib-websupport-1.1.2, tabulate-0.8.3, ujson-1.35, urllib3-1.25.3, virtualenv-16.7.5, wcwidth-0.1.7, wheel-0.33.6, xlrd-1.2.0, zipp-0.6.0
All these libraries will be available to you when you load the Python module with module load Python/3.7.4-GCCcore-8.3.0
.
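A quick way to check from the command line that one of these extensions is indeed available (and which version you get) is to import it directly after loading the module; for example, for Cython from the list above:
python3 -c "import Cython; print(Cython.__version__)"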
Libraries as modules
If the library you want is not listed here, it might be that we have it installed as a module on Hábrók, in an optimized version. We've done this for several common libraries, and we strongly encourage you to use these modules, rather than pip install
the libraries. Doing so can speed up your computation significantly. Below, we present a list of the Python libraries which are installed as modules on Hábrók:
- TensorFlow
- SciPy-bundle: numpy, scipy, pandas, mpi4py, mpmath
- scikit-learn, scikit-image
- matplotlib
- PyTorch
- Numba
- Tkinter (only usable with portal or X server forwarding)
- h5py
This is not an exhaustive list, so please check with module avail whether the library you are looking for is available as a module before installing it with pip. This mainly applies to large, well-known libraries, though; you don't need to go through this for every single small package you intend to import.
To find out which versions of these libraries are available on Hábrók, you can use the module avail
command, e.g.
module avail TensorFlow
which will produce something like the following output:
------------------------------------------------ /software/modules/lib ------------------------------------------------
   TensorFlow/1.6.0-foss-2018a-Python-3.6.4-CUDA-9.1.85      TensorFlow/1.12.0-foss-2018a-Python-2.7.14
   TensorFlow/1.6.0-foss-2018a-Python-3.6.4                  TensorFlow/1.12.0-foss-2018a-Python-3.6.4
   TensorFlow/1.8.0-foss-2018a-Python-3.6.4                  TensorFlow/1.12.0-fosscuda-2018a-Python-2.7.14
   TensorFlow/1.8.0-fosscuda-2018a-Python-3.6.4              TensorFlow/1.12.0-fosscuda-2018a-Python-3.6.4
   TensorFlow/1.9.0-foss-2018a-Python-3.6.4-CUDA-9.1.85      TensorFlow/1.15.2-fosscuda-2019b-Python-3.7.4
   TensorFlow/1.9.0-foss-2018a-Python-3.6.4                  TensorFlow/2.0.0-foss-2019a-Python-3.7.2
   TensorFlow/1.10.1-foss-2018a-Python-3.6.4                 TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
   TensorFlow/1.10.1-fosscuda-2018a-Python-2.7.14            TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4
   TensorFlow/1.10.1-fosscuda-2018a-Python-3.6.4             TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4 (D)

  Where:
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
You can then load a specific version with module load
, e.g.:
module load TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4
TensorFlow
loads a bunch of other modules on which it depends. You can check which modules are loaded with
module list
and that will give you the following list of almost 50 modules:
Currently Loaded Modules:
  1) GCCcore/8.3.0                      25) GMP/6.1.2-GCCcore-8.3.0
  2) zlib/1.2.11-GCCcore-8.3.0          26) libffi/3.2.1-GCCcore-8.3.0
  3) binutils/2.32-GCCcore-8.3.0        27) Python/3.7.4-GCCcore-8.3.0
  4) GCC/8.3.0                          28) SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4
  5) CUDA/10.1.243-GCC-8.3.0            29) Szip/2.1.1-GCCcore-8.3.0
  6) gcccuda/2019b                      30) HDF5/1.10.5-gompic-2019b
  7) numactl/2.0.12-GCCcore-8.3.0       31) h5py/2.10.0-fosscuda-2019b-Python-3.7.4
  8) XZ/5.2.4-GCCcore-8.3.0             32) cURL/7.66.0-GCCcore-8.3.0
  9) libxml2/2.9.9-GCCcore-8.3.0        33) double-conversion/3.1.4-GCCcore-8.3.0
 10) libpciaccess/0.14-GCCcore-8.3.0    34) flatbuffers/1.12.0-GCCcore-8.3.0
 11) hwloc/1.11.12-GCCcore-8.3.0        35) giflib/5.2.1-GCCcore-8.3.0
 12) OpenMPI/3.1.4-gcccuda-2019b        36) ICU/64.2-GCCcore-8.3.0
 13) OpenBLAS/0.3.7-GCC-8.3.0           37) JsonCpp/1.9.3-GCCcore-8.3.0
 14) gompic/2019b                       38) NASM/2.14.02-GCCcore-8.3.0
 15) FFTW/3.3.8-gompic-2019b            39) libjpeg-turbo/2.0.3-GCCcore-8.3.0
 16) ScaLAPACK/2.0.2-gompic-2019b       40) LMDB/0.9.24-GCCcore-8.3.0
 17) fosscuda/2019b                     41) nsync/1.24.0-GCCcore-8.3.0
 18) cuDNN/7.6.4.38-gcccuda-2019b       42) PCRE/8.43-GCCcore-8.3.0
 19) NCCL/2.4.8-gcccuda-2019b           43) protobuf/3.10.0-GCCcore-8.3.0
 20) bzip2/1.0.8-GCCcore-8.3.0          44) protobuf-python/3.10.0-fosscuda-2019b-Python-3.7.4
 21) ncurses/6.1-GCCcore-8.3.0          45) libpng/1.6.37-GCCcore-8.3.0
 22) libreadline/8.0-GCCcore-8.3.0      46) snappy/1.1.7-GCCcore-8.3.0
 23) Tcl/8.6.9-GCCcore-8.3.0            47) SWIG/4.0.1-GCCcore-8.3.0
 24) SQLite/3.29.0-GCCcore-8.3.0        48) TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4
As you can see, several of the associated Python modules that we listed above have also been loaded, e.g. SciPy-bundle
, as well as a specific version of Python itself, i.e. Python/3.7.4-GCCcore-8.3.0
.
Associated Python modules behave just like every other module on Hábrók, which means that you need to pay careful attention to the toolchain version (fosscuda/2019b in this case) and the Python version.
IMPORTANT
Make sure that all the Associated Python Modules you load use the same Python and toolchain versions. Using different versions of these will most likely lead to conflicts.
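For example, the following two modules can safely be combined, because both were built with the same toolchain (fosscuda-2019b) and the same Python version (3.7.4), as can be seen in the module list output above:
module load SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4
module load h5py/2.10.0-fosscuda-2019b-Python-3.7.4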
Submitting jobs
This section covers submitting single jobs; for running many similar jobs, see the page on Job arrays.
Single CPU
To show how to run a Python script on the Hábrók cluster, we take the following example script:
- python_example.py
#!/bin/env python
import math   # In order to use the square root function, Python's math module is imported.

x = 2*3*7
print("The answer of 2*3*7 = %d" % (x))

x = math.sqrt(1764)
print("Also the square root of 1764 = %d" % (x))
And we save the file as python_example.py
.
Next, create a new text file that reserves resources, loads the Python module and runs the Python script. In this case the text file is called python_batch.sh
.
- python_batch.sh
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=python_example
#SBATCH --mem=800

module load Python/3.6.4-foss-2018a

python python_example.py
Now the job is ready to be submitted to the SLURM scheduler; the following command in the terminal will do this:
sbatch python_batch.sh
An output file is created; it should contain:
The answer of 2*3*7 = 42
Also the square root of 1764 = 42
Multiple CPUs
In this example we request 10 CPUs from the Hábrók cluster and use all of them for a simple calculation: a list of 10 values (0 to 9) is created, and each CPU performs a simple computation on one of these values.
Requesting resources in a batch script is done as follows:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --job-name=python_cpu
#SBATCH --mem=8000

module load Python/3.6.4-foss-2018a

python python_cpu.py
In the Python script we create a pool of 10 worker processes (one for each CPU):
#!/usr/bin/env python
import multiprocessing
import os      # For reading the number of CPUs requested.
import time    # For timing the calculation.

def double(data):
    return data * 2

if __name__ == '__main__':
    begin = time.time()
    inputs = list(range(10))                               # Makes a list of the values 0 to 9.
    poolSize = int(os.environ['SLURM_JOB_CPUS_PER_NODE'])  # Number of CPUs requested.
    pool = multiprocessing.Pool(processes=poolSize,)
    poolResults = pool.map(double, inputs)                 # Do the calculation.
    pool.close()                                           # Close the pool: no more tasks will be submitted.
    pool.join()                                            # Wait for the workers in the pool to finish.
    print('Pool output:', poolResults)                     # Results.
    elapsedTime = time.time() - begin
    print('Time elapsed for ', poolSize, ' workers: ', elapsedTime, ' seconds')
After the execution of this job, an output file is created in which the list with the new values is printed, together with the time elapsed for this job. Note that for this case it is possible to request fewer CPUs; each CPU will then compute more than one value of the list. It would not make sense to request more than 10 CPUs, however, since there are only 10 values to compute, and any extra CPUs would be left unused.
GPU
This example shows how to submit a Python GPU job to the Hábrók cluster. In this example we make use of the pycuda library, which can be installed by typing this in the terminal:
module load Python/3.10.4-GCCcore-11.3.0
module load CUDA/11.7.0
module load Boost/1.79.0-GCC-11.3.0

pip install pycuda --user
These commands install pycuda into $HOME/.local/.
Now that pycuda is installed, a new SLURM batch script can be created:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=8000

module load Python/3.10.4-GCCcore-11.3.0
module load CUDA/11.7.0
module load Boost/1.79.0-GCC-11.3.0

python ./python_gpu.py
And now we need a Python script that uses GPU functions:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
from pycuda.curandom import rand as curand

a_gpu = curand((50,))
b_gpu = curand((50,))

from pycuda.elementwise import ElementwiseKernel
lin_comb = ElementwiseKernel(
        "float a, float *x, float b, float *y, float *z",
        "z[i] = a*x[i] + b*y[i]",
        "linear_combination")

c_gpu = gpuarray.empty_like(a_gpu)
lin_comb(5, a_gpu, 6, b_gpu, c_gpu)

import numpy.linalg as la
assert la.norm((c_gpu - (5*a_gpu+6*b_gpu)).get()) < 1e-5

print(c_gpu)  # This line is added to the original file to show the final output of the c_gpu array.
The Python code above is taken from the following file in the PyCuda distribution: examples/elementwise.py
When the job is completed, the output should show:
[ 9.33068848 1.10685492 8.71351433 6.2380209 7.40134811 4.05352402 2.23266721 6.43384314 7.88853645 5.24907207 8.20568562 5.35862446 4.10265684 5.24931097 7.30736542 0.65177125 2.21118498 6.48129606 5.39043808 2.93192148 3.9563725 2.91366696 8.68741035 2.19538403 7.98006058 3.73060822 6.01299191 5.21303606 2.10666442 2.17959881 4.78864717 6.74258471 6.92914629 4.06129932 3.62104774 9.37001038 3.90818572 7.15125608 9.08951855 6.56625509 3.63945365 5.43198586 8.2178421 3.70657778 0.51833171 6.62938118 2.43193173 3.03066897 2.44896507 6.26867485]
Avoiding I/O Bottlenecks
If you are using the GPU with, say, many small image files, you may notice that your jobs take a long time to complete, because the images are read and transferred to the GPU one by one. In this case you can bypass the issue by copying your data (as an archive) to the local storage on the GPU node. To do this, follow the instructions on the Many File Jobs page, which describes the process in more detail.
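A rough sketch of what this could look like inside a job script is shown below; the archive name, script name and --data-dir option are placeholders for your own files, and $TMPDIR is assumed here to point to the node-local storage (see the Many File Jobs page for the exact location to use):
# Copy the archive with the input data to the node-local storage and unpack it there.
cp $HOME/my_images.tar.gz $TMPDIR
tar -xzf $TMPDIR/my_images.tar.gz -C $TMPDIR

# Point your Python script at the local copy of the data.
python train.py --data-dir $TMPDIR/my_images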
Multiple Nodes
The next example shows how an MPI job for Python is run on the Hábrók cluster. In the SLURM batch script below, two nodes and three MPI tasks are requested to scatter an array from the root process to all processes. The batch file is named python_mpi_batch.sh.
- python_mpi_batch.sh
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --nodes=2
#SBATCH --ntasks=3
#SBATCH --job-name=python_mpi
#SBATCH --mem=8000

module load Python/3.6.4-foss-2016a

mpirun python ./python_mpi.py
The Python script is named python_mpi.py. In this script an array is created and scattered among all processes; then rank-dependent computations are done on these values, and finally all the values are gathered again at the root process:
- python_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Scattering part.
if rank == 0:
    data = [(i+1)**2 for i in range(size)]
else:
    data = None
data = comm.scatter(data, root=0)
assert data == (rank+1)**2   # Check if data is scattered accordingly.
print("rank ", rank, "has data: ", data)

# Rank dependent computations on data.
for i in range(size):
    if rank == i:
        data = data * rank

# Synchronization of the processes.
comm.Barrier()

# Gathering part.
data = comm.gather(data, root=0)
if rank == 0:
    print(data)
else:
    assert data is None

quit()
Submit the job by giving the command:
sbatch python_mpi_batch.sh
The output of this script gives:
rank 0 has data: 1
rank 1 has data: 4
rank 2 has data: 9
[0, 4, 18]
Note that the first three lines can appear in a different order. Additional job information shows that 3 CPUs were used over 2 nodes:
###############################################################################
Hábrók Cluster
Job 1150286 for user 'p275545'
Finished at: Wed Apr 25 11:19:30 CEST 2018

Job details:
============

Name                : python_mpi
User                : p275545
Partition           : regular
Nodes               : pg-node[036,196]
Cores               : 3
State               : COMPLETED
Submit              : 2018-04-25T11:19:23
Start               : 2018-04-25T11:19:25
End                 : 2018-04-25T11:19:30
Reserved walltime   : 00:05:00
Used walltime       : 00:00:05
Used CPU time       : 00:00:01 (efficiency:  7.27%)
% User (Computation): 55.23%
% System (I/O)      : 44.77%
Mem reserved        : 8000M/node
Max Mem used        : 0.00  (pg-node036,pg-node196)
Max Disk Write      : 0.00  (pg-node036,pg-node196)
Max Disk Read       : 0.00  (pg-node036,pg-node196)

Acknowledgements:
=================

Please see this page if you want to acknowledge Hábrók in your publications:
https://wiki.hpc.rug.nl/habrok/additional_information/scientific_output

################################################################################
Python FAQs
My program output does not appear as expected when I submit a job.
There are two possibilities here. First: your program never reaches the line where you expect it to produce output; this is something you will have to solve yourself, preferably by testing on a local machine. Second: Python has buffered your output and an unexpected crash ate the buffer. This can make debugging on Hábrók tough, because output that was actually produced is not shown, due to the way Python buffers output to the terminal, or in this case the .job file. The easiest solution is to run Python with the -u flag, i.e. python -u <my_script> <other_arguments>. Other solutions include using logging or writing to stderr instead of stdout.
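As a small illustration of those last two options (plain Python, nothing Hábrók-specific):
import sys

# Flush the output buffer explicitly, so this line is not lost if the program crashes later.
print("finished step 1", flush=True)

# stderr is line-buffered rather than block-buffered, so complete lines show up immediately.
print("progress update", file=sys.stderr)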