====== Python ======
This page describes the recommended way to use Python on the cluster. This is a long page, so please check the table of contents menu on the right to find the information most relevant for you.
The main take-home message for Python is to load the Python module for the version you want to start with, possibly by loading ''SciPy-bundle'', which includes, among others, optimized numpy, scipy and pandas libraries, and then to create a Python virtual environment for installing any packages that are not already provided through the module system. The details are given below.
**Caveat** Unfortunately, packages installed outside of the main Python installation and reached through the environment variable ''PYTHONPATH'' cannot be upgraded inside a virtual environment. This issue is often encountered with the packages from ''SciPy-bundle''.
In these cases it may be better to stick to loading only the Python module itself, in the version you need, and to install the additional packages yourself in the virtual environment.
Another trick, if you do want to use a module that loads ''SciPy-bundle'', is to unload ''SciPy-bundle'' afterwards and to install numpy, scipy, pandas and any other required packages in your virtual environment instead.
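A minimal sketch of this approach (using the ''TensorFlow'' module shown later on this page as an example of a module that pulls in ''SciPy-bundle''; the environment name ''tf_env'' is just an example, and the virtual environment commands themselves are explained in the next section):
module load TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4   # example module that loads SciPy-bundle
module unload SciPy-bundle
python3 -m venv $HOME/venvs/tf_env
source $HOME/venvs/tf_env/bin/activate
pip install --upgrade pip wheel
pip install numpy scipy pandas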
===== Python Environments =====
==== Python Virtual Environments ====
On Hábrók, we have several versions of Python installed (six versions for Python 3 alone!). In addition to these bare-bones Python installations, we also have optimized versions of a handful of common Python packages (''scipy'', ''matplotlib'', etc.). However, the Python ecosystem is so large and varied that we cannot hope to install cluster-optimized versions of more than the most common Python packages.
As a regular user on Hábrók, **you** have the power to build your own //Python Virtual Environment//, where you can install any and all Python packages you need. A Python Virtual Environment simply consists of a folder, saved somewhere you have access to, which will contain your own copy of Python as well as all the packages you install in the Virtual Environment. You can build several Virtual Environments (for example one for each project you're working on), each residing in its own folder and not interfering with the others. To use any of them, you simply tell the system which folder to use, so you can easily switch between these Virtual Environments.
Below, we show a short and hopefully simple guide to setting up and using a Python Virtual Environment, using the ''%%venv%%'' Python package.
=== Building the Python Virtual Environment ===
Before setting up a Python Virtual Environment, you first need to load a specific version of Python. In this example we will use Python ''3.9.6'', one of the versions available on Hábrók, but this should work for other versions as well. If you cannot follow these instructions for a specific version of Python, please let us know, and we will add special instructions for that version.
We load the Python module:
module load Python/3.9.6-GCCcore-11.2.0
and check that we have the right version:
python3 --version
Python 3.9.6
which is what we wanted.
Now, we need to decide where to save the folder that contains the Python Virtual Environment we're going to build. There is no restriction on this, as long as you have the permissions, but we suggest saving it in your home directory, since this storage works best for directories containing many files, and each Python Virtual Environment can contain several hundred files (or more), depending on how many packages you install. Therefore, we will place all environments in ''$HOME/venvs''.
It is easy to build a Python Virtual Environment:
python3 -m venv $HOME/venvs/first_env
where ''first_env'' is the name of the environment, as well as of the folder it resides in. Give it a good descriptive name, otherwise you'll be sorry when you have 10-20 different environments.
=== Using the Python Virtual Environment ===
The Python Virtual Environment is now built, but we can't use it yet: first we need to //activate// it. We do this with the following command:
source $HOME/venvs/first_env/bin/activate
and this will change the prompt of the command line from something like ''[p123456@login1 ~]$'' to something like ''(first_env) [p123456@login1 ~]$''. This is a really useful feature, allowing you to see, at a glance, which Python Virtual Environment you are working with.
The environment we just built and activated is a pristine one, and it only contains the Python packages that were available in the ''Python/3.9.6-GCCcore-11.2.0'' module. However, we can now populate the environment with whatever packages we want to use in this particular project, by installing them. Before installing any additional packages in the Python Virtual Environment, it might be a good idea to first update ''pip'', the Python Package Installer, and ''wheel'', which is used to install binary packages:
pip install --upgrade pip
pip install --upgrade wheel
This is not strictly necessary, but it is recommended, especially for older versions of Python, which also come with older versions of ''pip''. Having up-to-date versions makes sure ''pip'' and ''wheel'' work with the latest package formats.
We are now ready to install additional Python packages into our Python Virtual Environment. This is as simple as
pip install package_name
where ''package_name'' is the name of the Python package you want to install. This will install the package into the Python Virtual Environment folder ''$HOME/venvs/first_env'', and the package will be available every time we activate this particular environment in the future.
It is considered good practice to save the names of all the packages you wish to install in a text file (usually called ''requirements.txt'') and use that file to install the packages all at once with the command:
pip install -r requirements.txt
A typical ''requirements.txt'' file would look something like
keras
tqdm==4.59.0
where you can also specify a particular version of a package, as is done for ''tqdm''.
How do we use the Python Virtual Environment we just built in a job script? Here's an example of such a jobscript:
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --partition=regular
module purge
module load Python/3.9.6-GCCcore-11.2.0
source $HOME/venvs/first_env/bin/activate
python3 --version
which python3
deactivate
which you can submit with
sbatch jobscript.sh
This jobscript will first purge your module environment, then load the correct version of Python (you always have to load the Python module before activating your Python Virtual Environment), and then it activates your environment. Once the environment is activated, we check the version of Python, and the location of the Python executable, which should be ''$HOME/venvs/first_env/bin/python3'', the location of your environment. In place of these commands which only give you some information, you can, of course, run your own Python scripts.
Deactivating the Python Virtual Environment isn't strictly necessary, since the job ends after that in any case.
===== Associated Python Modules =====
** TLDR **
If you need to use a specific Python library on Hábrók, don't just ''pip install'' it, as what you will get will not be an optimized version. First, check whether the library is already available from the specific version of Python you loaded.
If it is not, check whether the library is installed on Hábrók as a module with ''module avail library_name''. When using multiple libraries via the module system, pay attention to the Python and toolchain versions. Only if you've not been able to find the library, should you consider installing it via ''pip'' and a virtual environment.
----
The Python ecosystem is extremely varied, with a lot of libraries for all sorts of purposes, from web servers, to numerical computing, and everything in between and to the sides.
As a Python user, you would usually install these libraries with ''pip'', the Python Package Installer. You can still do that on Hábrók, as we have detailed above, but this is not always the best way, because ''pip'' doesn't optimize the libraries for the particular machines they would be running on. In an HPC environment, performance is key, especially for numerical libraries.
==== Libraries within the Python module ====
The Python module itself comes with a host of libraries already installed (in optimized builds), so that is the first place to look for a specific library. You can do this with:
module whatis Python/3.7.4-GCCcore-8.3.0
which gives the following output:
Python/3.7.4-GCCcore-8.3.0 : Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
Python/3.7.4-GCCcore-8.3.0 : Homepage: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : URL: https://python.org/
Python/3.7.4-GCCcore-8.3.0 : Extensions: alabaster-0.7.12, asn1crypto-0.24.0, atomicwrites-1.3.0, attrs-19.1.0, Babel-2.7.0, bcrypt-3.1.7, bitstring-3.1.6, blist-1.3.6, certifi-2019.9.11, cffi-1.12.3, chardet-3.0.4, Click-7.0, cryptography-2.7, Cython-0.29.13, deap-1.3.0, decorator-4.4.0, docopt-0.6.2, docutils-0.15.2, ecdsa-0.13.2, future-0.17.1, idna-2.8, imagesize-1.1.0, importlib_metadata-0.22, ipaddress-1.0.22, Jinja2-2.10.1, joblib-0.13.2, liac-arff-2.4.0, MarkupSafe-1.1.1, mock-3.0.5, more-itertools-7.2.0, netaddr-0.7.19, netifaces-0.10.9, nose-1.3.7, packaging-19.1, paramiko-2.6.0, pathlib2-2.3.4, paycheck-1.0.2, pbr-5.4.3, pip-19.2.3, pluggy-0.13.0, psutil-5.6.3, py-1.8.0, py_expression_eval-0.3.9, pyasn1-0.4.7, pycparser-2.19, pycrypto-2.6.1, Pygments-2.4.2, PyNaCl-1.3.0, pyparsing-2.4.2, pytest-5.1.2, python-dateutil-2.8.0, pytz-2019.2, requests-2.22.0, scandir-1.10.0, setuptools-41.2.0, setuptools_scm-3.3.3, six-1.12.0, snowballstemmer-1.9.1, Sphinx-2.2.0, sphinxcontrib-applehelp-1.0.1, sphinxcontrib-devhelp-1.0.1, sphinxcontrib-htmlhelp-1.0.2, sphinxcontrib-jsmath-1.0.1, sphinxcontrib-qthelp-1.0.2, sphinxcontrib-serializinghtml-1.1.3, sphinxcontrib-websupport-1.1.2, tabulate-0.8.3, ujson-1.35, urllib3-1.25.3, virtualenv-16.7.5, wcwidth-0.1.7, wheel-0.33.6, xlrd-1.2.0, zipp-0.6.0
All these libraries will be available to you when you load the Python module with ''module load Python/3.7.4-GCCcore-8.3.0''.
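As a quick check from the command line, you can also list the packages that come with the loaded module, or try importing one of them directly (''Cython'' is taken from the extension list above as an example):
module load Python/3.7.4-GCCcore-8.3.0
pip list                                                # lists all packages shipped with the module
python3 -c "import Cython; print(Cython.__version__)"   # check that a specific package is importable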
==== Libraries as modules ====
If the library you want is not listed here, it might be that we have it installed as a module on Hábrók, in an **optimized** version. We've done this for several common libraries, and we **strongly encourage** you to use these modules, rather than ''pip install'' the libraries. Doing so can speed up your computation significantly. Below, we present a list of the Python libraries which are installed as modules on Hábrók:
* TensorFlow
* SciPy-bundle: numpy, scipy, pandas, mpi4py, mpmath
* scikit-learn, scikit-image
* matplotlib
* PyTorch
* Numba
* Tkinter (only usable with portal or X server forwarding)
* h5py
This is not an exhaustive list; please check with ''module avail'' whether the library you are looking for is available as a module before installing it with ''pip''. This mainly applies to large, well-known libraries, however, so there is no need to search the module system for every single small package you intend to import.
To find out which versions of these libraries are available on Hábrók, you can use the ''module avail'' command, e.g.
module avail TensorFlow
which will produce something like the following output:
------------------------------------------------ /software/modules/lib ------------------------------------------------
TensorFlow/1.6.0-foss-2018a-Python-3.6.4-CUDA-9.1.85 TensorFlow/1.12.0-foss-2018a-Python-2.7.14
TensorFlow/1.6.0-foss-2018a-Python-3.6.4 TensorFlow/1.12.0-foss-2018a-Python-3.6.4
TensorFlow/1.8.0-foss-2018a-Python-3.6.4 TensorFlow/1.12.0-fosscuda-2018a-Python-2.7.14
TensorFlow/1.8.0-fosscuda-2018a-Python-3.6.4 TensorFlow/1.12.0-fosscuda-2018a-Python-3.6.4
TensorFlow/1.9.0-foss-2018a-Python-3.6.4-CUDA-9.1.85 TensorFlow/1.15.2-fosscuda-2019b-Python-3.7.4
TensorFlow/1.9.0-foss-2018a-Python-3.6.4 TensorFlow/2.0.0-foss-2019a-Python-3.7.2
TensorFlow/1.10.1-foss-2018a-Python-3.6.4 TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4
TensorFlow/1.10.1-fosscuda-2018a-Python-2.7.14 TensorFlow/2.2.0-fosscuda-2019b-Python-3.7.4
TensorFlow/1.10.1-fosscuda-2018a-Python-3.6.4 TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4 (D)
Where:
D: Default Module
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
You can then load a specific version with ''module load'', e.g.:
module load TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4
''TensorFlow'' loads a bunch of other modules on which it depends. You can check which modules are loaded with
module list
and that will give you the following list of almost 50 modules:
Currently Loaded Modules:
1) GCCcore/8.3.0 25) GMP/6.1.2-GCCcore-8.3.0
2) zlib/1.2.11-GCCcore-8.3.0 26) libffi/3.2.1-GCCcore-8.3.0
3) binutils/2.32-GCCcore-8.3.0 27) Python/3.7.4-GCCcore-8.3.0
4) GCC/8.3.0 28) SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4
5) CUDA/10.1.243-GCC-8.3.0 29) Szip/2.1.1-GCCcore-8.3.0
6) gcccuda/2019b 30) HDF5/1.10.5-gompic-2019b
7) numactl/2.0.12-GCCcore-8.3.0 31) h5py/2.10.0-fosscuda-2019b-Python-3.7.4
8) XZ/5.2.4-GCCcore-8.3.0 32) cURL/7.66.0-GCCcore-8.3.0
9) libxml2/2.9.9-GCCcore-8.3.0 33) double-conversion/3.1.4-GCCcore-8.3.0
10) libpciaccess/0.14-GCCcore-8.3.0 34) flatbuffers/1.12.0-GCCcore-8.3.0
11) hwloc/1.11.12-GCCcore-8.3.0 35) giflib/5.2.1-GCCcore-8.3.0
12) OpenMPI/3.1.4-gcccuda-2019b 36) ICU/64.2-GCCcore-8.3.0
13) OpenBLAS/0.3.7-GCC-8.3.0 37) JsonCpp/1.9.3-GCCcore-8.3.0
14) gompic/2019b 38) NASM/2.14.02-GCCcore-8.3.0
15) FFTW/3.3.8-gompic-2019b 39) libjpeg-turbo/2.0.3-GCCcore-8.3.0
16) ScaLAPACK/2.0.2-gompic-2019b 40) LMDB/0.9.24-GCCcore-8.3.0
17) fosscuda/2019b 41) nsync/1.24.0-GCCcore-8.3.0
18) cuDNN/7.6.4.38-gcccuda-2019b 42) PCRE/8.43-GCCcore-8.3.0
19) NCCL/2.4.8-gcccuda-2019b 43) protobuf/3.10.0-GCCcore-8.3.0
20) bzip2/1.0.8-GCCcore-8.3.0 44) protobuf-python/3.10.0-fosscuda-2019b-Python-3.7.4
21) ncurses/6.1-GCCcore-8.3.0 45) libpng/1.6.37-GCCcore-8.3.0
22) libreadline/8.0-GCCcore-8.3.0 46) snappy/1.1.7-GCCcore-8.3.0
23) Tcl/8.6.9-GCCcore-8.3.0 47) SWIG/4.0.1-GCCcore-8.3.0
24) SQLite/3.29.0-GCCcore-8.3.0 48) TensorFlow/2.3.1-fosscuda-2019b-Python-3.7.4
As you can see, several of the associated Python modules that we listed above have also been loaded, e.g. ''SciPy-bundle'', as well as a specific version of Python itself, i.e. ''Python/3.7.4-GCCcore-8.3.0''.
Associated Python modules behave just like every other module on Hábrók, which means that you need to pay careful attention to the toolchain version (e.g. ''fosscuda/2019b'') and the Python version.
**IMPORTANT**
**Make sure that all the Associated Python Modules you load use the same Python and toolchain versions. Using different versions of these will most likely lead to conflicts.**
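For example, the following combination is consistent: all modules are taken from the ''module list'' output shown above and share the same toolchain (''fosscuda/2019b'', based on ''GCCcore/8.3.0'') and Python version (''3.7.4''):
module load Python/3.7.4-GCCcore-8.3.0
module load SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4
module load h5py/2.10.0-fosscuda-2019b-Python-3.7.4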
===== Submitting jobs =====
This section covers submitting single jobs; for multiple jobs see the page on **[[..advanced_job_management:job_arrays]]**.
==== Single CPU ====
We will use the following example Python script to show how to run a Python script on the Hábrók cluster:
#!/bin/env python
import math  # In order to use the square root function, Python's math module is imported.
x = 2*3*7
print ("The answer of 2*3*7 = %d" % (x))
x = math.sqrt(1764)
print ("Also the square root of 1764 = %d" % (x))
And we save the file as ''%%python_example.py%%''.\\
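Before writing the job script, you can do a quick test of the script directly on a login node (only for very short test runs like this one):
module load Python/3.6.4-foss-2018a
python python_example.py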
Next, create a new text file that reserves resources, loads the Python module and runs the Python script. In this case the text file is called ''%%python_batch.sh%%''.
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=python_example
#SBATCH --mem=800
module load Python/3.6.4-foss-2018a
python python_example.py
Now the job is ready to be submitted to the SLURM scheduler; the following command in the terminal will do this:
sbatch python_batch.sh
An output file will be created; it should contain:
The answer of 2*3*7 = 42
Also the square root of 1764 = 42
==== Multiple CPUs ====
In this example we request 10 CPUs from the Hábrók cluster and perform a simple calculation using all the requested CPUs: a list of 10 values (0 to 9) is created and each CPU does a simple computation on one of the values.\\
Requesting resources in a batch script is done as follows:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --job-name=python_cpu
#SBATCH --mem=8000
module load Python/3.6.4-foss-2018a
python python_cpu.py
In the Python script we create a pool of 10 worker processes (one per CPU):
#!/usr/bin/env python
import multiprocessing
import os    # For reading the number of CPUs requested.
import time  # For timing the calculation.

def double(data):
    return data * 2

if __name__ == '__main__':
    begin = time.time()
    inputs = list(range(10))                               # Makes a list of the 10 values 0..9.
    poolSize = int(os.environ['SLURM_JOB_CPUS_PER_NODE'])  # Number of CPUs requested.
    pool = multiprocessing.Pool(processes=poolSize)
    poolResults = pool.map(double, inputs)                 # Do the calculation.
    pool.close()                                           # No more tasks will be submitted to the pool.
    pool.join()                                            # Wait for the workers to finish.
    print('Pool output:', poolResults)                     # Results.
    elapsedTime = time.time() - begin
    print('Time elapsed for', poolSize, 'workers:', elapsedTime, 'seconds')
After the execution of this job, an output file is created in which the list with the new values is printed, as well as the time elapsed for the computation. Note that it is possible to request fewer CPUs; each worker will then compute more than one value. Requesting more than 10 CPUs would not make sense, however, because there are only 10 values to compute, so the extra CPUs would be left unused.
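As a variation on the script above, the same pattern can be written with the standard library's ''concurrent.futures'' interface. The following sketch is equivalent to the ''multiprocessing'' version and, like it, assumes a single-node job, since ''SLURM_JOB_CPUS_PER_NODE'' is only a plain number in that case:
#!/usr/bin/env python
import os
from concurrent.futures import ProcessPoolExecutor

def double(data):
    return data * 2

if __name__ == '__main__':
    inputs = list(range(10))                                # the 10 values 0..9
    pool_size = int(os.environ['SLURM_JOB_CPUS_PER_NODE'])  # number of CPUs requested (single node)
    with ProcessPoolExecutor(max_workers=pool_size) as executor:
        results = list(executor.map(double, inputs))        # distribute the work over the workers
    print('Pool output:', results)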
==== GPU ====
This example shows how to submit a Python GPU job to the Hábrók cluster. In this example we make use of the pycuda library, which can be installed by typing this in the terminal:
module load Python/3.10.4-GCCcore-11.3.0
module load CUDA/11.7.0
module load Boost/1.79.0-GCC-11.3.0
pip install pycuda --user
This installs pycuda in ''%%$HOME/.local/%%''.\\
Now that pycuda is installed, a new SLURM batch script can be created:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --mem=8000
module load Python/3.10.4-GCCcore-11.3.0
module load CUDA/11.7.0
module load Boost/1.79.0-GCC-11.3.0
python ./python_gpu.py
And now we need a Python script that uses GPU functions:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
from pycuda.curandom import rand as curand
a_gpu = curand((50,))
b_gpu = curand((50,))
from pycuda.elementwise import ElementwiseKernel
lin_comb = ElementwiseKernel(
        "float a, float *x, float b, float *y, float *z",
        "z[i] = a*x[i] + b*y[i]",
        "linear_combination")
c_gpu = gpuarray.empty_like(a_gpu)
lin_comb(5, a_gpu, 6, b_gpu, c_gpu)
import numpy.linalg as la
assert la.norm((c_gpu - (5*a_gpu+6*b_gpu)).get()) < 1e-5
print (c_gpu) # This line is added to the original file to show the final output of the c_gpu array.
\\
The Python code above is taken from the following file in the PyCuda distribution: ''%%examples/elementwise.py%%''\\
When the job is completed, the output should show:
[ 9.33068848 1.10685492 8.71351433 6.2380209 7.40134811 4.05352402
2.23266721 6.43384314 7.88853645 5.24907207 8.20568562 5.35862446
4.10265684 5.24931097 7.30736542 0.65177125 2.21118498 6.48129606
5.39043808 2.93192148 3.9563725 2.91366696 8.68741035 2.19538403
7.98006058 3.73060822 6.01299191 5.21303606 2.10666442 2.17959881
4.78864717 6.74258471 6.92914629 4.06129932 3.62104774 9.37001038
3.90818572 7.15125608 9.08951855 6.56625509 3.63945365 5.43198586
8.2178421 3.70657778 0.51833171 6.62938118 2.43193173 3.03066897
2.44896507 6.26867485]
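Before running longer PyCUDA jobs, it can be useful to check that the requested GPU is actually visible from Python. The following minimal sketch (run it inside a GPU job or an interactive GPU session) only prints the number of visible devices and the name of the first one:
import pycuda.autoinit           # initializes a CUDA context on the first available GPU
import pycuda.driver as cuda

print('Number of GPUs visible:', cuda.Device.count())
print('GPU 0:', cuda.Device(0).name())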
=== Avoiding I/O Bottlenecks ===
If you are using the GPU with, say, many small image files, you may notice that your jobs take a long time to complete, because the images are read by the GPU node sequentially. In this case you can bypass the issue by copying your data (as an archive) to the local storage on the GPU node. To do this, follow the instructions on the [[habrok:advanced_job_management:many_file_jobs|Many File Jobs]] page, which describes the process in more detail.
==== Multiple Nodes ====
The next example shows how an MPI job for Python is run on the Hábrók cluster. In the SLURM batch script below, three MPI tasks spread over two nodes are requested to scatter an array from the root process to all processes. The batch file is named ''%%python_mpi_batch.sh%%''.
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --nodes=2
#SBATCH --ntasks=3
#SBATCH --job-name=python_mpi
#SBATCH --mem=8000
module load Python/3.6.4-foss-2018a
mpirun python ./python_mpi.py
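Note that ''mpi4py'' must be provided by the loaded Python module or by ''SciPy-bundle'' (see the list of associated modules above). A quick way to verify this before submitting is to try the import on a login node:
module load Python/3.6.4-foss-2018a
python -c "from mpi4py import MPI; print(MPI.Get_version())"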
The Python script is named ''%%python_mpi.py%%''. In this script an array is created and scattered among all processes, each process then performs a rank-dependent computation on its value, and finally the values are gathered back at the root process:
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Scattering part.
if rank == 0:
    data = [(i+1)**2 for i in range(size)]
else:
    data = None
data = comm.scatter(data, root=0)
assert data == (rank+1)**2

# Check whether the data has been scattered accordingly.
print("rank", rank, "has data:", data)

# Rank-dependent computations on the data.
for i in range(size):
    if rank == i:
        data = data * rank

# Synchronization of the processes.
comm.Barrier()

# Gathering part.
data = comm.gather(data, root=0)
if rank == 0:
    print(data)
else:
    assert data is None

quit()
Submit the job by giving the command:
sbatch python_mpi_batch.sh
The output of this script gives:
rank 0 has data: 1
rank 1 has data: 4
rank 2 has data: 9
[0, 4, 18]
Note that these four lines may appear in a different order, since the output of the different processes is not synchronized. The additional job information shows that 3 CPUs were used over 2 nodes:
###############################################################################
Hábrók Cluster
Job 1150286 for user 'p275545'
Finished at: Wed Apr 25 11:19:30 CEST 2018
Job details:
============
Name : python_mpi
User : p275545
Partition : regular
Nodes : pg-node[036,196]
Cores : 3
State : COMPLETED
Submit : 2018-04-25T11:19:23
Start : 2018-04-25T11:19:25
End : 2018-04-25T11:19:30
Reserved walltime : 00:05:00
Used walltime : 00:00:05
Used CPU time : 00:00:01 (efficiency: 7.27%)
% User (Computation): 55.23%
% System (I/O) : 44.77%
Mem reserved : 8000M/node
Max Mem used : 0.00 (pg-node036,pg-node196)
Max Disk Write : 0.00 (pg-node036,pg-node196)
Max Disk Read : 0.00 (pg-node036,pg-node196)
Acknowledgements:
=================
Please see this page if you want to acknowledge Hábrók in your publications:
https://wiki.hpc.rug.nl/habrok/additional_information/scientific_output
################################################################################
===== Python FAQs =====
**My program output does not appear as expected when I submit a job.**
There are two possibilities here. First, your program is not reaching the line where you expect it to output something; this is something you will have to solve yourself, preferably by testing on a local machine. Second, Python has buffered your output and an unexpected crash discarded the buffer. This can make debugging on Hábrók tricky: your program may have produced some output, but it is not shown because of the way Python buffers output to the terminal, or in this case to the job output file. The easiest solution is to run Python with the ''-u'' flag, i.e. ''python -u''. Alternative solutions include using the ''logging'' module, flushing the output explicitly, or writing to stderr instead of stdout.
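A minimal sketch of the in-code alternatives mentioned above (flushing explicitly and writing messages to stderr):
import sys

print('progress update', flush=True)     # flush the stdout buffer immediately
print('debug message', file=sys.stderr)  # write to stderr instead of stdout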