====== Loading Python ======
Python on Hábrók is not available by default. It is provided through the module system, which allows multiple versions to coexist on the cluster without interfering with each other. Before you can use Python or install any packages, you need to load the version you want to work with. In short: load a Python module for the version you need, then create a virtual environment for any packages that are not already available through the module system. The rest of this page explains how to do both.
===== Loading a Python module =====
On Hábrók, several versions of Python are available through the module system. To see which versions are available, run:
<code bash>
module avail Python
</code>
To load a specific version, for example:
<code bash>
module load Python/3.13.5-GCCcore-14.3.0
</code>
You can verify the version that was loaded with:
<code bash>
python3 --version
</code>
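You can also get this information from inside a Python script or batch job by using the standard-library ''sys'' module. The minimal sketch below prints the version and the path of the running interpreter. ''describe_interpreter'' is a name chosen for this example and is not part of any Hábrók tooling. The path shows whether ''python3'' resolves to the module installation or to a virtual environment.

<code python>
import sys


def describe_interpreter():
    """Return the version and path of the Python interpreter running this code."""
    # sys.version_info[:3] is (major, minor, micro), e.g. (3, 13, 5)
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return version, sys.executable


if __name__ == "__main__":
    version, executable = describe_interpreter()
    print(f"Python {version} at {executable}")
</code>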
More information on using modules can be found on the **[[habrok:software_environment:module_system|module system]]** page.
===== Libraries within the Python module =====
**TLDR**
If you need a specific Python library on Hábrók, check whether it is already available before installing it in your user directory with ''pip install''. The cluster-provided versions are compiled and optimised for the hardware, so using them can significantly speed up your computation. First, check whether the library is already included in the Python module you loaded. If it is not, check whether it is installed on Hábrók as a separate module with ''module avail library_name''. If neither provides the version you need, which can happen with fast-moving libraries like TensorFlow, installing it yourself in a virtual environment is the right approach (see the [[habrok_internal:drafts:python:environments|Python Environments]] page).
When loading multiple library modules, always make sure they use the same Python and toolchain versions to avoid conflicts!
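The first of these checks asks whether a library can already be imported with the Python you loaded. You can also do this from Python. Below is a minimal sketch using the standard-library ''importlib.util.find_spec'', which locates a top-level package without importing it. ''find_libraries'' is a hypothetical helper name. The names you pass to it must be import names (e.g. ''sklearn''), not the names you would give to ''pip'' (e.g. ''scikit-learn'').

<code python>
import importlib.util


def find_libraries(names):
    """Map each top-level import name to the file it would be imported from, or None if unavailable."""
    found = {}
    for name in names:
        # find_spec locates a top-level package without executing it
        spec = importlib.util.find_spec(name)
        if spec is None:
            found[name] = None
        else:
            # Namespace packages have no single origin file
            found[name] = spec.origin or "(namespace package)"
    return found


if __name__ == "__main__":
    for name, origin in find_libraries(["numpy", "scipy", "sklearn", "h5py"]).items():
        print(f"{name:10s} {origin or 'not available'}")
</code>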
The Python ecosystem is extremely varied, with libraries for all sorts of purposes, from web servers to numerical computing and everything in between.
As a Python user, you would usually install these libraries with ''pip'', the Python package installer. You can still do that on Hábrók, as described on the [[habrok_internal:drafts:python:environments:pip_venv|pip + venv]] page. However, it is not always the best approach, because ''pip'' typically installs generic pre-built packages that are not optimized for the specific machines they will run on. In an HPC environment, performance is key, especially for numerical libraries.
The Python module itself comes with a number of libraries already installed (and optimized), so that is the first place to look for a specific library. You can check what is included with:
<code bash>
module whatis Python/3.13.5-GCCcore-14.3.0
</code>
which gives the following output:
<code>
Python/3.13.5-GCCcore-14.3.0 : Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
Python/3.13.5-GCCcore-14.3.0 : Homepage: https://python.org/
Python/3.13.5-GCCcore-14.3.0 : URL: https://python.org/
Python/3.13.5-GCCcore-14.3.0 : Extensions: flit_core-3.12.0, packaging-25.0, pip-25.1.1, setuptools-80.9.0, setuptools_scm-8.3.1, tomli-2.2.1, typing_extensions-4.14.0, wheel-0.45.1
</code>
All the libraries listed under "Extensions" will be available to you when you load the Python module with ''module load Python/3.13.5-GCCcore-14.3.0''.
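To see which version of each listed extension is actually installed, you can query the installed package metadata with the standard-library ''importlib.metadata'' module (Python 3.8 and later). This is a sketch. ''installed_versions'' is a hypothetical helper, and it returns ''None'' for any distribution it cannot find on the current search path.

<code python>
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Return the installed version of each distribution, or None if it is not installed."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = version(dist)
        except PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    # Distribution names as listed under "Extensions" by `module whatis`
    for dist, ver in installed_versions(["pip", "setuptools", "wheel", "packaging"]).items():
        print(f"{dist}: {ver or 'not installed'}")
</code>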
===== Libraries as modules =====
If the library you want is not listed there, it may be installed as a separate module on Hábrók, in an **optimized** version. We have done this for several common libraries, and we **strongly encourage** you to use these modules rather than ''pip install'' the libraries, since doing so can speed up your computation significantly. The Python libraries installed as modules on Hábrók include:
* TensorFlow
* SciPy-bundle: numpy, scipy, pandas, mpi4py, mpmath
* scikit-learn, scikit-image
* matplotlib
* PyTorch
* Numba
* Tkinter (only usable with portal or X server forwarding)
* h5py
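**Caveat:** Packages that are installed outside the main Python installation and made available through the ''PYTHONPATH'' environment variable cannot be upgraded inside a virtual environment. Directories on ''PYTHONPATH'' come before the virtual environment's ''site-packages'' directory in Python's module search path, so the module's version is always found first. This issue often affects the packages from ''SciPy-bundle''.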
In these cases it may be better to load only the Python module itself, in the version you need, and to install any additional packages yourself in the virtual environment.
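The usual way to create such an environment is to run ''python3 -m venv <directory>'' after loading the Python module. You can also script this with the standard-library ''venv'' module, as in the minimal sketch below. ''make_venv'' and the example directory are hypothetical. The environment is tied to whichever Python interpreter runs the script, so load the intended Python module first.

<code python>
import venv
from pathlib import Path


def make_venv(path, with_pip=True):
    """Create a virtual environment tied to the Python interpreter running this code."""
    path = Path(path)
    # Similar to `python3 -m venv <path>`; with_pip=True bootstraps pip via ensurepip
    venv.create(path, with_pip=with_pip)
    # On Linux the environment's interpreter lives in bin/
    return path / "bin" / "python"


if __name__ == "__main__":
    # Hypothetical location; run this with the intended Python module loaded
    print(make_venv(Path.home() / "venvs" / "myproject"))
</code>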
Alternatively, if you do want to use a module that loads ''SciPy-bundle'' as a dependency, you can unload ''SciPy-bundle'' afterwards (''module unload SciPy-bundle'') and install numpy, scipy, pandas and any other required packages in your virtual environment instead.
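If a package you import comes from ''PYTHONPATH'', it will shadow any newer copy installed in your virtual environment. To find out whether this is happening, you can compare the file the package is loaded from with the ''PYTHONPATH'' entries. The sketch below assumes, as the caveat above describes, that module-provided packages are reached through ''PYTHONPATH''. The helper names are hypothetical.

<code python>
import importlib.util
import os
import sys


def in_virtualenv():
    """True if running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def pythonpath_entry_for(origin, pythonpath):
    """Return the PYTHONPATH directory that contains origin, or None."""
    origin = os.path.abspath(origin)
    for entry in filter(None, pythonpath.split(os.pathsep)):
        entry = os.path.abspath(entry)
        # origin lies inside entry exactly when their common path is entry itself
        if os.path.commonpath([origin, entry]) == entry:
            return entry
    return None


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    for name in ["numpy", "scipy", "pandas"]:
        spec = importlib.util.find_spec(name)
        if spec is None or spec.origin is None:
            print(f"{name}: not found")
            continue
        entry = pythonpath_entry_for(spec.origin, os.environ.get("PYTHONPATH", ""))
        where = f"PYTHONPATH ({entry})" if entry else "interpreter or virtual environment"
        print(f"{name}: {spec.origin} -> {where}")
</code>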
The list above is not exhaustive. Before installing a library with ''pip'', check with ''module avail'' whether a module for it is available. This only matters for large, well-known libraries, however, so there is no need to search for every single package you intend to import.
To find out which versions of these libraries are available on Hábrók, you can use the ''module avail'' command, e.g.
<code bash>
module avail TensorFlow
</code>
which will produce something like the following output:
<code>
------------------------------------------------ /software/modules/lib ------------------------------------------------
   TensorFlow/2.7.1-foss-2021b-CUDA-11.4.1    TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0    TensorFlow/2.13.0-foss-2022b (D)
  Where:
   D:  Default Module
If the avail list is too long consider trying:
"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
</code>
You can then load a specific version with ''module load'', e.g.:
<code bash>
module load TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0
</code>
This prints:
<code>
The following have been reloaded with a version change:
  1) GCCcore/14.3.0 => GCCcore/11.3.0
  2) OpenSSL/3 => OpenSSL/1.1
  3) Python/3.13.5-GCCcore-14.3.0 => Python/3.10.4-GCCcore-11.3.0
  4) SQLite/3.50.1-GCCcore-14.3.0 => SQLite/3.38.3-GCCcore-11.3.0
  5) Tcl/9.0.1-GCCcore-14.3.0 => Tcl/8.6.12-GCCcore-11.3.0
  6) XZ/5.8.1-GCCcore-14.3.0 => XZ/5.2.5-GCCcore-11.3.0
  7) binutils/2.44-GCCcore-14.3.0 => binutils/2.38-GCCcore-11.3.0
  8) bzip2/1.0.8-GCCcore-14.3.0 => bzip2/1.0.8-GCCcore-11.3.0
  9) libffi/3.5.1-GCCcore-14.3.0 => libffi/3.4.2-GCCcore-11.3.0
 10) libreadline/8.2-GCCcore-14.3.0 => libreadline/8.1.2-GCCcore-11.3.0
 11) ncurses/6.5-GCCcore-14.3.0 => ncurses/6.3-GCCcore-11.3.0
 12) zlib/1.3.1-GCCcore-14.3.0 => zlib/1.2.12-GCCcore-11.3.0
</code>
Notice that even though we previously loaded ''Python/3.13.5-GCCcore-14.3.0'', loading TensorFlow has replaced it with ''Python/3.10.4-GCCcore-11.3.0'', together with the other ''GCCcore-14.3.0'' modules listed above. This is expected: TensorFlow was built against a specific Python version, and the module system automatically loads that version. The same applies in general. When you load a library module, it brings in the Python version it was built against, regardless of what you loaded before. It is therefore important to know which Python version is actually active after loading all your modules. You can always check this with ''python3 --version''.
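If you use a virtual environment together with library modules, check in your job script that the running interpreter matches the Python module that ended up loaded. For example, suppose you activate a virtual environment created with ''Python/3.13.5-GCCcore-14.3.0'' after loading TensorFlow. ''python3'' then runs the environment's Python 3.13 interpreter, while the TensorFlow packages were built for Python 3.10. The sketch below assumes that Lmod exposes the loaded modules as a colon-separated list in the ''LOADEDMODULES'' environment variable. The function names are hypothetical.

<code python>
import os
import sys


def loaded_python_module(loadedmodules):
    """Return the version of the Python module in a colon-separated module list, e.g. '3.10.4'."""
    for entry in loadedmodules.split(":"):
        name, _, version = entry.partition("/")
        if name == "Python" and version:
            # 'Python/3.10.4-GCCcore-11.3.0' -> '3.10.4'
            return version.split("-", 1)[0]
    return None


def interpreter_matches_module(loadedmodules, version_info=sys.version_info):
    """True/False if the interpreter matches the loaded Python module; None if no Python module is loaded."""
    module_version = loaded_python_module(loadedmodules)
    if module_version is None:
        return None
    running = "{}.{}.{}".format(*version_info[:3])
    return running == module_version


if __name__ == "__main__":
    # Assumes Lmod exports the loaded modules in LOADEDMODULES
    print(interpreter_matches_module(os.environ.get("LOADEDMODULES", "")))
</code>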
TensorFlow also loads all the other modules it depends on. You can check which modules are loaded with:
<code bash>
module list
</code>
which, in this case, gives the following list of 60 modules:
<code>
Currently Loaded Modules:
  1) 2023.01 (S)
  2) StdEnv (S)
  3) GCCcore/11.3.0
  4) zlib/1.2.12-GCCcore-11.3.0
  5) binutils/2.38-GCCcore-11.3.0
  6) GCC/11.3.0
  7) numactl/2.0.14-GCCcore-11.3.0
  8) XZ/5.2.5-GCCcore-11.3.0
  9) libxml2/2.9.13-GCCcore-11.3.0
 10) libpciaccess/0.16-GCCcore-11.3.0
 11) hwloc/2.7.1-GCCcore-11.3.0
 12) OpenSSL/1.1
 13) libevent/2.1.12-GCCcore-11.3.0
 14) UCX/1.12.1-GCCcore-11.3.0
 15) libfabric/1.15.1-GCCcore-11.3.0
 16) PMIx/4.1.2-GCCcore-11.3.0
 17) UCC/1.0.0-GCCcore-11.3.0
 18) OpenMPI/4.1.4-GCC-11.3.0
 19) OpenBLAS/0.3.20-GCC-11.3.0
 20) FlexiBLAS/3.2.0-GCC-11.3.0
 21) FFTW/3.3.10-GCC-11.3.0
 22) gompi/2022a
 23) FFTW.MPI/3.3.10-gompi-2022a
 24) ScaLAPACK/2.2.0-gompi-2022a-fb
 25) foss/2022a
 26) CUDA/11.7.0
 27) cuDNN/8.4.1.50-CUDA-11.7.0
 28) GDRCopy/2.3-GCCcore-11.3.0
 29) UCX-CUDA/1.12.1-GCCcore-11.3.0
 30) NCCL/2.12.12-GCCcore-11.3.0
 31) bzip2/1.0.8-GCCcore-11.3.0
 32) ncurses/6.3-GCCcore-11.3.0
 33) libreadline/8.1.2-GCCcore-11.3.0
 34) Tcl/8.6.12-GCCcore-11.3.0
 35) SQLite/3.38.3-GCCcore-11.3.0
 36) GMP/6.2.1-GCCcore-11.3.0
 37) libffi/3.4.2-GCCcore-11.3.0
 38) Python/3.10.4-GCCcore-11.3.0
 39) pybind11/2.9.2-GCCcore-11.3.0
 40) SciPy-bundle/2022.05-foss-2022a
 41) Szip/2.1.1-GCCcore-11.3.0
 42) HDF5/1.12.2-gompi-2022a
 43) h5py/3.7.0-foss-2022a
 44) cURL/7.83.0-GCCcore-11.3.0
 45) dill/0.3.6-GCCcore-11.3.0
 46) double-conversion/3.2.0-GCCcore-11.3.0
 47) flatbuffers/2.0.7-GCCcore-11.3.0
 48) giflib/5.2.1-GCCcore-11.3.0
 49) ICU/71.1-GCCcore-11.3.0
 50) JsonCpp/1.9.5-GCCcore-11.3.0
 51) NASM/2.15.05-GCCcore-11.3.0
 52) libjpeg-turbo/2.1.3-GCCcore-11.3.0
 53) LMDB/0.9.29-GCCcore-11.3.0
 54) nsync/1.25.0-GCCcore-11.3.0
 55) protobuf/3.19.4-GCCcore-11.3.0
 56) protobuf-python/3.19.4-GCCcore-11.3.0
 57) libpng/1.6.37-GCCcore-11.3.0
 58) snappy/1.1.9-GCCcore-11.3.0
 59) networkx/2.8.4-foss-2022a
 60) TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0
</code>
As you can see, several of the Python library modules listed above have also been loaded, e.g. ''SciPy-bundle'' and ''h5py'', as well as a specific version of Python itself, i.e. ''Python/3.10.4-GCCcore-11.3.0''.
These Python library modules behave just like every other module on Hábrók. This means you need to pay careful attention to toolchain versions (here ''foss/2022a'') and to Python versions.
**IMPORTANT**
Make sure that all the associated Python modules you load use the same Python and toolchain versions. Using different versions of these will most likely lead to conflicts.
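Below is a rough automated version of this check. It groups the loaded modules by the GCC version implied by their toolchain suffix (e.g. ''-GCCcore-11.3.0'' or ''-foss-2022a''). If there is more than one group, it reports a likely conflict. The mapping from toolchain generations to GCC versions is an assumption based on the usual EasyBuild toolchain definitions, and it is deliberately partial. Unknown generations are reported as a separate group so you can check them by hand. The sketch also assumes that Lmod exports the loaded modules in the ''LOADEDMODULES'' environment variable. It does not compare Python versions, so check those separately with ''python3 --version''.

<code python>
import os
import re

# Assumed, partial mapping of EasyBuild toolchain generations to their GCC version
TOOLCHAIN_GCC = {
    "2021b": "11.2.0",
    "2022a": "11.3.0",
    "2022b": "12.2.0",
    "2023a": "12.3.0",
    "2023b": "13.2.0",
    "2024a": "13.3.0",
}
TOOLCHAINS = ("GCCcore", "GCC", "gompi", "foss", "gfbf")
# Matches a toolchain suffix such as '-GCCcore-11.3.0' or '-foss-2022a'
TOOLCHAIN_RE = re.compile(r"-(GCCcore|GCC|gompi|foss|gfbf)-(\d[\w.]*)")


def _gcc_for(toolchain, tc_version):
    if toolchain in ("GCCcore", "GCC"):
        return tc_version
    return TOOLCHAIN_GCC.get(tc_version, f"unknown ({toolchain}/{tc_version})")


def gcc_version(module):
    """Return the GCC version implied by a module name, or None if it has no recognised toolchain."""
    name, _, version = module.partition("/")
    if name in TOOLCHAINS and version:
        # The module is itself a compiler or toolchain, e.g. 'GCCcore/11.3.0' or 'foss/2022a'
        return _gcc_for(name, version)
    match = TOOLCHAIN_RE.search(version)
    return _gcc_for(*match.groups()) if match else None


def toolchain_report(modules):
    """Group modules by the GCC version their toolchain implies."""
    report = {}
    for module in modules:
        gcc = gcc_version(module)
        if gcc is not None:
            report.setdefault(gcc, []).append(module)
    return report


if __name__ == "__main__":
    # Assumes Lmod exports the loaded modules in LOADEDMODULES
    modules = os.environ.get("LOADEDMODULES", "").split(":")
    report = toolchain_report(modules)
    if len(report) > 1:
        print("Mixed toolchains detected:")
        for gcc, mods in report.items():
            print(f"  GCC {gcc}: {', '.join(mods)}")
    else:
        print("All modules with a recognised toolchain share one GCC version.")
</code>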