What is a cluster

If you are struggling to get your computations or data analyses done, switching to a computer cluster might help. A computer cluster is essentially a collection of powerful computers that can be used for calculations that exceed the capacity of your desktop or laptop computer.

A cluster is useful in cases where:

  • you need to do many computations. If, for example, you need to run a model for many parameter values, you can run many of these computations simultaneously on the cluster.
  • you have very long-running calculations. Moving these to the cluster frees up your regular computer for other work.
  • you are struggling with a large volume of data, either in size or in number of items, that needs to be analyzed. The power of the computer cluster may help.
  • you need to run extensive computations. If these calculations can run on multiple CPU cores, or even on multiple computers, you may get results that you could not get on a laptop or desktop.
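
The first case above, running one model for many parameter values, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: run_model is a hypothetical stand-in for a real model (here a simple logistic map), and a local thread pool stands in for the cluster. On a real cluster each parameter value would more typically become a separate job or process.

```python
from concurrent.futures import ThreadPoolExecutor


def run_model(growth_rate: float, steps: int = 50) -> float:
    """Hypothetical stand-in model: iterate the logistic map, return the final state."""
    x = 0.5
    for _ in range(steps):
        x = growth_rate * x * (1.0 - x)
    return x


def sweep(growth_rates, max_workers=4):
    """Run the model once per parameter value; the runs do not depend on each other."""
    # A thread pool keeps this sketch portable. CPU-bound Python code would
    # normally use separate processes or, on a cluster, separate jobs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_model, growth_rates))
    return dict(zip(growth_rates, results))


if __name__ == "__main__":
    for rate, final_state in sweep([2.5, 3.0, 3.5, 3.9]).items():
        print(f"growth_rate={rate}: final state {final_state:.4f}")
```

Because no run needs the result of another, the work can be spread over as many cores or computers as are available, which is what makes this kind of task a good fit for a cluster.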

As described above, a computer cluster is a collection of computers. These computers are connected by a fast network, which allows them to be used in parallel by applications that support this. Furthermore, central shared storage is attached to the system. This storage can be reached from all of the participating computers, giving all computers access to the same user data. The storage system is also large enough to hold large data sets.

The cluster has many users who need to get computations done. To make sure that computations run optimally, the computational tasks are managed by a job scheduler. This means that you won't have direct access to the compute nodes of the cluster. You can only access a few so-called login nodes directly.

To start calculations on the compute nodes, you have to write a job script. This script consists of a description of the requirements of your computational tasks, together with the steps that need to be performed (that is, which program to start on which data). The job scheduling system makes sure that these jobs get exclusive access to the compute resources they requested. It also makes sure that every user gets a chance to run the computations they need to perform.
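
To make the two parts of a job script concrete, the sketch below builds the text of one in Python. The text above does not name the job scheduler, so the #SBATCH directives are an assumption based on the widely used Slurm scheduler. The function render_job_script, the program my_analysis.py, the file input.csv and all resource values are hypothetical and only serve as illustration.

```python
def render_job_script(job_name, time_limit, cpus, memory, commands):
    """Return the text of a batch job script (Slurm-style directives assumed)."""
    lines = [
        "#!/bin/bash",
        # Part 1: the requirements the job asks the scheduler for.
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={memory}",
        "",
    ]
    # Part 2: the steps to perform, i.e. which program to start on which data.
    lines.extend(commands)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = render_job_script(
        job_name="demo",
        time_limit="00:10:00",  # requested wall-clock time (hh:mm:ss)
        cpus=2,
        memory="4G",
        commands=["python3 my_analysis.py input.csv"],  # hypothetical program and data
    )
    print(script)
```

If the scheduler is indeed Slurm, a script like this would be saved to a file on a login node and submitted with the sbatch command. The scheduler then starts it on a compute node once the requested resources are free.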

Access to the Hábrók cluster is available to all university staff on request. Students can get access if they need it for their courses or for their bachelor's or master's research, and will have to provide details about the project, including the name of the supervisor or teacher.

To get access, fill in the online form in the CIT self-service portal Iris at https://iris.service.rug.nl/. The form can be found under “Research and innovation support”, “Computing & facilities”, “Computing (Hábrók, Merlin)”, or you can search for “Habrok”.

In this form you will have to give your name and university account number. In addition, we ask for a short description of why you need access. This gives us more insight into what the cluster resources are being used for.

All compute clusters are based on the Linux operating system. This means that you will have to learn how to work with Linux. The main challenge is that the cluster is primarily accessed through a command-line interface. In this interface you type commands that the remote cluster computer executes, and the results are returned in text form. Further on there is a short introduction to a basic set of commands that you'll need to get around.

