Frequently Asked Questions

How can I see how my job performed?

The jobinfo command reports relevant information about the performance of your job.
For example, if you ran a job with the job ID 12345, use the following command to see this information:

jobinfo 12345

For more detailed information about the performance of your job, see using_jobinfo

How can I test small jobs on the cluster?

To test small jobs on the cluster, you can log in to the interactive node of the Hábrók cluster. After typing

 ssh username@interactive1.hb.hpc.rug.nl

you will be prompted to enter your password. From there you can run small programs by simply executing them as you would in a normal terminal. Since the interactive node is shared with other users, a small job should take no more than about 30 minutes and should not use all available resources (e.g. CPUs, memory).

How do I install a module in Python?

A few Python modules are already installed on the Hábrók cluster. If you want to use a module that is not installed yet, you can install it for your own user account. The cluster will automatically detect packages installed in your personal directory (/home/username/), where your username is something like “p123456”. Installing a module is described on this page, but here is a short answer:

module load Python/3.5.1-foss-2016a
pip install --user packageName

Make sure that the P from Python is capitalized.

How do I install a package in R?

To install a package in R, you have to start R on the cluster. Load the R module, start R (by typing “R” in the terminal) and install the package as you normally would in R. After the installation you can quit R. The Hábrók cluster will then find this package whenever a job makes use of it.

How do I know that my job is started or finished?

Depending on the amount of resources requested for a job, it can take a while for the job to start. It can therefore be useful to get a notification when the job starts. It is also helpful to know when a job has completed if you need its results for further research. Knowing the start and completion times of a job also lets you make a better estimate for running a similar job.
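
One possible way to get such notifications is sketched below. It uses the standard Slurm options --mail-type and --mail-user in a batch script; whether e-mail delivery is enabled on the cluster is an assumption here, and the job name, time limit, address and the sleep command are placeholders. The script also prints its own start and finish times so they end up in the job's output file.

```shell
#!/bin/bash
#SBATCH --job-name=notify_demo
#SBATCH --time=00:10:00
# Ask Slurm to send an e-mail when the job begins, ends or fails.
# (Assumes mail delivery is configured; the address is a placeholder.)
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=your.name@example.com

start_time=$(date +%s)
echo "Job started at $(date)"

# The actual work of the job would go here; sleep is only a stand-in.
sleep 1

end_time=$(date +%s)
echo "Job finished at $(date), elapsed $(( end_time - start_time )) s"
```

The elapsed time written at the end can serve as a starting point for the time request of a similar job.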

How much memory do I need to request in my batch script?

Each core on a 128 GB node has about 5 GB of memory (128 GB / 24 cores), so multiplying this value by the number of cores you request is a good starting point. If this is not enough, your job will crash with an out-of-memory error; you then have to rerun it with a larger memory request (for instance, try setting the limit twice as high). After a job has run, you can see how much memory it actually used in the summary at the bottom of the output file (or by using jobinfo <job id>).
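
The rule of thumb above can be worked through with a few lines of shell arithmetic. This is only a sketch: the function names initial_mem_gb and doubled_mem_gb are hypothetical helpers, the 5 GB per core figure is the approximation from the text, and the four-core job is an example. The result is printed as a value for the Slurm --mem option.

```shell
#!/bin/bash
# Rule of thumb from the text: about 5 GB of memory per requested core.
GB_PER_CORE=5

# Initial memory request (in GB) for a given number of cores.
initial_mem_gb() {
    echo $(( $1 * GB_PER_CORE ))
}

# After an out-of-memory failure, retry with twice the previous request.
doubled_mem_gb() {
    echo $(( $1 * 2 ))
}

cores=4
mem=$(initial_mem_gb "$cores")
echo "First attempt: --mem=${mem}G"
echo "After an out-of-memory error: --mem=$(doubled_mem_gb "$mem")G"
```

For four cores this gives 20 GB for the first attempt and 40 GB for the retry. Once the job has run, the memory usage reported by jobinfo is a better basis for later requests than the rule of thumb.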

I am running something on the login node and the process gets killed. Why does this happen?

The login node is used by all users of the cluster for connecting to the cluster, copying data, submitting jobs, et cetera. It therefore has per-user limits on CPU and memory usage, to prevent one user from taking all these resources. If something you run exceeds these limits, it will be killed. If this happens, you could try running the process on the interactive node, which has more resources and less stringent limits, or you can submit the task as a job.

Is it possible to provide parameters to the batch script?

Yes, this is possible. Pass the parameters after the script name, for example sbatch script.sh parameter1 parameter2. Inside the script they are available as $1, $2, and so on. Look here for an example with Python.
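
As a minimal sketch of how a batch script might pick up such parameters, consider the script below. The job name, the time limit, the helper function describe_args and the default values are all hypothetical; the only behaviour relied on is that arguments after the script name reach the script as positional parameters.

```shell
#!/bin/bash
#SBATCH --job-name=param_demo
#SBATCH --time=00:05:00

# Arguments given after the script name on the sbatch command line
# arrive in the script as the positional parameters $1, $2, ...
describe_args() {
    input_file="${1:-input.txt}"   # hypothetical default if $1 is missing
    repeats="${2:-1}"              # hypothetical default if $2 is missing
    echo "Processing ${input_file} ${repeats} time(s)"
}

describe_args "$@"
```

Submitted as sbatch script.sh data.csv 3, this would write "Processing data.csv 3 time(s)" to the job's output file. The defaults let you run the same script without any arguments.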

My password is not being accepted and I get “Access denied” errors when trying to log in. What is wrong?

Please make sure that you use a lowercase username. For instance, use p123456 or s123456 instead of P123456 or S123456. Also, make sure that you can still log in to other university services (e.g. your email); if those still work, contact us at hpc@rug.nl.

Why does installing a package or module give a compilation failed (“no such instruction”) error?

In this case the error message contains something like:

/tmp/cceBXn0u.s: Assembler messages:
/tmp/cceBXn0u.s:1677: Error: no such instruction: `shlx %rax,%rdx,%rax'

You can try installing this package manually on the login node in order to solve this problem (e.g. for R or Python). Otherwise, you need to load the right binutils module (e.g. module load binutils/2.25) before installing this package or module.

When does my job start?

After you have submitted your job, it is put into a queue. To check the status of your job, you can type:

squeue -u $USER

or:

jobinfo <job id>

When your requested resources are available to you, your job is started.

Some more information about your position in the queue and the estimated start time of your job can be found here.

Why does it take long for my job to start?

The Hábrók cluster uses a priority queue. When you submit a lot of jobs, your priority decreases. This way, other users also get their share of the cluster. How this prioritization is done can be found here.

Why is my job in state PD (pending) with reason QOSGrpCpuLimit?

There are limits that prevent the cluster from being filled with a lot of long jobs, which would cause long waiting times for all other jobs. Jobs requesting more than three days of wall clock time in the “regular” partition are considered long jobs. So, if your job has this state, it will have to wait for one or more other long jobs to finish.
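
To illustrate the three-day threshold described above, the following sketch converts a time limit written as HH:MM:SS or D-HH:MM:SS into seconds and checks whether it exceeds three days. The function names to_seconds and is_long_job are hypothetical helpers, only these two time formats are handled, and the script does not query the scheduler; it only mirrors the rule as stated in the text.

```shell
#!/bin/bash
# Convert a time limit written as HH:MM:SS or D-HH:MM:SS to seconds.
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

# Succeeds when the requested wall clock time is more than three days.
is_long_job() {
    [ "$(to_seconds "$1")" -gt $(( 3 * 86400 )) ]
}

if is_long_job "4-00:00:00"; then
    echo "long job"
else
    echo "regular job"
fi
```

A request of exactly three days (3-00:00:00) is not counted as long here, following the "more than three days" wording.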
