Job hints

The jobinfo tool may give hints on improving the efficiency of your jobs. A further explanation of these hints is given below.

CPU efficiency

In general, poor CPU efficiency can have two causes:

  1. The program is being delayed by storage (file in- and output) operations.
  2. The program is not using the assigned cores effectively. This can either be because the program is not running in parallel at all, or because the parallelization itself is not efficient.

When jobinfo warns about your job's CPU efficiency, three cases are distinguished. We will explain these cases in more detail.

The program efficiency is low. Check the file in- and output pattern of your application

This hint is given for a program that runs on only a single CPU core and is thus not running in parallel. When the CPU time used is much lower than the elapsed wall clock time, this normally means that the program is waiting for data to be read from or written to the file system.

The CPU is then idle, waiting for these operations to finish. In general, the following issues can occur:

  1. Large amounts of data are involved.
  2. Data is being read from or written to random (that is, non-sequential) locations in files.
  3. The program is reading from or writing to many small files. This has a high overhead, because it involves a lot of metadata operations.

The following tips may help in reducing the problem:

  1. It may be possible to modify the application or its settings to limit the amount of data read or written.
  2. It may be possible to use the local file system, as explained here; a sketch of how such data staging could look is given after this list.
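
The following job script is a minimal sketch of the data-staging approach, assuming the scheduler provides a node-local scratch directory in $TMPDIR; the program name, file names and paths are placeholders, and the actual location of the local file system is described in the documentation linked above.

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # Copy the input data to node-local scratch once, instead of reading
    # it repeatedly over the shared file system.
    cp /home/$USER/mydata/input.dat "$TMPDIR"/

    # Run the program against the local copy and write output locally too.
    cd "$TMPDIR"
    /home/$USER/bin/myprogram input.dat > output.dat

    # Copy the results back before the job ends, since local scratch is
    # usually cleaned up automatically after the job.
    cp output.dat /home/$USER/mydata/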

The program efficiency is very low. Your program does not seem to run in parallel

This hint is given if the efficiency of a job requesting n cores is below 100/n percent. This effectively means that the program is running on a single core, while n cores have been requested. For example, a job that requested 8 cores but reports an efficiency below 12.5% is at best keeping one core busy.

This is normally caused by the program not being parallelized at all. A program can only use multiple cores and/or nodes if it has been written in a way that supports this. Please note that ordinary code will not run in parallel by itself: the programmer has to specify in the program code that, and how, this should be done, typically by making use of special tools and libraries like OpenMP or MPI.

If the documentation of the program you are using does not state that it supports parallel computations, you can safely assume that it will not run in parallel. So check your program's documentation for sections explaining how to make use of parallelism. If this is not documented, requesting multiple cores does not make sense: it will only increase your waiting time, reduce your priority faster, and leave the resources you claimed unused, which in turn increases the waiting times for other users. So please do not request more cores than your program can use; for a serial program, request a single core, as in the sketch below.
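
The job script below is a minimal sketch of a single-core request for a serial program; the time limit and the program name are placeholders.

    #!/bin/bash
    #SBATCH --ntasks=1          # a serial program can only use one core
    #SBATCH --cpus-per-task=1
    #SBATCH --time=02:00:00

    ./myserialprogram input.dat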

The program efficiency is low. Your program is not using the assigned cores effectively

If multiple cores and/or nodes have been claimed, but the CPU efficiency is higher than in the previous scenario, this lack of efficiency can have two causes:

  1. First, check whether your program is actually using all the cores you requested. It may be that this has to be set explicitly in the program parameters, and that the value you set does not match the number of cores you requested.
    1. To prevent mismatches like this you can use srun for starting MPI applications. srun (and also the mpirun versions supplied by the software modules on Peregrine) does not need to be told how many cores and nodes you requested; since it interfaces with the scheduler, this is determined automatically.
    2. For other applications you can make use of the environment variable $SLURM_JOB_CPUS_PER_NODE, which is set by the scheduler. You can pass $SLURM_JOB_CPUS_PER_NODE to the program argument that selects the number of threads. For OpenMP programs you may have to set the number of threads by giving $OMP_NUM_THREADS the correct value, e.g.:
      export OMP_NUM_THREADS=$SLURM_JOB_CPUS_PER_NODE

      Note that for hybrid MPI/threaded applications this works differently: there you have to use $SLURM_CPUS_PER_TASK, because you have to differentiate between tasks and CPUs per task. A sketch of such a job script is given after this list.

  2. You may also need to check the file in- and output pattern of your program, as described earlier.
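
As an illustration of the points above, the sketch below combines srun with $SLURM_CPUS_PER_TASK for a hybrid MPI/OpenMP job; the requested task and thread counts and the program name are placeholders, not a recommendation.

    #!/bin/bash
    #SBATCH --ntasks=4              # number of MPI tasks
    #SBATCH --cpus-per-task=6       # CPU cores (threads) per task
    #SBATCH --time=04:00:00

    # One OpenMP thread per CPU core assigned to each task.
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # srun obtains the number of tasks from the job allocation, so the
    # core counts do not have to be repeated here.
    srun ./my_hybrid_program input.dat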

Memory usage

You requested much more memory than your program used. Please reduce the requested amount of memory.

SLURM tries to monitor the memory usage of your application. If this memory usage is much lower than the amount you requested, you will get this hint.

This means that it is wise to check, before the next run, whether your program can run with less memory. Please take into account the following guidelines; a small example of an adjusted memory request follows after the list:

  1. There is at least 4 GB of memory per core. If you stay below this limit, the impact of using less memory than requested is low.
  2. The memory usage is obtained by polling the usage periodically, so some peak usage may be missed. This is especially relevant if your program was running out of memory: the reported memory usage may be low, even if you get the OUT_OF_MEMORY status. In that case you will just have to increase the requested amount of memory, or check whether you can reduce the memory requirements of your program.
  3. There are allocations that the memory measurement misses. So if you have had OUT_OF_MEMORY errors before, check whether the requested amount of memory is still correct. If the program will not work with less memory, the memory reporting apparently does not capture the real memory usage of your program; we have especially seen this happen with Java programs.
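
As an illustration only: if jobinfo reports that a job used around 2 GB while far more was requested, the memory request in the job script can be lowered, leaving some headroom for peaks that the periodic polling may have missed. The values and program name below are assumptions, not recommendations.

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00
    # Previous request was 16 GB, but jobinfo showed roughly 2 GB of usage,
    # so request a bit more than the measured usage to leave headroom.
    #SBATCH --mem=4G

    ./myprogram input.dat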