Differences
This shows you the differences between two versions of the page.
habrok:job_management:checking_jobs [2024/05/14 10:41] – [Using jobinfo] fokke | habrok:job_management:checking_jobs [2024/06/21 09:51] (current) – [jobinfo GPU example] admin | ||
---|---|---|---|
Line 62: | Line 62: | ||
From the moment a job is submitted, you can request relevant information about it using the jobinfo command. If you have forgotten the ID of the job you want information about, you can list all jobs you have submitted with '' | From the moment a job is submitted, you can request relevant information about it using the jobinfo command. If you have forgotten the ID of the job you want information about, you can list all jobs you have submitted with '' | ||
+ | |||
+ | The code for the jobinfo command is available at: https:// | ||
After you have submitted a job, you can request its information with the command: | After you have submitted a job, you can request its information with the command: | ||
Line 81: | Line 83: | ||
Number of Tasks : 4 | Number of Tasks : 4 | ||
State : COMPLETED | State : COMPLETED | ||
- | Submit | + | Submit |
- | Start : 2024-05-01T15:15:22 | + | Start : 2024-04-01T16:15:22 |
- | End : 2024-05-05T19:30:22 | + | End : 2024-04-05T20:30:22 |
Reserved walltime | Reserved walltime | ||
Used walltime | Used walltime | ||
Line 101: | Line 103: | ||
</ | </ | ||
+ | The jobinfo command supports the option '' | ||
===== Interpreting jobinfo output ===== | ===== Interpreting jobinfo output ===== | ||
- | This information shows that the job has run for 21 seconds, while 20 minutes | + | This information shows that the job ran for more than 4 days, while 10 days were requested. With this knowledge, similar jobs can be submitted with sbatch while requesting less time. By doing so, the SLURM scheduler may be able to start your job earlier than it would for a 10-day request. |
- | The same is true for the number of requested cores (which is requested with --ntasks, --ntasks-per-node, | + | An important metric |
+ | The low efficiency results in a hint being displayed. | ||
- | Finally, we look at the amount of memory reserved. Each standard node has 128GB of memory and 24 cores, meaning that there is on average ~5GB per core available. For simple | + | Not using the resources you requested |
+ | Finally, we look at the amount of memory reserved. Each standard node has 512GB of memory and 128 cores, meaning that on average 4GB per core is available. For simple jobs this should be more than enough. If you request more than 4GB of memory per core, it is useful to check the "Maximum memory used" value reported by jobinfo afterwards, to see whether you really needed the extra memory. You can then adjust the amount of memory requested for similar future jobs. | ||
+ | In this case the job used at most 8.71G of memory, so requesting 40GB is not very efficient. However, the amount requested per core is only 2.5GB, which is below the 4GB per core available on average, so here this is not a big issue. | ||
+ | ===== jobinfo GPU example ===== | ||
+ | |||
+ | Here is the output of a job that was using a GPU: | ||
+ | < | ||
+ | Job ID : 833913 | ||
+ | Name : gpu_job | ||
+ | User : s_number | ||
+ | Partition | ||
+ | Nodes : a100gpu5 | ||
+ | Number of Nodes : 1 | ||
+ | Cores : 16 | ||
+ | Number of Tasks : 1 | ||
+ | State : COMPLETED | ||
+ | Submit | ||
+ | Start : 2024-05-11T18: | ||
+ | End : 2024-05-11T21: | ||
+ | Reserved walltime | ||
+ | Used walltime | ||
+ | Used CPU time : 23:20:49 (Efficiency: | ||
+ | % User (Computation) | ||
+ | % System (I/O) : 13.31% | ||
+ | Total memory reserved | ||
+ | Maximum memory used : 4.29G | ||
+ | Requested GPUs : a100=1 | ||
+ | Allocated GPUs : a100=1 | ||
+ | Max GPU utilization | ||
+ | Max GPU memory used : 3.76G | ||
+ | </ | ||
+ | For a GPU job, jobinfo also shows the requested and allocated GPU resources, the GPU utilization and the GPU memory usage. The GPU utilization is the maximum utilization measured over the job's lifetime. Unfortunately, this number may therefore not be very informative, as there may have been long periods of much lower GPU utilization. | ||
+ | As you can see, CPU memory and GPU memory are reported separately, because they are different types of memory: CPU memory is attached to the CPU, while GPU memory is separate memory on the GPU board. |
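The walltime reasoning in the first jobinfo example (just over 4 days used out of a 10-day request) can be checked with a few lines of Python. This is a minimal sketch, assuming the Start and End timestamps are in the ISO 8601 form shown in the example output and that the job ran without interruption, so the used walltime is recomputed from them. The 25% safety margin and the `slurm_time` helper are hypothetical choices for illustration, not part of jobinfo.

```python
import math
from datetime import datetime, timedelta


def slurm_time(td: timedelta) -> str:
    """Format a timedelta as a Slurm --time value (days-hours:minutes:seconds)."""
    total = int(td.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Start/End timestamps from the first jobinfo example
start = datetime.fromisoformat("2024-04-01T16:15:22")
end = datetime.fromisoformat("2024-04-05T20:30:22")
reserved = timedelta(days=10)  # the 10-day request discussed in the text

used = end - start
fraction = used / reserved  # timedelta / timedelta gives a float

# Hypothetical headroom: 25% on top of the measured runtime, rounded up to whole hours
SAFETY_MARGIN = 1.25
suggested_hours = math.ceil(used.total_seconds() * SAFETY_MARGIN / 3600)
suggested = timedelta(hours=suggested_hours)

print(f"Used walltime: {used}")                              # 4 days, 4:15:00
print(f"Fraction of reservation used: {fraction:.1%}")       # 41.8%
print(f"Possible new request: --time={slurm_time(suggested)}")  # --time=5-06:00:00
```

Keeping some headroom matters because Slurm stops a job when it reaches its time limit; even with the margin, the suggested request is roughly half of the original 10 days.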
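The memory arithmetic from the first example can be written out the same way. This sketch uses the figures quoted in the text (a standard node with 512GB and 128 cores, 40GB requested, 8.71G maximum used, 2.5GB requested per core). The 20% headroom is a hypothetical choice, and note that `--mem` in sbatch sets the memory per node, not per core.

```python
import math

# Standard node as described in the text
NODE_MEMORY_GB = 512
NODE_CORES = 128
avg_per_core = NODE_MEMORY_GB / NODE_CORES  # 4.0 GB per core on average

# Figures quoted for the first example job
requested_gb = 40.0
max_used_gb = 8.71
requested_per_core_gb = 2.5

# Share of the reserved memory that the job actually needed at its peak
efficiency = max_used_gb / requested_gb

# Hypothetical 20% headroom on the measured peak, rounded up to whole GB
HEADROOM = 1.2
suggested_gb = math.ceil(max_used_gb * HEADROOM)

print(f"Average memory per core on a standard node: {avg_per_core:.1f} GB")
print(f"Memory efficiency: {efficiency:.1%}")
print(f"Per-core request within node average: {requested_per_core_gb <= avg_per_core}")
print(f"Possible new request: --mem={suggested_gb}G")
```

The per-core comparison is what makes the 40GB request acceptable here: it stays below the node average, so it does not block cores that other jobs could use.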
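The efficiency that jobinfo prints next to "Used CPU time" compares the CPU time actually consumed with the core-time that was reserved. The sketch below assumes the common definition also used by Slurm's `seff` tool (CPU time divided by number of cores times used walltime); jobinfo's exact formula is not shown on this page. The helper names and the job in the usage line are hypothetical.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' or Slurm-style 'D-HH:MM:SS' into seconds."""
    days = 0
    if "-" in hms:
        day_part, hms = hms.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, cores: int, walltime: str) -> float:
    """Fraction of the reserved core-time (cores x walltime) spent on the CPU."""
    return hms_to_seconds(cpu_time) / (cores * hms_to_seconds(walltime))


# Hypothetical job: 16 cores reserved for 2 hours, but only 8 hours of CPU time used
print(f"{cpu_efficiency('08:00:00', 16, '02:00:00'):.0%}")  # 25%
```

A value this low usually means the program did not use all requested cores, which is exactly the situation in which requesting fewer cores for similar jobs pays off.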
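To compare fields such as "Maximum memory used" and "Max GPU memory used" in a script, the "Key : value" layout of the jobinfo output can be parsed into a dictionary. This is a minimal sketch, assuming each field sits on its own line with the key separated from the value by the first colon; the sample only contains lines that are fully visible in the GPU example, and the unit handling in the hypothetical `to_gib` helper (binary suffixes K/M/G/T) is an assumption about jobinfo's output.

```python
SAMPLE = """\
Job ID              : 833913
Name                : gpu_job
Cores               : 16
Number of Tasks     : 1
State               : COMPLETED
Used CPU time       : 23:20:49
Maximum memory used : 4.29G
Requested GPUs      : a100=1
Allocated GPUs      : a100=1
Max GPU memory used : 3.76G
"""


def parse_jobinfo(text: str) -> dict[str, str]:
    """Turn 'Key : value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like '23:20:49' stay intact
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Assumed binary unit suffixes, expressed relative to GiB
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size: str) -> float:
    """Convert a size such as '4.29G' to GiB."""
    return float(size[:-1]) * UNITS[size[-1].upper()]


info = parse_jobinfo(SAMPLE)
print(info["Allocated GPUs"])                     # a100=1
print(to_gib(info["Maximum memory used"]))        # CPU memory: 4.29
print(to_gib(info["Max GPU memory used"]))        # GPU memory: 3.76
```

Because CPU and GPU memory are reported as separate fields, the script keeps them separate too instead of adding them up.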