Differences

--- habrok:data_management:storage_areas [2024/05/23 07:22] – [/projects] fokke
+++ habrok:data_management:storage_areas [2025/10/24 10:32] (current) – [/scratch] pedro
@@ Line 15: / Line 15: @@
 ===== /scratch =====
-Each user of the system also has a directory in ''/scratch'', which is meant for storing large amounts of data that needs to be processed. **Please be aware that backups of this data are not made, and that /scratch is not meant for long term data storage.** This means that you have to copy important data to other storage yourself. This storage can for example be the ''/projects'' or RDMS storage system.
+Each user of the system also has a directory in ''/scratch'', which is meant for storing large amounts of data that need to be processed. **Please be aware that no backups of this data are made, and that /scratch is therefore not suitable for long-term data storage.** This means that you have to regularly copy important results to other storage yourself, for example to ''/projects'' or the RDMS storage system.
 Quotas are also applied on ''/scratch'' to prevent the system from running out of space. Currently the default limit is 250 GB. If this limit is too low for your research purposes, you can ask us to increase it. The limit can be increased to a "fair use" value without issues. When more space is required, it is expected that ''/scratch'' is still only used as a staging area for data that will be processed immediately, and that a suitable storage system is available elsewhere for storing the full data collection. These storage systems can again be the ''/projects'' or RDMS systems described below.
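Copying results off ''/scratch'' could look like the minimal sketch below. The two directories are hypothetical stand-ins created with ''mktemp'' so that the example runs anywhere; on the cluster you would substitute your own results directory on ''/scratch'' and your own directory on ''/projects''. The variable names are illustrative and not part of any Hábrók tooling.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical stand-ins so the sketch runs anywhere; on the cluster these
# would be a results directory on /scratch and your directory on /projects.
scratch_results=$(mktemp -d)
projects_backup=$(mktemp -d)

# Create an example result file.
echo "final result" > "$scratch_results/output.txt"

# Copy the results recursively, preserving timestamps and permissions.
cp -a "$scratch_results/." "$projects_backup/"

# Verify that the copy matches the original before relying on it.
if diff -r "$scratch_results" "$projects_backup" > /dev/null; then
    copy_status="ok"
else
    copy_status="mismatch"
fi
echo "copy status: $copy_status"
```

Checking the copy with ''diff -r'' before deleting anything from ''/scratch'' is a cautious choice here, since no backup exists to fall back on.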
@@ Line 23: / Line 23: @@
 There is also a limit on the number of files that can be stored. This is to reduce the load on the file system metadata server, which keeps track of the data about files (time of access, change, size, location, etc.). Handling a huge number of files is a challenge for most shared file systems, and accessing a huge number of files will lead to performance bottlenecks.
-The best way of handling data sets with many (> 10,000) files is to not store them on /scratch as is, but as (compressed) archive files. These files can then be extracted to the fast local storage on the compute nodes at the beginning of a job.
+The best way of handling data sets with many (> 10,000) files is to not store them on ''/scratch'' as is, but as (compressed) archive files. These files can then be extracted to the fast local storage on the compute nodes at the beginning of a job. You can find more details and examples in our dedicated [[habrok:advanced_job_management:many_file_jobs|page]] on this topic.
 When the processing is performed on the fast local storage the job performance will be much better.
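The archive approach described above can be sketched as follows. This is an illustration under stated assumptions, not the site's official recipe: the directories are hypothetical stand-ins created with ''mktemp'' (one for your ''/scratch'' directory, one for the local storage that ''$TMPDIR'' points to inside a job), and the three-file data set only exists to make the example runnable.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical locations; replace them with your own paths.
workdir=$(mktemp -d)                # stands in for your /scratch directory
dataset="$workdir/dataset"          # directory containing many small files
archive="$workdir/dataset.tar.gz"   # single archive file to keep on /scratch
local_dir=$(mktemp -d)              # stands in for $TMPDIR inside a job

# Create a small example data set (three files) so the sketch runs anywhere.
mkdir -p "$dataset"
for i in 1 2 3; do
    echo "sample $i" > "$dataset/file_$i.txt"
done

# Pack the directory into one compressed archive.
tar -czf "$archive" -C "$workdir" dataset

# At the start of a job: extract the archive onto the fast local storage.
tar -xzf "$archive" -C "$local_dir"

# Count the extracted files (tr strips padding that some wc versions add).
extracted=$(find "$local_dir/dataset" -type f | wc -l | tr -d ' ')
echo "extracted files: $extracted"
```

With this layout the shared file system only has to track a single archive file, while the many small files exist only on the node-local storage for the duration of the job.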
@@ Line 38: / Line 38: @@
 Change directory:
-<code>
+<code bash>
 cd $TMPDIR
 </code>
 Copy files to the temporary directory:
-<code>
+<code bash>
 cp mydirectory/* $TMPDIR
 </code>
 So a full fictitious jobscript could look like:
-<code>
+<code bash>
 module purge
 module load MYPROGRAM/1.0
@@ Line 67: / Line 67: @@
 ===== RDMS =====
-The Research Data Management system (RDMS) is also available on the Hábrók nodes. More details about this system can be found in the dedicated wiki: https://wiki.hpc.rug.nl/rdms/start
+The **Research Data Management System** (RDMS) is also available on the Hábrók nodes. This is a **long-term data storage** service provided by the university, and data can be transferred directly between the RDMS and Hábrók using [[rdms:access:linux:icommands|]]. More details about this system can be found in the [[rdms:start|dedicated wiki]].
+Additionally, the RDMS team provides user training sessions to help you quickly get started with the system. You can find more information about these sessions on the RDMS [[rdms:training|]] page.