Differences

This shows you the differences between two versions of the page.

habrok:data_management:storage_areas [2022/12/15 16:10] fokke
habrok:data_management:storage_areas [2024/05/30 10:58] (current) – [/scratch] fokke
Line 3: Line 3:
 ====== Storage areas ======
  
-This page describes the directories / file systems that are available to each user for storing files and data sets. To see how many space is available on your home directory or ''/scratch'', the ''hbquota'' tool can be used. Please refer to the [[quota]] page for more information.
+This page describes the directories / file systems that are available to each user for storing files and data sets. To see how much space you have available on the different file systems, use the ''hbquota'' tool. Please refer to the [[quota]] page for more information.
  
 ===== Home directories =====
Line 9: Line 9:
 Each user of the system has their own home directory located on one of the home directory servers. This is the directory you will be in after logging in to the system. Because we make a tape backup of this area, the amount of space for each user is limited using quotas. Currently the limit is 50GB by default.
  
-The data on the home directories is available on all nodes in the system. The user home directories will be in ''/home[n]'', where n is a number. The number depends on the number of storage servers deployed. Your home directory may for example be in ''/home3/username'', where you have to replace your username by your p-, s- or f-number.
+The data on the home directories is available on all nodes in the system. The user home directories will be in ''/home[n]'', where n is a number. The range of numbers depends on the number of storage servers deployed. Your home directory may for example be in ''/home3/username'', where you have to replace ''username'' with your p-, s- or f-number.
  
 On the command line the shortcuts ''$HOME'' or ''~'' can be used to reach your home directory.
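As a small illustration of the two shortcuts, the sketch below stores both forms in variables and reports whether they point to the same directory. It only assumes a bash-like shell; the variable names are made up for this example.

```shell
#!/bin/bash
# Both forms normally expand to the home directory of the current user.
home_from_var="$HOME"
home_from_tilde=~    # the tilde is only expanded when it is not quoted

if [ "$home_from_var" = "$home_from_tilde" ]; then
    echo "\$HOME and ~ point to the same directory"
else
    echo "\$HOME and ~ differ: $home_from_var vs $home_from_tilde"
fi
```

Note that ''~'' is not expanded inside quotes, so ''$HOME'' is the safer choice in scripts.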
Line 15: Line 15:
 ===== /scratch =====
  
-Each user of the system also has a directory in ''/scratch'', which is meant for storing large amounts of data. Please be aware that backups of this data are not made. This means that you have to copy important data to other storage yourself.
+Each user of the system also has a directory in ''/scratch'', which is meant for storing large amounts of data that needs to be processed. **Please be aware that backups of this data are not made, and that, because of this, /scratch is not suitable for long-term data storage.** This means that you have to regularly copy important results to other storage yourself. This storage can for example be the ''/projects'' or RDMS storage system.
 + 
 +Also on ''/scratch'' quotas are applied to prevent the system from running out of space. Currently the limit is 250GB by default. If this limit is too low for your research purposes, you can request us to change this limit. The limit can be increased to a "fair use" value without issues. When more space is required it is expected that ''/scratch'' is still only used as a staging area for data that will immediately be processed, and that a suitable storage system is available elsewhere for storing the full data collection. These storage systems can again be the ''/projects'' or RDMS systems described below.
  
-Also on ''/scratch'' quotas are applied to prevent the system from running out of space. Currently the limit is 250GB by default. If this limit is too low for your research purposes, you can request us to change this limit.\\ 
 The data on ''/scratch'' is available on all nodes in the system.
  
Line 27: Line 28:
  
  
-===== /local =====
+===== $TMPDIR (local disk) =====
  
-Each node of the cluster has an amount of fast internal disk space. Most of this space is mounted under the path ''/local''. This space is only available for running jobs. For each job a temporary directory is created on this disk. This directory can be reached using the environment variable ''$TMPDIR''. To prevent people from storing data permanently these directories are removed automatically after the jobs is finished. This means that you have to copy away important data from this location at the end of your job script.
+Each node of the cluster has an amount of fast internal disk space. Most of this space is mounted under a job-specific path that can be reached using the ''$TMPDIR'' environment variable. This space is only available for running jobs. For each job a temporary directory is created on this disk space. To prevent people from storing data permanently, these directories are removed automatically after the job has finished. This means that you have to copy away important data from this location at the end of your job script.
  
-Note that the disk space in ''/local'' is not shared between the nodes. You cannot access the files on one machine from another, without copying them explicitly over the network.
+Note that the disk space in ''$TMPDIR'' is not shared between the nodes. You cannot access the files on one machine from another, without copying them explicitly over the network.
  
 You can use the name of this directory in your job scripts using ''$TMPDIR''. The $ sign denotes that you are referring to the environment variable ''TMPDIR''. Examples:
Line 46: Line 47:
 </code> </code>
  
 +So a full fictitious jobscript could look like:
 +<code>
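 +# Start from a clean environment and load the (fictitious) software module
 +module purge
 +module load MYPROGRAM/1.0
 +# Stage the input data on the fast local disk of the node and work there
 +cp -r mydataset/* "$TMPDIR"
 +cd "$TMPDIR"
 +myprogram -i inputdata -o outputdata
 +# Copy the results back before the job ends, because $TMPDIR is removed afterwards
 +cp -r outputdata "/scratch/$USER/mydataset/"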
 +</code>
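The stage-in, run, stage-out pattern of the job script above could also be wrapped in a small helper, sketched here under stated assumptions. The function name ''stage_and_run'' and the convention that the program writes to a directory called ''outputdata'' are hypothetical and only follow the fictitious example; this is not an official Hábrók tool.

```shell
#!/bin/bash
# Hypothetical helper: copy input data to a job-local working directory,
# run a command there, and copy the "outputdata" directory back afterwards.
# Arguments: <input dir> <output dir> <work dir> <command> [arguments...]
stage_and_run() {
    local input_dir="$1" output_dir="$2" work_dir="$3"
    shift 3

    mkdir -p "$output_dir" || return 1
    # Copy the contents of the input directory, including hidden files.
    cp -r "$input_dir"/. "$work_dir"/ || return 1

    # Run the command inside the working directory and remember its exit status.
    ( cd "$work_dir" && "$@" )
    local status=$?

    # Copy whatever output exists, even when the command failed,
    # because the job-local directory is removed when the job ends.
    if [ -d "$work_dir/outputdata" ]; then
        cp -r "$work_dir/outputdata" "$output_dir"/ || return 1
    fi
    return "$status"
}
```

In a job script this might be called as ''stage_and_run mydataset "/scratch/$USER/mydataset" "$TMPDIR" myprogram -i inputdata -o outputdata''. Copying the output even after a failed run is a design choice that helps with debugging; whether that is desirable depends on the program.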
 +
 +====== Relevant external data stores ======
 +
 +===== /projects =====
 +
 +On the login/interactive nodes each user will have storage space in ''/projects/$USER''. This storage area is meant for storing data for a longer period. It can only be accessed on the login and interactive nodes and cannot be reached by the compute nodes as it is not optimized for processing data.
 +
 +By default 250GB is allocated. This space can be increased upon request, based on a "fair use" principle. Above a certain threshold payment for the space will be required.
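Because ''/projects'' cannot be reached from the compute nodes, results have to be copied there from a login or interactive node. The sketch below shows one possible way to do this with standard tools only. The function name ''archive_results'' and the directory names in the usage note are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy a results directory to longer-term storage
# and check that the copy matches the source.
# Arguments: <source dir> <destination dir>
archive_results() {
    local src="$1" dest="$2"

    mkdir -p "$dest" || return 1
    # -a copies recursively and preserves timestamps and permissions.
    cp -a "$src"/. "$dest"/ || return 1
    # diff -r exits with a non-zero status when the two trees differ.
    diff -r "$src" "$dest" > /dev/null
}
```

On a login node this could be used as ''archive_results "/scratch/$USER/mydataset/outputdata" "/projects/$USER/mydataset/outputdata"''. The final check also fails when the destination already contained other files, so an empty or new destination directory is assumed.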
 +
 +===== RDMS =====
 +
 +The Research Data Management system (RDMS) is also available on the Hábrók nodes. More details about this system can be found in the dedicated wiki: https://wiki.hpc.rug.nl/rdms/start