habrok:data_management:storage_areas [2025/05/20 14:09] (current) pedro
====== Storage areas ======
This page describes the directories / file systems that are available to each user for storing files and data sets. To see how much space you have available, check your current quota usage.
===== Home directories =====
Each user of the system has their own home directory, located on one of the home directory servers. This is the directory you will be in after logging in to the system. Since we make a tape backup of this area, the amount of space for each user is limited using quotas. Currently the limit is 50GB by default.
The data in the home directories is available on all nodes in the system. The user home directories can be found under ''/home''.
On the command line the shortcuts ''~'' and ''$HOME'' can be used to refer to your home directory.
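As a generic Linux illustration (not specific to any one cluster), the following shows that both shortcuts point to the same place, and how to check how much space your home directory currently uses:

```shell
# ~ and $HOME both expand to your home directory path
echo "$HOME"
echo ~

# Show the total disk usage of your home directory
# (useful to keep an eye on the 50GB quota)
du -sh "$HOME"
```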
===== /scratch =====
Each user of the system also has a directory in ''/scratch''.

Also on ''/scratch'' the amount of space each user can use is limited by quotas.

The data on ''/scratch'' is not backed up.
There is also a limit on the number of files that can be stored. This is to reduce the load on the file system metadata server, which keeps track of the data about files (time of access, change, size, location, etc.). Handling a huge number of files is a challenge for most shared file systems, and accessing a huge number of files will lead to performance bottlenecks.
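To get an idea of whether a data set falls into this category, you can count its files with standard tools; the data set below is a small placeholder created just for the example:

```shell
# Create a small placeholder data set (stands in for a real one)
mkdir -p mydataset
touch mydataset/part1.dat mydataset/part2.dat mydataset/part3.dat

# Count the number of regular files in the data set
find mydataset/ -type f | wc -l
```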
The best way of handling data sets with many (> 10,000) files is to not store them on ''/scratch'' as is, but as (compressed) archive files. These files can then be extracted to the fast local storage on the compute nodes at the beginning of a job. You can find more details and examples on the dedicated Many File Jobs page.
When the processing is performed on the fast local storage the job performance will be much better.
===== $TMPDIR (local disk) =====
Each node of the cluster has an amount of fast internal disk space. Most of this space is mounted under a job-specific path that can be reached using the ''$TMPDIR'' environment variable.
Note that the disk space in ''$TMPDIR'' is temporary and will be cleaned up when your job ends, so copy any results you want to keep to another storage area before the job finishes.
You can refer to this directory in your job scripts using ''$TMPDIR''.
</code>
So a full fictitious jobscript could look like:
<code>
#!/bin/bash
# Start from a clean environment and load the required software
module purge
module load MYPROGRAM
# Copy the input data to the fast local disk of the node
cp -r mydataset/* $TMPDIR
cd $TMPDIR
# Run the program on the local copy of the data
myprogram -i inputdata -o outputdata
# Copy the results back to /scratch before the job ends
cp -r outputdata /scratch/$USER/
</code>
+ | |||
====== Relevant external data stores ======

===== /projects =====
On the login/interactive nodes the ''/projects'' storage area is available, which can be used for storing project data.
+ | |||
By default 250GB is allocated. This space can be increased upon request, based on a "fair use" principle. Above a certain threshold, payment for the space will be required.
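To keep an eye on how much of such an allocation is in use, the standard ''du'' and ''df'' tools work; the directory below is a placeholder, replace it with your own project directory:

```shell
# Directory to inspect; replace with your own project directory
# (the home directory is used here only so the example runs anywhere)
dir="$HOME"

# Space used by the directory tree
du -sh "$dir"

# Free space on the file system holding the directory
df -h "$dir"
```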
+ | |||
===== RDMS =====
+ | |||
The Research Data Management system (RDMS) is also available on the Hábrók nodes. More details about this system can be found in its dedicated wiki.