In this section we will describe the basic workflow for working on the cluster. This workflow consists of five steps:

  - Copy input data to the system
  - Prepare the job script (a minimal sketch is shown below):
    - Define requirements
    - Run the program on the input data
    - Transfer output data back to the central storage
  - Submit the computational task to the job scheduling system
  - Check the status and results of your calculations
  - Copy results back to your local system or archival storage
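The job script mentioned in these steps is a small shell script containing the resource requirements and the commands to run. As a minimal sketch, assuming the scheduling system is Slurm and using placeholder values for the resources, module, program and file names:

<code bash>
#!/bin/bash
#SBATCH --job-name=example_job    # name of the job in the queue
#SBATCH --time=01:00:00           # requested wall clock time (requirement)
#SBATCH --cpus-per-task=1         # requested CPU cores (requirement)
#SBATCH --mem=4G                  # requested memory (requirement)

# Load the software environment (module name is a placeholder)
module load SomeProgram

# Run the program on the input data (program and file names are placeholders)
some_program input.dat > output.dat
</code>

With Slurm, such a script would be submitted with ''sbatch jobscript.sh'', after which ''squeue'' and the output files can be used to check the status and results.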
In this section we will focus on data storage, and the next sections will delve deeper into the other topics, including the command-line interface, which is implied in some of the steps.

===== Data =====

For most applications users need to work with data. Data can be parameters for a program that needs to be run, for example to set up a simulation. It can be input data that needs to be analyzed. And, finally, running simulations or data analyses will produce results, which also need to be stored.

Hábrók has its own storage systems, which are decoupled from other storage systems at the university.

Since the storage is decoupled, data needs to be transferred to and from the system. Input data needs to be transferred to the Hábrók processing storage area. Any results that need to be further analyzed or stored for a longer period of time need to be transferred from Hábrók to another storage location.
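As an illustration, from a Linux or macOS machine such transfers can be done with standard tools like ''rsync'' or ''scp''; the hostname, user name and paths below are placeholders and should be replaced with your own details:

<code bash>
# Copy a local directory with input data to a storage area on the cluster
# (hostname, user name and paths are placeholder examples)
rsync -av ./input_data/ myuser@login.example.org:/path/to/storage/input_data/

# Copy results from the cluster back to the local machine
rsync -av myuser@login.example.org:/path/to/storage/results/ ./results/
</code>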
===== Hábrók storage areas =====

Hábrók has several storage areas available, each with its own purpose and characteristics.

On this page we will give a short description of each of these areas.
==== home ====

The home area is where users can store settings, programs and small data sets.
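Since the home area is only intended for small data sets, it can be useful to check how much space it uses; standard tools suffice for this:

<code bash>
# Show the total size of your home directory
du -sh "$HOME"

# Show the largest items in it, sorted by size
du -sh "$HOME"/* | sort -h | tail
</code>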
==== scratch ====

For larger data sets each user has access to a space on the scratch file system.

**This area is only meant for data ready for processing, or recent results from processing. It is not meant for long term storage.**
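Because scratch is not meant for long term storage, it is good practice to check it regularly for data that can be removed or moved elsewhere; a generic sketch, in which the scratch path is an assumed example:

<code bash>
# List files in your scratch area that have not been modified
# for more than 90 days (the path is an assumed example)
find /scratch/"$USER" -type f -mtime +90 -ls
</code>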
==== local disks ==== | ==== local disks ==== | ||
- | The Peregrine | + | The Hábrók |
+ | |||
We therefore advise users to copy their input data sets from the scratch area to the local disk at the beginning of the job. The job can then run using the local disk for reading and writing data. At the end of the job the output has to be written back to the scratch area. This step is especially important if your input data is read multiple times or consists of many (>1000) files. Similar guidelines are applicable to output data.
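A sketch of this staging pattern in a job script, assuming a Slurm-like scheduler that provides a job-specific local directory in ''$TMPDIR''; the variable name, program name and all paths are assumptions:

<code bash>
#!/bin/bash
#SBATCH --time=02:00:00

# Stage input data from the scratch area to the node-local disk
# ($TMPDIR and all paths are assumed examples)
cp -r /scratch/"$USER"/myproject/input "$TMPDIR"/

# Run the program against the local copy of the data
# (program name is a placeholder)
cd "$TMPDIR"
my_program input/ results/

# Copy the results back to the scratch area before the job ends
cp -r "$TMPDIR"/results /scratch/"$USER"/myproject/
</code>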
==== External storage areas ====

Besides the storage directly available on all nodes of the cluster, some external storage areas can be accessed from the login nodes. These areas are described below.
==== Project storage ====

On the login nodes the project storage file system is available.
==== RDMS ====

The Research Data Management system can be used from the Hábrók nodes. You can find more information about the RDMS on its dedicated wiki pages:
https://
----

**Next section: [[habrok: