{{indexmenu_n>9}}
====== Best Practices ======
This section presents a selection of best practices for using the RDMS. Following them helps ensure an optimal user experience with the RDMS.

The section will be gradually updated with new usage examples and tips.

If you believe that important information should be added to this section, please contact [[rdms-support@rug.nl|RDMS support]] with your request!

===== Naming Folders/Files =====

For optimal use of the RDMS, we highly recommend following these rules when naming your files and folders:

  * Do not use special characters (e.g. ''$%^*&#=!'') in your file/folder names.
  * Do not use periods (''.'') in your file/folder names, apart from the single period that separates a file name from its extension.
  * Do not use quote symbols (''%%'%%'' or ''"'') in your file/folder names.
  * Prefer underscores (''_'') or hyphens (''-'') over white spaces in file/folder names.

Example of a folder structure with correct naming:
<code>
$ itree project_name
project_name
  analytical_data
    machine_01
      20231223_analysis.ext
      20240111_analysis.ext
      20240325_analysis.ext
    machine_02
      20230222_analysis.ext
      20230710_analysis.ext
      20240109_analysis.ext
    machine_03
      20231020_analysis.ext
      20231120_analysis.ext
      20231212_analysis.ext
  manuscripts
    publication_v01.odt
</code>

Example of a folder structure with incorrect naming:
<code>
$ itree "Project with XXX and YYY"
Project with XXX and YYY
  analytical_data
    analytical devices @ building 1
      experiment 100% scan rate.ext
      experiment 74% scan rate.ext
      experiment 80% scan rate.ext
    analytical devices @ building 2
      Experiment 01 by user.name@rug.nl.ext
      Experiment 02 by user.name@rug.nl.ext
      Experiment 03 by user.name@rug.nl.ext
    analytical devices @ building 3
      $1-100%.ext
      Versuch #1.ext
      Versuch_Öldiffusion_erste_Möglichkeit.ext
  manuscripts
    publication.final version.odt
</code>
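
The naming rules can also be checked automatically before an upload. The following is a minimal sketch, not an official RDMS tool: the function name ''is_valid_name'' is hypothetical, and it assumes that a single period followed by an alphanumeric extension is acceptable, as in the correctly named example.

<code bash>
# Hypothetical helper (not part of the RDMS or iCommands):
# returns 0 if a single file/folder name follows the naming rules, 1 otherwise.
# The body runs in a subshell, so its variables do not leak into the caller.
is_valid_name() (
  LC_ALL=C                 # compare plain bytes so that [A-Za-z] means ASCII letters only
  name=$1
  stem=${name%.*}          # name without its last ".extension" (if any)
  ext=${name##*.}          # text after the last period

  # The stem must be non-empty and consist of letters, digits, "_" and "-" only.
  case $stem in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
  esac

  # No period at all: nothing more to check.
  [ "$name" = "$stem" ] && return 0

  # Otherwise the extension must be non-empty and alphanumeric.
  case $ext in
    ''|*[!A-Za-z0-9]*) return 1 ;;
  esac
  return 0
)

# Example usage: print every entry below the current directory that breaks the rules.
# find . -mindepth 1 | while IFS= read -r path; do
#   is_valid_name "$(basename "$path")" || echo "rename: $path"
# done
</code>

With this sketch, ''20231223_analysis.ext'' and ''project_name'' are accepted, while ''publication.final version.odt'' and ''Versuch #1.ext'' are rejected.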

===== Transferring Large Data Sets =====
To transfer very large data sets, especially those containing files of several hundred GB or more, we recommend using the [[.:access:linux:icommands|iCommands]] CLI tool. Specifically, we recommend using the ''iput'' (upload) and ''iget'' (download) commands with the following parameters:

<code>
# Upload: Single big file
$ iput -T --lfrestart /path/to/lfRestartFile --retries 3 /path/to/big/local/file /rug/home/destination_collection/

# Upload: Big folder
$ iput -r -T -X /path/to/restartfile --lfrestart /path/to/lfRestartFile --retries 3 /path/to/big/local/folder /rug/home/destination_collection/

# Download: Single big file
$ iget -T --lfrestart /path/to/lfRestartFile --retries 3 /rug/home/source_collection/big_file /path/to/local/destination/

# Download: Big folder
$ iget -r -T -X /path/to/restartfile --lfrestart /path/to/lfRestartFile --retries 3 /rug/home/source_collection/big_folder /path/to/local/destination/
</code>

The additional parameters used for the ''iput'' and ''iget'' commands have the following functions:
  * ''-T'': Renew the socket connection after 10 minutes. This can be useful for big data transfers, for example to prevent a firewall from canceling the connection.
  * ''-X /path/to/restartfile'': Write restart information to the specified restart file. This file records how many files have already been transferred and which file was transferred last. It is especially useful for transferring folders with multiple files.
  * ''%%--lfrestart%% /path/to/lfRestartFile'': Write separate restart information for large files to ''/path/to/lfRestartFile''. If the transfer of a large file fails, this allows you to continue from the point where it failed.
  * ''%%--retries%% <int>'': Specify the number of automatic retries of the transfer. This option can be used in combination with the restart files.

**Note:** For the transfer of a large amount of data, especially single big files, we recommend **not to use** the ''-K'' flag. The flag leads to calculating, storing, and comparing the checksums of the file(s) during transfer. This can sometimes take a very long time and also result in timeouts. We recommend that you instead either wait for the automated checksum calculation to finish or force checksum calculation after the transfer by using the ''ichksum'' command. 
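
The recommended upload parameters and the subsequent checksum calculation can be combined in a small wrapper. This is only a sketch under stated assumptions: the function name ''upload_folder'' and the default restart-file location ''~/.rdms_restart'' are hypothetical, and it assumes that the destination collection already exists, so that the uploaded folder ends up as a sub-collection with the same name.

<code bash>
# Hypothetical wrapper (function name and default paths are assumptions).
upload_folder() {
  local src="${1%/}"                            # local folder to upload
  local dest="${2%/}"                           # existing RDMS collection
  local state_dir="${3:-$HOME/.rdms_restart}"   # where the restart files are kept (assumed location)

  mkdir -p "$state_dir" || return 1

  # Restartable recursive upload, deliberately without the -K flag.
  iput -r -T \
       -X "$state_dir/restartfile" \
       --lfrestart "$state_dir/lfRestartFile" \
       --retries 3 \
       "$src" "$dest/" || return 1

  # Afterwards, force checksum calculation (-f) for the whole uploaded folder (-r).
  ichksum -r -f "$dest/$(basename "$src")"
}

# Example usage (paths are placeholders):
# upload_folder /path/to/big/local/folder /rug/home/destination_collection
</code>

If the checksums are not needed immediately, the final ''ichksum'' call can be dropped in favour of the automated checksum calculation.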

===== Bundling of Data Sets =====

To improve the performance of the RDMS, we recommend storing data sets in a bundled archive format such as ''*.tar'', ''*.tar.gz'', ''*.zip'', or similar (see below for more information about data compression) instead of as individual files/folders. This significantly improves transfer rates, because the system switches to multi-threaded transfers once a file reaches a minimum size threshold (32 MB). Transferring many small files, in contrast, incurs a large overhead that diminishes performance.

Best practices to handle such cases are:

  - First, collect all data locally.
  - Before archiving in the RDMS, bundle the data set or its subsets into a structured data format (as mentioned above).
  - Upload the bundled format to the RDMS.
  - (Optional) Add metadata if desired.
  - (Optional) Unbundle the data on the RDMS.

For extraction on the RDMS, CLI users can use the ''ibun -x'' command, as described in the [[https://wiki.hpc.rug.nl/rdms/access/linux/createprofile#icommands_for_metda_data_management|iCommands for (Meta)data Management]] section of this wiki.

For RDMS web interface users, the "Uncompress tar" function, accessible via right-click on a ''*.tar'' file, enables extraction. Currently, this function supports only ''*.tar'' formats. 

**Note:** The ''ibun'' command does not support symlinks. It is therefore recommended to dereference symlinks upon local creation of the archives. For the ''tar'' command, this can be achieved via the additional ''-h'' flag. 
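
The bundling workflow can be sketched as follows. The function name ''bundle_dataset'' and all paths are hypothetical placeholders; the sketch assumes a ''tar'' implementation that supports ''-h'' and ''-C'' (for example GNU tar).

<code bash>
# Hypothetical helper: bundle a local folder into <folder>.tar next to it.
bundle_dataset() {
  local dataset="${1%/}"     # local folder, trailing slash removed
  local parent name
  parent=$(dirname "$dataset")
  name=$(basename "$dataset")

  # -c create, -h dereference symlinks (ibun does not support them), -f archive file.
  # -C makes the paths inside the archive relative to the parent folder.
  tar -chf "$parent/$name.tar" -C "$parent" "$name"
}

# Example usage (all paths are placeholders):
# bundle_dataset /path/to/local/dataset
# iput -T /path/to/local/dataset.tar /rug/home/destination_collection/
# ibun -x /rug/home/destination_collection/dataset.tar /rug/home/destination_collection/dataset
</code>

The optional ''ibun -x'' step at the end unbundles the archive into a collection on the RDMS.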

==== Choosing a Data Compression Format ====

Bundling data without additional compression (''*.tar'') already helps to increase the performance of data transfers, but additional compression is often useful because it can reduce the data size considerably. Several compression formats are available, for example:
  * ''*.tar.gz''
  * ''*.tar.bz2''
  * ''*.tar.xz''
  * ''*.tar.zst''
  * ''*.zip''
  * ''*.7z''

In our experience, ''*.tar.zst'', which uses [[https://en.wikipedia.org/wiki/Zstd|Zstandard compression]], offers a very good compromise between the achieved compression and the compression time.
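
A ''*.tar.zst'' archive can be created by piping the output of ''tar'' into the ''zstd'' command-line tool. This is a minimal sketch that assumes ''zstd'' is installed locally; the function name ''compress_dataset'' and the paths are hypothetical.

<code bash>
# Hypothetical helper: bundle and compress a local folder into <folder>.tar.zst next to it.
compress_dataset() {
  local dataset="${1%/}"     # local folder, trailing slash removed
  local parent name
  parent=$(dirname "$dataset")
  name=$(basename "$dataset")

  # tar writes the archive to stdout (-f -) and dereferences symlinks (-h);
  # zstd compresses it with all CPU cores (-T0), quietly (-q),
  # overwriting an existing output file (-f).
  tar -chf - -C "$parent" "$name" | zstd -T0 -q -f -o "$parent/$name.tar.zst"
}

# Example usage (paths are placeholders):
# compress_dataset /path/to/local/dataset
#
# To unpack the archive locally again:
# zstd -dc /path/to/local/dataset.tar.zst | tar -xf -
</code>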

**Notes:** 
  * Not all compression formats can be extracted via ''ibun'' on the RDMS side. Of the formats listed above, ''*.7z'' does not work. In such cases, the file needs to be downloaded before it can be extracted.
  * In general, for archived data sets, it is also recommended not to extract them on the RDMS, but rather to keep them in their bundled (and compressed) format for long-term storage.
  * In certain cases, it makes sense not to bundle the whole data set into one package, but rather into suitable sub-packages. For example, if the data set consists of well-defined subsets, each subset can be bundled separately.
  * Also note that the content of bundled and compressed archives cannot easily be inspected directly (exception: the content of ''*.tar'' files can be previewed in the [[rdms:webapp:databrowser|data browser of the web interface]]). If the bundled, and potentially compressed, data set is still large, we recommend that you **create a list of the files/folders in the archive locally before bundling and upload it together with the bundled data set.** The text file, which is much smaller than the data set, can then be downloaded first to check whether the respective data set contains the data you are looking for. How such a list is created depends on your system. Linux users can, for example, use the ''find'' or ''tree'' commands, while Windows users can achieve similar results via the ''dir'' command (Windows command prompt) or ''Get-ChildItem'' (Windows PowerShell).
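
On Linux, such a file list can be created with ''find''. The following is a minimal sketch; the function name ''create_file_list'' and the output file name ''<folder>_filelist.txt'' are hypothetical choices.

<code bash>
# Hypothetical helper: write a sorted list of all files/folders in a data set
# to <folder>_filelist.txt next to the data set.
create_file_list() {
  local dataset="${1%/}"     # local folder, trailing slash removed
  local parent name
  parent=$(dirname "$dataset")
  name=$(basename "$dataset")

  # Run find from the parent folder so that the listed paths are relative to it,
  # like the paths in an archive created from the parent folder.
  ( cd "$parent" && find "$name" | sort ) > "$parent/${name}_filelist.txt"
}

# Example usage (path is a placeholder):
# create_file_list /path/to/local/dataset
</code>

The resulting text file can then be uploaded next to the archive, for example with ''iput''.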

Please contact [[rdms-support@rug.nl|rdms-support@rug.nl]] if you are not sure how to bundle/compress your data sets for long-term storage.

===== Locked Files (HIERARCHY_ERROR) =====

In rare cases, data may arrive in an incomplete form in the RDMS. This usually happens if a data transfer is abruptly interrupted, for example, due to connection problems, without proper finalization. 

Restarting the data transfer may solve this issue. However, it is possible that the already transferred data remains in a locked state, causing problems when the transfer is restarted, as those files cannot be overwritten directly. 

If you experience these issues, it is recommended to contact [[rdms-support@rug.nl|RDMS-Support]].

Users of the command-line tool [[rdms:access:linux:icommands|iCommands]] can also detect such locked files directly with an appropriate CLI command.

In general, these issues manifest as a ''HIERARCHY_ERROR'' when a data transfer to the RDMS (e.g. via ''iput'' or ''irsync'') is attempted via the CLI.

To check all files at an RDMS location ''/rug/home/path/to/folder'' including all its subfolders, and to detect just those files that are marked as locked, the following command can be executed:

<code>
$ iquest "status: %s, name: %s/%s" "SELECT DATA_REPL_STATUS, COLL_NAME, DATA_NAME WHERE COLL_NAME LIKE '/rug/home/path/to/folder%' AND DATA_REPL_STATUS <> '1'"
</code>

This command lists all files at the specified location whose replica status is not 1 ("good"), which includes files in a locked state, and outputs one line per file in the format:

<code>
status: <replica_status>, name: <path_to_folder>/<name_of_file>
</code>

==== Removal of Locked Files ====

Although locked files cannot be removed directly, they can be moved to another location in your home or team drive, for example to a separate folder for locked files. Afterwards, the data transfer can be restarted.

Best practices to handle locked files and resolve the ''HIERARCHY_ERROR'' are:

  - Create a new folder in your home or team drive to contain all locked files.
  - Use the ''iquest'' command to identify locked files and move them to the newly created location. CLI users can use ''imv'' for this purpose.
  - Restart the data transfer. The ''HIERARCHY_ERROR'' should be resolved.
  - If you have accumulated multiple locked files in your folder that you cannot delete, please contact [[rdms-support@rug.nl|RDMS-Support]], and we will help you remove them.

**Note:** It is recommended not to contact RDMS support for every locked file, but instead first try to resolve it as described above. However, if numerous locked files are detected, you can directly contact RDMS support.
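
For CLI users, the identification and move steps can be sketched as a small shell function. This is an illustration under stated assumptions, not an official RDMS tool: the function name ''move_locked_files'' and all paths are hypothetical, the target folder should lie outside the scanned location, and files with identical names from different subfolders would collide in the target folder.

<code bash>
# Hypothetical helper: move all files whose replica status is not 1 ("good")
# from a scanned location (including its subfolders) into a separate folder.
move_locked_files() {
  local src_coll="${1%/}"     # RDMS location to scan
  local lock_coll="${2%/}"    # RDMS folder that will hold the locked files

  imkdir -p "$lock_coll" || return 1

  # One full logical path per line; --no-page avoids the interactive prompt
  # that iquest otherwise shows for long result lists.
  iquest --no-page "%s/%s" \
    "SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME LIKE '${src_coll}%' AND DATA_REPL_STATUS <> '1'" |
    sort -u |
    while IFS= read -r obj; do
      case $obj in
        /*) imv "$obj" "$lock_coll/" ;;   # skip anything that is not an absolute path
      esac
    done
}

# Example usage (paths are placeholders):
# move_locked_files /rug/home/path/to/folder /rug/home/locked_files
</code>

Lines that do not start with ''/'', such as the message printed when the query finds nothing, are ignored.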