Best Practices
This section contains a selection of best practices for using the RDMS. Following them will help you get the best RDMS user experience.
The section will be gradually extended with new usage examples and tips.
If you feel that important information should be added to this section, feel free to contact RDMS support with your request!
Data Naming
For optimal usage of the RDMS, it is highly recommended to follow these best practices for naming your files/folders:
- Do not use special characters (e.g. $%^*&#=!) in your file/folder names.
- Do not use periods (.) in your file/folder names (the period before the file extension is fine, as in the examples below).
- Do not use quote symbols (' or ") in your file/folder names.
- Prefer underscores (_) or hyphens (-) over white spaces in file/folder names.
Example of a folder structure with good naming:
$ itree project_name
project_name
  analytical_data
    machine_01
      20231223_analysis.ext
      20240111_analysis.ext
      20240325_analysis.ext
    machine_02
      20230222_analysis.ext
      20230710_analysis.ext
      20240109_analysis.ext
    machine_03
      20231020_analysis.ext
      20231120_analysis.ext
      20231212_analysis.ext
  manuscripts
    publication_v01.odt
Example of a folder structure with wrong/difficult naming:
$ itree "Project with XXX and YYY"
Project with XXX and YYY
  analytical_data
    analytical devices @ building 1
      experiment 100% scan rate.ext
      experiment 74% scan rate.ext
      experiment 80% scan rate.ext
    analytical devices @ building 2
      Experiment 01 by user.name@rug.nl.ext
      Experiment 02 by user.name@rug.nl.ext
      Experiment 03 by user.name@rug.nl.ext
    analytical devices @ building 3
      $1-100%.ext
      Versuch #1.ext
      Versuch_Öldiffusion_erste_Möglichkeit.ext
  manuscripts
    publication.final version.odt
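The naming rules above can be checked locally before anything is uploaded. The following is a minimal sketch, not an official RDMS tool: the function name `is_good_name` is hypothetical, and it assumes that only ASCII letters, digits, underscores, hyphens, and a single period (for the file extension) are acceptable.

```shell
# Exit status 0 if the given file/folder name follows the naming rules,
# 1 otherwise. Assumption: only A-Z, a-z, 0-9, "_", "-" and at most one
# period (the extension separator) are considered acceptable.
is_good_name() {
  (
    # Subshell keeps the locale change local; C locale makes A-Z plain ASCII.
    LC_ALL=C
    case "$1" in
      *[!A-Za-z0-9._-]*) exit 1 ;;  # special characters, quotes, spaces, non-ASCII
      *.*.*)             exit 1 ;;  # more than one period
    esac
    exit 0
  )
}

# Usage sketch: report every problematic name below the current directory.
#   find . | while IFS= read -r p; do
#     is_good_name "$(basename "$p")" || echo "rename: $p"
#   done
```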
Bundling of Data Sets
To improve the performance of the RDMS, it is recommended to store data sets that contain many small files in a structured format like *.tar, *.tar.gz, *.tar.bz, or *.zip.
This has the advantage of significantly improving the transfer rates for upload and download, because the system only switches to multi-threaded transfers once a file reaches a minimum size threshold (32 MB). Transferring many small files furthermore causes a large overhead, which reduces performance.
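To judge whether a data set would benefit from bundling, it can help to count how many of its files fall below the multi-threading threshold. The sketch below is only an illustration: `count_small_files` is a hypothetical helper, and it assumes that "32 MB" means 32 * 1024 * 1024 bytes.

```shell
# Print the number of regular files under a directory that are smaller
# than a threshold in bytes.
# Assumption: the default threshold is 32 * 1024 * 1024 = 33554432 bytes.
count_small_files() {
  dir="$1"
  threshold_bytes="${2:-33554432}"
  # "-size -Nc" matches files smaller than N bytes; tr strips wc padding.
  find "$dir" -type f -size "-${threshold_bytes}c" | wc -l | tr -d ' '
}

# Usage sketch:
#   count_small_files ./project_name/analytical_data
```

A large count suggests that the directory is a good candidate for bundling before upload.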
The best practice to handle such cases is therefore:
- First, collect all data locally.
- Before archival in the RDMS, bundle the data set or subsets of it into a structured data format (see above for formats).
- Upload the bundled format to the RDMS.
- (Add metadata if desired.)
- (Unbundle again in the RDMS.)
The last two steps, adding additional metadata as well as unbundling again on the RDMS side, are not mandatory.
For extraction on the RDMS side, CLI users can use the ibun -x command, as also described in the iCommands for (Meta)data Management section of this wiki.
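For CLI users, the workflow above (bundle locally, upload, optionally unbundle) might look like the following sketch. The function names and all paths are hypothetical. Passing `-D tar` to `iput` is an assumption that the data type has to be set explicitly for `ibun -x`; depending on the installation this may not be needed.

```shell
# Bundle a local directory into a *.tar archive.
bundle_dataset() {
  src_dir="$1"   # local directory to bundle
  archive="$2"   # path of the *.tar file to create
  # -C keeps only the directory name (not its full local path) in the archive.
  tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Upload the archive and extract it on the RDMS side.
# Requires an active iCommands session; not executed by defining it.
upload_and_unbundle() {
  archive="$1"       # local *.tar file
  remote_tar="$2"    # full RDMS path for the uploaded archive
  extract_coll="$3"  # RDMS collection to extract into
  iput -D tar "$archive" "$remote_tar" &&
    ibun -x "$remote_tar" "$extract_coll"
}

# Usage sketch (hypothetical paths):
#   bundle_dataset ./machine_01 ./machine_01.tar
#   upload_and_unbundle ./machine_01.tar \
#     /rug/home/path/to/folder/machine_01.tar /rug/home/path/to/folder/extracted
```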
Users of the RDMS web interface can right-click a *.tar file and use the “Uncompress tar” function to extract it. Currently, this only works for the *.tar format.
Detecting Corrupt Files
In rare cases, data can arrive in the RDMS in a non-finalized form. This usually happens if a data transfer suddenly drops out, for example due to connection problems, before the system could finalize it properly.
Restarting the data transfer can solve this issue, but it can also happen that the already transferred data is kept in a locked state which results in problems when the transfer is restarted as those files cannot be overwritten directly.
If you experience these issues, it is normally recommended to contact RDMS-Support.
Users of the command-line tool `iCommands` can furthermore detect such locked files directly via an appropriate CLI command.
In general, these issues manifest in HIERARCHY_ERRORs when a data transfer to the RDMS (e.g. via iput or irsync) is attempted via the CLI.
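One way to notice this symptom in scripted transfers is to capture the error output of the transfer and search it for the error name. This is a minimal sketch: the helper `has_hierarchy_error` and the log file name are hypothetical, and the check only looks for the literal string HIERARCHY_ERROR.

```shell
# Exit status 0 if the given log file mentions HIERARCHY_ERROR, 1 otherwise.
has_hierarchy_error() {
  grep -q 'HIERARCHY_ERROR' "$1"
}

# Usage sketch (needs an active iCommands session; paths are hypothetical):
#   iput -r ./machine_01 /rug/home/path/to/folder 2> transfer.log
#   if has_hierarchy_error transfer.log; then
#     echo "Transfer hit a HIERARCHY_ERROR; check for locked files."
#   fi
```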
To check all files at an RDMS location /rug/home/path/to/folder, including all its subfolders, and to list only those files that are marked as locked, the following command can be executed:
$ iquest "status: %s, name: %s/%s" "SELECT DATA_REPL_STATUS, COLL_NAME, DATA_NAME WHERE COLL_NAME LIKE '/rug/home/path/to/folder%' AND DATA_REPL_STATUS > '1'"
This command checks the specified location for files that have a replica status of 2 (“read-locked”) or 3 (“write-locked”), and outputs them in the format:
status: <2/3>, name: <path_to_folder>/<name_of_file>
While the locked files cannot be removed directly, they can first be moved to another location in your home/team location, for example a separate folder for locked files. Afterwards, the data transfer can be restarted.
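Moving the locked files can be prepared from the iquest output shown above. The following sketch only prints one `imv` command per locked file (a dry run) so the list can be reviewed before anything is moved. The function name `plan_moves` and the target collection are hypothetical, and the sketch assumes that file names are unique across the affected subfolders.

```shell
# Read lines of the form "status: <2/3>, name: <path>" on stdin and print
# one imv command per line that would move the file into the target
# collection. Lines in any other shape are ignored. Nothing is executed.
plan_moves() {
  target="$1"   # RDMS collection that collects the locked files
  while IFS= read -r line; do
    case "$line" in
      "status: "*", name: "*)
        path="${line#*, name: }"   # strip everything up to the path
        printf 'imv "%s" "%s/%s"\n' "$path" "$target" "${path##*/}"
        ;;
    esac
  done
}

# Usage sketch (needs an active iCommands session; target is hypothetical):
#   iquest "status: %s, name: %s/%s" "SELECT DATA_REPL_STATUS, COLL_NAME, DATA_NAME WHERE COLL_NAME LIKE '/rug/home/path/to/folder%' AND DATA_REPL_STATUS > '1'" \
#     | plan_moves /rug/home/path/to/locked_files
```

After reviewing the printed commands, they can be run one by one before the transfer is restarted.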