Data Safety and Integrity


Compared to other UG storage solutions, the RDMS archive is unique in that it provides you with different means to check the integrity of the stored data.

This section explains the important concepts in the RDMS that relate to the safe long-term storage of your data. It also explains how you can check the integrity of your data yourself.

In short, the important concepts are:

Data Replication: Data in the RDMS is stored at two different physical locations. The copies at the two locations are called replicas, because the file is identical (replicated) in both places.
Checksum: A checksum is a value produced by running a checksum algorithm (a hash function) on a piece of data. Because different data practically always yields a different checksum, comparing checksums lets you check whether the data has changed, i.e. whether its integrity is intact.
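To illustrate why checksums are useful for integrity checks, the following minimal Python sketch uses the standard hashlib module (not an RDMS tool) to compute SHA-256 checksums of three small, made-up byte strings. Strictly speaking, checksums are not guaranteed to be unique, but for SHA-256 finding two different inputs with the same checksum is practically infeasible, so a changed checksum reliably signals changed data.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Made-up example data for illustration only.
original = b"measurement,value\n1,42\n"
copy = b"measurement,value\n1,42\n"
corrupted = b"measurement,value\n1,43\n"  # a single changed character

# Identical data always yields the identical checksum ...
print(sha256_hex(original) == sha256_hex(copy))       # True
# ... while even a one-character change yields a completely different one.
print(sha256_hex(original) == sha256_hex(corrupted))  # False
```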

Data Replication

As mentioned in the introduction, all data stored in the RDMS is replicated to two physical locations. This happens automatically in the background. Replication does not guarantee the integrity of the data, since corrupted data is replicated as well, but it is a safeguard in case one data center is damaged: because the data is kept at two different physical locations, the chance of both being affected at the same time is small.

Note: Replication in the RDMS works at the hardware level, so it is not directly visible to you with the tools discussed in this wiki section. For example, the iCommands CLI can be used to check data integrity and, as described below, also shows the replica status, but it will still list only one replica.

Checking Data Integrity

Here you will learn what steps you can take to check the integrity of your data in the RDMS yourself. This section starts by explaining how checksums are used in the RDMS and describes the different replica statuses and what they mean. It then describes how you can use this information to check your data, either via the RDMS web interface or with the iCommands CLI tool.

Checksums in the RDMS

One of the unique features of the RDMS is that it is not just a simple storage solution: a database runs in the background that can be used to annotate data with user-defined metadata and that also stores other information about the data in the system.

Part of this information is a checksum for every file stored in the RDMS. By default, the checksum is calculated automatically after data ingestion by delayed rules/processes, but you can also enforce the calculation manually when using the iCommands.

You can look up the checksum of your data files with the iCommands mentioned above, and it is also shown in the web interface.

The checksums stored in the RDMS are base64-encoded SHA-256 checksums. This is important to know when you try to reproduce an RDMS checksum locally (see below).
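As a sketch of what reproducing such a checksum locally can look like, the following Python code uses only the standard hashlib and base64 modules. The key point is that tools such as sha256sum print the digest as a hexadecimal string, which will not match the RDMS value; instead, the raw 32-byte digest has to be base64-encoded. The function names are hypothetical, and the optional `sha2:` prefix handling is an assumption based on how iRODS, the system the iCommands operate on, usually displays SHA-256 checksums; adjust it to the exact value you see.

```python
import base64
import hashlib
from pathlib import Path


def rdms_style_checksum(path, chunk_size=1024 * 1024):
    """Hypothetical helper: SHA-256 digest of a file, base64-encoded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Encode the raw 32-byte digest (not the hex string) as base64.
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_stored_checksum(path, stored):
    """Hypothetical helper: compare a local file with a checksum copied from the RDMS."""
    stored = stored.strip()
    # Assumption: the stored value may be shown with an iRODS-style "sha2:" prefix.
    if stored.startswith("sha2:"):
        stored = stored[len("sha2:"):]
    return rdms_style_checksum(path) == stored
```

For example, matches_stored_checksum("results.csv", stored_value) would return True if your local copy of a (hypothetical) file results.csv is identical to the stored one, where stored_value is the checksum copied from the web interface or the iCommands output.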

Note: If you access the RDMS from Windows, whether via native WebDAV in MS File Explorer, Cyberduck, or WinSCP, checksum information is not available. The same applies to Mac users who use Cyberduck or Finder.
