Differences

This shows you the differences between two versions of the page.

rdms:data:integrity [2025/03/19 14:00] – [How to Check Your File's Checksum**] burcu
rdms:data:integrity [2025/03/24 08:12] (current) – [Data Safety and Integrity] burcu
Line 7: Line 7:
  
   * **Data Replication**: Data in the RDMS is stored at two different physical locations. The versions at both physical locations are called replicas, as the file is identical, meaning replicated, in both locations.   * **Data Replication**: Data in the RDMS is stored at two different physical locations. The versions at both physical locations are called replicas, as the file is identical, meaning replicated, in both locations.
-  * **Checksum**: A checksum is a unique value that is generated by running a checksum algorithm/function on certain data. The uniqueness of these values allow to check the integrity of the data. +  * **Checksum**: A checksum is a unique value that is generated by running a checksum function on certain data. The uniqueness of these values allow to check the integrity of the data. 
  
 ===== Data Replication ===== ===== Data Replication =====
Line 25: Line 25:
 ==== Data Replica Status ==== ==== Data Replica Status ====
  
-Every file (not folder) in the RDMS also has a replica status associated with it. This replica status gets automatically assigned when the data enters the system. The replica status definitions result from iRODS the data management system that is the backbone of the RDMS. As of now, iRODS knows five different replica statuses of which four are used:+Every file (not folder) in the RDMS also has a replica status associated with it. This replica status gets automatically assigned when the data enters the system. The replica status definitions result from iRODS the data management system that is the backbone of the RDMS. As of now, iRODS knows five different replica statuse of which four are used:
  
 ^ Numeric Value     ^ Symbolic Value      ^ Name  ^ Definition ^ ^ Numeric Value     ^ Symbolic Value      ^ Name  ^ Definition ^
Line 59: Line 59:
  
 === Checking Integrity during Data Ingestion === === Checking Integrity during Data Ingestion ===
-The commands that are used for uploading data to the RDMS, namely ''iput'' and ''irsync'', both have an option to enforce checksum calculation and comparison via the additional ''-K'' flag. From the user documentation of both commands:+The commands that are used for uploading data to the RDMS, namely ''iput'' and ''irsync'', both have an option to enforce checksum calculation and comparison via the additional ''-K'' flag. From the user documentation of these commands:
  
 <code> <code>
Line 66: Line 66:
 </code> </code>
  
-Which will compute the checksums for you locally, but also on the RDMS side. In the process the checksums are verified by the iCommands for you and also directly stored in the iCAT catalog/database. +Which will compute the checksums for you locally, but also on the RDMS side. In the process the checksums are verified by the ''iCommands'' for you and also directly stored in the iCAT catalog/database. 
  
 **Note**: Even without using the ''-K'' flag, your uploaded data will get a checksum eventually due to the defined [[rdms:webapp:processes|delayed rules]], but using ''-K'' does that directly during data upload and also does the comparison for you.  **Note**: Even without using the ''-K'' flag, your uploaded data will get a checksum eventually due to the defined [[rdms:webapp:processes|delayed rules]], but using ''-K'' does that directly during data upload and also does the comparison for you. 
Line 99: Line 99:
 </code> </code>
  
-As can be seen, both checksums, the one registered in the RDMS as well as the one computed for the same file locally, are the same. Therefore, it can be guaranteed that the file in the RDMS is the same as the one that was uploaded to it. +As can be seen, both checksums, the one registered in the RDMS and the one computed locally for the same file, are identical. This confirms that the file stored in the RDMS matches the originally uploaded version. 
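
The comparison described above can also be scripted. The following is a minimal Python sketch, not part of the RDMS tooling: it assumes the registered checksum is a base64-encoded SHA-256 digest, optionally carrying a ''sha2:'' prefix. The function names ''local_checksum'' and ''matches_registered'' are hypothetical and chosen only for illustration.

```python
import base64
import hashlib


def local_checksum(path, chunk_size=1 << 20):
    """Return the base64-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_registered(path, registered):
    """Compare a local file against a registered checksum string.

    Assumption: the registered value is base64-encoded SHA-256 and may
    carry a "sha2:" prefix, which is stripped before comparing.
    """
    value = registered.strip()
    if value.startswith("sha2:"):
        value = value[len("sha2:"):]
    return local_checksum(path) == value
```

Calling ''matches_registered("example_file", "sha2:...")'' with the value copied from the RDMS returns ''True'' only if the local file produces the same digest.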
  
-As a further tip, it is also possible to adjust the command a little so that it does not just calculate the checksum for a single file, but for all files in a folder. An example command to do so (assuming Bash shell):+**Tip**: It is also possible to adjust the command a little so that it does not just calculate the checksum for a single file, but for all files in a folder. An example command to do so (assuming Bash shell):
  
 <code> <code>
Line 135: Line 135:
  
 <code> <code>
-[System.Convert]::ToBase64String((Get-FileHash -Algorithm SHA256 -Path "\path\to\example_file" | Select-Object -ExpandProperty Hash | ForEach-Object { [System.Convert]::FromHexString($_) })) +[System.Convert]::ToBase64String((Get-FileHash -Algorithm SHA256 -Path "C:\path\to\file" | ForEach-Object { [byte[]]($_.Hash -split '(..)' -ne '' | ForEach-Object { [Convert]::ToByte($_, 16) }) }))
 </code> </code>
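
Both versions of the PowerShell command perform the same conversion: the hexadecimal hash string is turned back into raw bytes, which are then base64-encoded. As an illustration of that step only, here is a short Python sketch. The function name ''hex_to_base64'' is hypothetical.

```python
import base64


def hex_to_base64(hex_digest):
    """Re-encode a hexadecimal digest string as base64.

    The hex string and the base64 string represent the same raw bytes;
    only the text encoding differs.
    """
    raw = bytes.fromhex(hex_digest)  # accepts upper- and lowercase hex
    return base64.b64encode(raw).decode("ascii")
```

This may be useful when a tool reports a SHA-256 hash in hexadecimal form and it has to be compared with a base64-encoded value.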
  
Line 142: Line 142:
 Get-ChildItem -Path "C:\path\to\folder" -File | ForEach-Object { Get-ChildItem -Path "C:\path\to\folder" -File | ForEach-Object {
     $file = $_.FullName     $file = $_.FullName
-    $checksum = [System.Convert]::ToBase64String((Get-FileHash -Algorithm SHA256 -Path $file | Select-Object -ExpandProperty Hash | ForEach-Object { [System.Convert]::FromHexString($_) }))+    $checksum = [System.Convert]::ToBase64String((Get-FileHash -Algorithm SHA256 -Path $file | ForEach-Object { [byte[]]($_.Hash -split '(..)' -ne '' | ForEach-Object { [Convert]::ToByte($_, 16) }) }))
     [PSCustomObject]@{     [PSCustomObject]@{
         FileName = $_.Name         FileName = $_.Name
Line 174: Line 174:
  
 **Notes**: **Notes**:
-  * While RDMS web interface does shows the checksum, you will still need to compute this value locally if you want to compare it. For that, please see the section above that described how to do this in different operating systems. +  * While RDMS web interface displays the checksum, you will still need to compute this value locally if you want to compare it. For that, please see the section above that described how to do this in different operating systems. 
-  * As of now, the [[rdms:webapp:search|RDMS search]] does not allow to search for files with a specific checksum or for searching for all files with a certain replica status, for example to search for all non-good replica statuses in a certain RDMS location as can be done via iCommands. We are working on introducing this feature in the future. +  * Currently, the [[rdms:webapp:search|RDMS search]] does not support searching for files by a specific checksum or searching for all files with a specific replica status (e.g. finding all non-good replica statuses in a given RDMS location). We are working on introducing this feature in the future.