Differences

This shows you the differences between two versions of the page.

--- rdms:workflows:archiving [2025/03/03 13:04] – [Step 3: Creation of the Data Package] GR edit giulio
+++ rdms:workflows:archiving [2025/03/06 09:19] (current) – unstructured to unbundled jelte
@@ Line 1: / Line 1: @@
 {{indexmenu_n>2}}
 ====== Archiving Workflow ======
-{{ :rdms:workflows:rdms_archiving_workflow_wiki.png?900 |}}
+{{ :rdms:workflows:rdms_archiving_workflow_wiki.svg |}}
-The archiving workflow in the RDMS allows the owner of a [[rdms:solution:projects|RDMS Project]] to archive the data contained in the project folder by following a step-by-step process in the web interface. An archive in the RDMS is **a bundled dataset**, called //data package//, that contains both data and related metadata, and has been **frozen by making it read-only** in the system. The archive is **labelled with a creation date** to inform the user of when the data was frozen. The archived dataset (data package) can then be pushed to the publication workflow (**still in development**), which will allow the publishing of the dataset metadata to the outside world, in compliance with the Open Science framework.
+The archiving workflow in the RDMS allows the owner of a [[rdms:solution:projects|RDMS Project]] to archive the data contained in the project folder by following a step-by-step process in the web interface. An archive in the RDMS is **a bundled dataset**, called //data package//, that contains both data and related metadata, and has been **frozen by making it read-only** in the system. The archive is by default **labelled with a creation date** to inform the user of when the data was frozen. The archived dataset (data package) can then be pushed to the publication workflow (**still in development**), which will allow the publishing of the dataset metadata to the outside world, in compliance with the Open Science framework.
 During the archiving process, there are **three different roles** that will be active at different times.
@@ Line 69: / Line 69: @@
 {{ :rdms:workflows:rdms_workflow_init_1.png?direct&600 |}}
-You will be redirected to another window that shows an overview of the selected data. The window also allows to specify a version tag for the created archiving workflow. The default tag is of the format ''archive<Timestamp>'' with the timestamp being the [[https://en.wikipedia.org/wiki/Unix_time|Unix time]] when the archiving workflow was initialized, but it can be customized by the user.\\ **Note**: We advise to keep the timestamp as is and in some form in the name of the archive, should you choose to modify it.
+You will be redirected to another window that shows an overview of the selected data. The window also allows you to specify a version tag for the created archiving workflow. The default tag has the format ''archive<Timestamp>'', where the timestamp is the [[https://en.wikipedia.org/wiki/Unix_time|Unix time]] at which the archiving workflow was initialized, but it can be customized by the user.
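As a small illustration of the default tag format, the Unix time in a tag can be converted back into a readable date. The Python sketch below assumes the tag is exactly the word ''archive'' followed by a whole number of seconds; the helper name ''tag_to_datetime'' is hypothetical and not part of the RDMS.

```python
from datetime import datetime, timezone


def tag_to_datetime(tag: str) -> datetime:
    """Convert a default version tag such as 'archive1740649820' to a UTC datetime."""
    prefix = "archive"  # assumed fixed prefix of the default tag
    seconds = tag[len(prefix):]
    if not tag.startswith(prefix) or not seconds.isdecimal():
        raise ValueError(f"not a default version tag: {tag!r}")
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc)


print(tag_to_datetime("archive1740649820").isoformat())  # 2025-02-27T09:50:20+00:00
```

The result is in UTC, so it can differ from the local time shown elsewhere by the local time zone offset.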
 {{ :rdms:workflows:rdms_workflow_init_4.png?direct&600 |}}
@@ Line 84: / Line 84: @@
 **Notes:**
   * The version tag can't be adjusted by the user afterwards. Please take this into account when adjusting this value.
+  * If you adjust the archive name, it is still recommended to keep a timestamp of some sort ([[rdms:bestpractices|usual naming recommendations]] apply).
   * While the project admin can already select data during initialization, more data can be added or removed in the next step. This can be done by the data manager.
@@ Line 101: / Line 102: @@
 ----
-As data manager, you can do this step via the workflows tab in the web interface, where the available archive drafts are listed in the archiving workflow page. After the project admin initializes the workflow, you can find the newly created archive draft in the first column, labelled "Prepare data". You can identify the respective archive draft by the version tag that was assigned by the project admin in the previous step.
+As a data manager, you can perform this step via the workflows tab in the web interface, where the available archive drafts are listed on the archiving workflow page. After the project admin initializes the workflow, you can find the newly created archive draft in the first column, labelled "Prepare data". The drafts are organized into cards and you can identify the respective archive draft by the version tag that was assigned by the project admin in the previous step. At the top of each card, you can find a button with three vertical dots which you can use to execute different tasks on the selected workflow. See the screenshot below for the location of the button and the options available to you.
-The drafts are organized into cards, at the top of which you can find a button with three vertical dots. You can use this button to reveal the menu that allows the data manager to execute different tasks on the selected workflow. See the screenshot below for the location of the button and the options available to you.
 {{ :rdms:workflows:rdms_workflow_dataprep_1.png?direct&600 |}}
-If you click on the //Prepare data// option, a view of the currently selected data will open in a new window. In this window, you can verify that the data that needs to be archived is correct and complete, but you can also select an option that will allow you to add [[rdms:metadata:|RDMS metadata]] to the archive. What we mean here is that you will be adding metadata that was added to files and folders included in the archive, not that you are adding metadata **about** the archive. This will happen in a later step.
+If you select the //Append data// option, you will be able to add data to the archive. Selecting this option will also open a new window, where you will be guided through adding data. Use this option if the project admin did not add all the data necessary to the archive at the previous step.\\
+After all data has been added, you can click on the //Prepare data// option to get a view of the currently selected data in a new window. In this window, you can verify that the data that needs to be archived is correct and complete, but you can also remove data again. Additionally, you can select an option that will allow you to add [[rdms:metadata:|RDMS metadata]] to the archive. What we mean here is that you will be including the metadata that was added to files and folders contained in the archive, not that you are adding metadata **about** the archive. Adding metadata about the archive happens in a later step.
 {{ :rdms:workflows:rdms_workflow_dataprep_2.png?direct&600 |}}
-If you select the //Append data// option, you will be able to add data to the archive. Selecting this option will also open a new window, where you will be guided through adding data. Use this option if the project admin did not add all the data necessary to the archive at the previous step. You can also remove data here, should you find that unnecessary data was added during the initialization (see previous steps for screenshots of the process).\\
 \\
-Finally, once you are ready to package the data, click on the //Copy data to archive// option to move the archive draft to the next step. A window will open, where you can verify the data sent to archive once again. If you decide to approve the data in this window, then the archiving workflow will start copying your data from the project space to the projects' data archive.
+\\
+Finally, once you are ready, click on the //Copy data to archive// option to move the archive draft to the next step. A window will open, where you can verify the data sent to the archive once again. If you decide to approve the data in this window, then the archiving workflow will start copying your data from the project space to the project's data archive.
 {{ :rdms:workflows:rdms_workflow_dataprep_3.png?direct&600 |}}
@@ Line 128: / Line 129: @@
 **Prerequisites**: Step 2 has finished.
-In this step, the previously unstructured data that was moved to the project's archive space in the RDMS is bundled to a so called **data package**. This data package is a tar file containing the selected data, as well as RDMS file and folder metadata if the option to export it was selected in the previous step.
+In this step, the previously unbundled data that was moved to the project's archive space in the RDMS is bundled into a so-called **data package**. This data package is a tar file containing the selected data, as well as RDMS file and folder metadata if the option to export it was selected in the previous step.
 ----
@@ Line 139: / Line 140: @@
 {{ :rdms:workflows:rdms_workflow_dp_1.png?direct&600 |}}
-If you select the //Create data package// option in the menu, the RDMS system will automatically bundle the previously unstructured data into a tar archive. Afterwards, the next step (adding metadata to data package) can follow.
+If you select the //Create data package// option in the menu, the RDMS system will automatically bundle the previously unbundled data into a tar archive. Afterwards, the next step (adding metadata to data package) can follow.
 {{ :rdms:workflows:rdms_workflow_dp_4.png?direct&600 |}}
@@ Line 148: / Line 149: @@
 ==== Step 4: Add/Approve Metadata ====
 <hidden>
-**Active role**: (Meta)data Manager
+**Active role**: Metadata Manager and Data Manager\\
+**Prerequisites**: Step 3 is finished, (a DOI for the data set exists).
-After the data package was successfully created, metadata can be added to the data package. Also, if it exists, a [[https://en.wikipedia.org/wiki/Digital_object_identifier|DOI]] of an existing, related publications can be added to the data package.
+In this step, metadata about the archive can be added to the data package, after the data package was successfully created. Also, **if it already exists**, a [[https://en.wikipedia.org/wiki/Digital_object_identifier|DOI]] of a related publication can be added to the data package.
-While the RDMS in general allows to add metadata with or without a metadata template, the archiving workflow only allows do add metadata via templates. This is done to help standardizing the metadata for archives projects and therefore make it better findable. Templates can be created by users and also shared with others. If there is no suitable metadata template present, you will therefore have to create one as described in the [[rdms:metadata:metadatatemplates|Metadata Template section of the wiki]].
+----
+While the RDMS in general allows the user to add metadata with or without a metadata template, the archiving workflow only allows metadata to be added via templates. This is done to help standardize the metadata for archived projects and therefore make them easier to find. Templates can be created by users and also shared with others. If there is no suitable metadata template present, you will therefore have to create one, as described in the [[rdms:metadata:metadatatemplates|Metadata Template section of the wiki]]. Nevertheless, please remember that you are adding metadata about the archive during this step, not about the single files and folders within it. As such, you might not need too much complexity when it comes to the metadata template you want to use.\\
+\\
+As in previous steps, the three dots menu holds all the actions you can perform at this stage. They are, in order, //Add DOI//, //Add metadata template//, //Approve metadata//, and //Data package//. If you are a data manager, you can move the archive draft back to the previous step. We do not expect you to have to do it, but last-minute changes to a data set could still happen. This is why you still have the option to edit the data.
 {{ :rdms:workflows:rdms_workflow_meta_3.png?direct&600 |}}
-In this step, a metadata template can be selected and metadata can be added. This can be done by either the data manager or the metadata manager.
+If you select //Add metadata template//, you will see a new window open. At the very top of the window, you can choose which template you want to fill in. Then you can select or type in the different metadata entries the template requires you to add. Both metadata managers and data managers can add metadata in this step.
 {{ :rdms:workflows:rdms_workflow_meta_2.png?direct&600 |}}
-Also, a DOI can be added if there exists a publication to which the data is related. The DOI will be verified to be valid in this step. Please note the correct format of the DOI to be specified (DOI format: ''prefix/suffix'', **not** URL).
+If you have already generated a DOI for the dataset, then you can use the //Add DOI// menu button to insert the existing DOI into the metadata of the archive. You can also add a DOI linked to a related publication in this stage of the archiving workflow. The RDMS will check the DOI and verify its validity. Please note that the DOI must be specified in the format ''prefix/suffix'', **not** as a URL.
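To illustrate the difference between the ''prefix/suffix'' form and a URL, the following Python sketch strips a resolver URL from a DOI before it is entered. It is not the check the RDMS performs: the function name ''to_prefix_suffix'', the list of resolver URLs, and the pattern (a commonly used approximation for modern DOIs) are all assumptions made for this example.

```python
import re

# Commonly used approximation for modern DOIs; an assumption, not the RDMS's own check.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver URL forms that people often paste instead of the bare DOI (assumed list).
URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def to_prefix_suffix(value: str) -> str:
    """Strip a resolver URL, if present, and return the bare 'prefix/suffix' DOI."""
    doi = value.strip()
    for url_prefix in URL_PREFIXES:
        if doi.lower().startswith(url_prefix):
            doi = doi[len(url_prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not in prefix/suffix form: {value!r}")
    return doi


print(to_prefix_suffix("https://doi.org/10.1000/182"))  # 10.1000/182
```

A check like this only looks at the shape of the string; whether the DOI actually resolves is a separate question.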
 {{ :rdms:workflows:rdms_workflow_meta_1.png?direct&600 |}}
-Now, the metadata role becomes active and the metadata manager has to check and approve the metadata. Note that a link was automatically added as metadata entry for the DOI.
+The last option in the menu we have not yet addressed is //Approve metadata//. This action is **available only to the metadata manager**. Once you or other metadata managers have checked that the metadata has been filled in properly, you can press the button //Approve metadata// to move the archive draft to the final stage of the archiving workflow. Note that a DOI link is automatically added as a metadata entry if a DOI was specified.
 {{ :rdms:workflows:rdms_workflow_meta_4.png?direct&600 |}}
-If all looks good, the last step of the workflow can follow.
 </hidden>
@@ Line 174: / Line 179: @@
 <hidden>
 **Active role**: Data Manager\\
-**Via sidebar item**: [[rdms:workflows:start|Workflows tab]].
+**Prerequisites**: Step 4 is finished, data and metadata are complete.
-At this stage, nearly all is set, but a last confirmation is needed by the data manager to finish the workflow. In this step, which is again done via the workflows tab.
-The data manager can either put the workflow back to the add/confirm metadata stage or use the "Archive" button to finish the workflow.
+In this step, a last confirmation is needed by the data manager to finish the workflow, if everything has been set up properly. This is the last step of the workflow and a **point of no return**.
-{{:rdms:workflows:rdms_archiving_appove_1.png?direct&800|}}
+----
-This will present once more all the info about the archived data package, its final destination as well all metadata that will be added.
+At this stage of the archiving workflow, you have two options left as the data manager. You can either push the archive draft back to the add/confirm metadata stage by pressing //Metadata//, for example if something is still missing, or press the //Archive// button to create the final archive and finish the workflow.
-{{:rdms:workflows:rdms_archiving_appove_2.png?direct&800|}}
+{{ :rdms:workflows:rdms_archiving_appove_1.png?direct&800 |}}
-If the data manager agrees with all that, using the "Archive" button will finalize the workflow.
+Pressing //Archive// will open a new window one last time, presenting all the info about the archived data package, its final destination, as well as all metadata that will be added to the archive. If everything looks good, you can use the //Archive// button to finalize the workflow.
-The data package can now be found at its final destination including all the metadata that was added during the workflow.
+{{ :rdms:workflows:rdms_archiving_appove_2.png?direct&800 |}}
+When the operation is finished, you can find the data package in its final destination, as shown in the screenshot below. The archive will contain all the data added during the workflow, as well as all the metadata. For an explanation of the structure of the archive, please see the next section.
+{{ :rdms:workflows:rdms_archiving_approve_3.png?direct&800 |}}
-{{:rdms:workflows:rdms_archiving_approve_3.png?direct&800|}}
 </hidden>
 ===== The Data Package and its Content =====
-At the end of the archiving workflow, a so called data package is created. This is the name in the RDMS for a data set with a specific structure that resulted from the archiving workflow.
+At the end of the archiving workflow, you will have created a **data package**. In the RDMS, we use this term to identify a data set with a specific structure that resulted from the archiving workflow. In this section, we will have a more detailed look at the data package and explain its internal structure.
-In this section, we will have a more detailed look at the data package and explain its internal structure.
 In general, the following applies:
   * The created data package is always in a structured ''*.tar'' format which is a standard format for bundling data that can be opened with different tools.
-  * Inside the tar, there are different subfolders for the selected and archived data as well as the information about the metadata that was available for the included data in a separate folder in ''*.json'' format. The second folder with the metadata info is only created if it was selected during the archiving workflow (step 2) that RDMS metadata should be included in the created data package. Otherwise this folder does not exist.
+  * Inside the tar, there are different subfolders for the selected and archived data, as well as the information about the metadata on files and folders included in the archive, saved in ''*.json'' format. This second folder with the metadata info is only created if you selected to include metadata during step 2 of the archiving workflow. Otherwise, you will only see the folder containing the data.
-In our case where we selected metadata to be included and where we had only one data folder selected to be included in the archive, our archive has in the end the following structure after being downloaded and extracted locally:
+In our example case, we selected metadata to be included and one folder containing the data. Thus, our archive has the following structure in the end:
 <code>
-# This is the general structure of the created datapackge after being extracted
+# This is the general structure of the created data package after being extracted.
 archive1740649820/                            # This is the name (version tag) of the archive that we specified during the workflow
 ├── 2025_2_27_10_51_11_889000000              # Subfolder that contains the selected (meta)data
-│   └── Some_project_data                     # This is the folder from which we started the workflow. Below it is content (not completely shown)
+│   └── Some_project_data                     # This is the folder from which we started the workflow. Its content is below (not completely shown)
-│       └── LA-187-1
+│       └── LA-187-1
 └── RUGRDMS_METADATA                          # As we selected in the example to have metadata included, we get this folder as well
-    └── 1Some_project_data.metadata.json      # This is the available metadata for the "Some_project_data" folder in json format.
+    └── 1Some_project_data.metadata.json      # This is the available metadata for the "Some_project_data" folder in .json format.
 </code>
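The structure above can also be inspected programmatically. The following Python sketch uses the standard ''tarfile'' module to list the members of a data package and to check whether the ''RUGRDMS_METADATA'' folder is present. To stay self-contained, it builds a tiny in-memory stand-in for a data package; the file ''example.txt'', the file contents, and the helper name ''has_exported_metadata'' are made up for illustration, and the layout of the tar is assumed to match the tree shown above.

```python
import io
import tarfile


def has_exported_metadata(tar: tarfile.TarFile, metadata_dir: str = "RUGRDMS_METADATA") -> bool:
    """Return True if any member path contains the metadata folder."""
    return any(metadata_dir in name.split("/") for name in tar.getnames())


# Build a tiny in-memory stand-in for a data package (contents are made up).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for path, payload in [
        ("archive1740649820/2025_2_27_10_51_11_889000000/Some_project_data/example.txt", b"data"),
        ("archive1740649820/RUGRDMS_METADATA/1Some_project_data.metadata.json", b"{}"),
    ]:
        info = tarfile.TarInfo(name=path)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Re-open the stand-in for reading, as you would a downloaded data package.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    metadata_included = has_exported_metadata(tar)

print(names)
print(metadata_included)
```

For a real data package, you would pass the path of the downloaded tar file to ''tarfile.open'' instead of the in-memory buffer.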
-If we have a look at the json file with the metadata, we see that it contains info about the metadata that was available for the selected data. The following is a snippet of that file that shows how this info is exported and included in the data package.
+If we have a look at the JSON file with the metadata, we see that it contains info about the metadata related to the selected data, not the one related to the archive. The following is a snippet of that file that shows how this info is exported and included in the data package.
 <code>