Archiving Workflow

The archiving workflow in the RDMS allows the owner of an RDMS Project to archive the data contained in the project folder by following a step-by-step process in the web interface. An archive in the RDMS is a bundled dataset, called a data package, that contains both data and related metadata and has been frozen by making it read-only in the system. The archive is labelled with a creation date to inform the user of when the data was frozen. The archived dataset (data package) can then be pushed to the publication workflow (still in development), which will allow the dataset metadata to be published to the outside world, in compliance with the Open Science framework.

During the archiving process, there are three different roles that will be active at different times.

Owner/Admin: This role is responsible for assigning the data manager and metadata manager roles as well as starting the archiving process. By default, the creator of the RDMS project is its admin, but the role can also be assigned to other users (see below for info about assigning roles). Best practice is to assign this role to the project supervisor.

Data Manager: This role is responsible for verifying that the data sent to the archive is complete and uncorrupted, and giving the final approval of the archive. Best practice is to assign this role to the person(s) who are most familiar with the data.

Metadata Manager: This role is responsible for verifying and completing the metadata information related to the archive. Best practice is to assign this role to the person(s) who know the origin and scope of the data.

A single user can have any number of these roles assigned to them, and/or multiple users can have the same or different role(s) and work at different stages of the archiving process. The important part is that each role is assigned to at least one user, otherwise the workflow cannot be completed.
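The rule that every role needs at least one user can be checked against a planned assignment before the workflow is started. The following is a minimal sketch, assuming you keep your own list of planned assignments; the role labels and the `missing_roles` helper are hypothetical and not part of the RDMS.

```python
# Hypothetical role labels; the RDMS web interface may name them differently.
REQUIRED_ROLES = {"project_admin", "data_manager", "metadata_manager"}


def missing_roles(assignments):
    """Return the required roles that no user holds.

    `assignments` maps a user name to the set of roles planned for that user.
    """
    assigned = set()
    for roles in assignments.values():
        assigned |= set(roles)
    return REQUIRED_ROLES - assigned


# Example plan: one user may hold several roles at once.
plan = {
    "supervisor": {"project_admin"},
    "researcher": {"data_manager", "metadata_manager"},
}
print(missing_roles(plan))  # an empty set means every role is covered
```

If the returned set is not empty, the workflow would stall at the step where the missing role becomes active.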

This section will explain the workflow starting from an already existing RDMS example project and walk you through the requirements to start the workflow, the different steps, and the roles active at each step. It will also elaborate more on the content of the created data package.

Existing Project

To start an archiving workflow, the first prerequisite is that the RDMS Project you want to archive must already exist. The project also needs to contain data. An empty project will result in an error after the first step of the workflow.

Using the Web Interface

The archiving workflow requires using the RDMS web interface. It is not possible to execute the workflow via CLI, e.g. iCommands.

Correct User Privileges

If you want to start an archiving workflow as a project admin, you need the correct, elevated permissions to start the workflow and be able to assign user roles (data manager and metadata manager). If you lack these permissions, please contact rdms-support@rug.nl. The easiest way to check if you have the correct permissions is to check if you can assign roles to users in the project management tab.

For the other roles involved, metadata manager and data manager, no special permissions are needed, but they should have at least read/write permission in the project. If this is not the case, the workflow does not allow them to modify or approve the data or metadata.

Assigning Roles

If you know that you have the right permissions, then we recommend that you assign the desired workflow roles for the RDMS Project before starting an archiving workflow. You can do this, as the owner of the project, via the data management tab.

By clicking on the pencil symbol next to the name of an existing project member, their project permissions as well as project roles can be adjusted (see below for best practices).

After the roles are assigned, the archiving workflow can either start with the initialization of a new workflow by the project admin or continue from where it left off before the required roles were assigned.

Notes:

  • To assign a user as project admin, select the 'own' permission. Please note that the user needs elevated privileges (having 'own' is not enough) to be able to act as project admin. In cases where this is needed, please contact rdms-support@rug.nl.
  • To assign a user role, the user needs to have at least 'read' permission in the project.
  • The section about best practices gives useful information on how these roles could be assigned in a smart way.


Active role: Project Admin
Via sidebar item: Data Management tab or Workflows tab.

The first step after all prerequisites are fulfilled is the initialization of the workflow.

The project admin can do this via two places in the RDMS, either via the data management or via the workflows tab.

To initialize via the data management tab, open the respective project in the tab, then open the menu in the top-right corner (cogwheel symbol) and select Archive data.

A new pop-up window will open in which you can select the project data from which the workflow should be initialized.

After the data has been selected, the arrow button in the top-left corner redirects to another page that shows an overview of the selected data. This page also lets you specify a version tag for the archiving workflow being created. The default tag has the format archive<Timestamp>, where the timestamp is the Unix time at which the archiving workflow was initialized, but it can be customized by the user.
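To illustrate the default tag format, here is a minimal Python sketch that builds a tag of the form archive<Timestamp> and converts such a tag back into a readable date. It assumes the timestamp is in whole seconds, which is consistent with the example tag archive1740649820 used later on this page; the function names are hypothetical and not part of the RDMS.

```python
import re
import time
from datetime import datetime, timezone


def default_version_tag(now=None):
    """Build a tag of the form archive<Timestamp> from Unix time in seconds."""
    seconds = int(time.time() if now is None else now)
    return f"archive{seconds}"


def tag_to_datetime(tag):
    """Recover the UTC time from a default tag; return None for custom tags."""
    match = re.fullmatch(r"archive(\d+)", tag)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


print(default_version_tag(1740649820))       # archive1740649820
print(tag_to_datetime("archive1740649820"))  # 2025-02-27 09:50:20+00:00
```

A customized tag does not carry this information, so it can help to include a date in a custom tag yourself.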

To initialize via the workflows page, open the workflows tab, then select “Archiving” from the menu. Using the cogwheel button in the top-right corner, the “Archive new data” option can be used to initialize a new archiving workflow. This will open a new pop-up where you can specify the project from which you want to initialize, select the data, and assign a version tag.

Notes:

  • The version tag can't be adjusted by the user afterwards. Please take this into account when choosing this value.
  • While the project admin can already select data during initialization, more data can be added or removed in the next step. This can be done by the data manager.

Special considerations:

When you start the archiving process, you will be prompted to select the folders or files you want to archive. In this step, you can decide if you want to archive the entire project folder or just a part of it. Most of the time, you will select the entire folder. However, there are cases where part of the archive needs to be deleted before the customary retention period (10 years), due to privacy regulations. In such cases, we advise you to create two archives: one containing normal data that should be stored for 10 years, the other containing the sensitive data that needs to be deleted earlier. A good practice would be to label both archives in a way that makes it clear that they are interlinked and which one contains the sensitive data. This is best done in the project folder before the archiving starts. Please contact rdms-support@rug.nl if your data belongs to these special cases and you are unsure how to use the archiving workflow in such cases.


Active role: Data Manager
Via sidebar item: Workflows tab.

In this step, the data manager checks whether the data that is contained in the project folder and is sent to the archiving stage is complete and uncorrupted. If the data manager confirms that all is good, the data is copied to a separate folder in the project's archive located at /rug/home/DataArchive/Projects/<projectname>/.

The data manager can do this step via the workflows tab in the web interface, where the available archiving workflows are listed.

As the workflow was just started by the project admin, the newly created workflow is found on the first stage called “Prepare data”. The respective workflow can be identified by the version tag that was assigned by the project admin.

At the top of the cards of the respective workflow, the button with the three vertical dots can be used to reveal the menu that allows the data manager to execute different tasks on the selected workflow.

The “Prepare data” function opens a view of the currently selected data. It also lets you choose whether RDMS metadata, meaning metadata that was added to files/folders that are included in the archive, should be included in the final archive (see below for a description of the final archived package).

The “Append data” option allows the data manager to add more data that was not included by the project admin during initialization.

Once everything has been checked and any missing data has been added, the “Copy data to archive” function can be used to get one more overview of all the selected data and then to copy the data from the project space to the project's data archive.

While the data is being copied, the archiving workflow is blocked, which is clearly shown by a visual indicator. After the copying is finished, the workflow can continue with the following step, the creation of the data package.


Active role: Data Manager
Via sidebar item: Workflows tab.

In this step, the previously unstructured data that was copied to the project's archive space in the RDMS is bundled into a so-called data package. This data package is a tar file containing the selected data as well as the available RDMS metadata, if the option to export metadata was selected (see below for a description of the final archived package).

This is again done via the workflows tab by the data manager. The options at this step are previewing the data, if an additional check is needed, and creating the data package.

If the creation of the data package is selected, the system will automatically bundle the previously unstructured data into a tar archive. Afterwards, the next step (adding metadata to data package) can follow.

Note: It is not possible to add more data via the workflows page in this step. If the data manager sees that the wrong data was selected, the “Prepare data” function can be used to return the workflow to the previous step, preparation of the data, where the content can be adjusted.


Active role: (Meta)data Manager

After the data package has been successfully created, metadata can be added to it. If a related publication exists, its DOI can also be added to the data package.

While the RDMS in general allows metadata to be added with or without a metadata template, the archiving workflow only allows metadata to be added via templates. This helps standardize the metadata for archived projects and therefore makes them easier to find. Templates can be created by users and also shared with others. If no suitable metadata template is present, you will have to create one as described in the Metadata Template section of the wiki.

In this step, a metadata template can be selected and metadata can be added. This can be done by either the data manager or the metadata manager.

Also, a DOI can be added if there exists a publication to which the data is related. The DOI will be verified to be valid in this step. Please note the correct format of the DOI to be specified (DOI format: prefix/suffix, not URL).
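To avoid pasting a URL where a bare DOI is expected, the value can be checked locally first. The following is a minimal sketch, not the validation the RDMS itself performs: the pattern is a loose approximation of the prefix/suffix form, the helper names are hypothetical, and 10.1000/xyz123 is a made-up example value.

```python
import re

# Loose pattern for a bare DOI: "10.", a registrant code, "/", then a suffix.
# This is an approximation, not the check the RDMS itself performs.
BARE_DOI = re.compile(r"10\.\d{4,9}/\S+")
URL_PREFIX = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)


def is_bare_doi(text):
    """True if the text looks like prefix/suffix without a URL in front."""
    return BARE_DOI.fullmatch(text.strip()) is not None


def to_bare_doi(text):
    """Strip a leading doi.org URL, if any, and return the prefix/suffix part."""
    return URL_PREFIX.sub("", text.strip())


print(is_bare_doi("10.1000/xyz123"))                  # True
print(is_bare_doi("https://doi.org/10.1000/xyz123"))  # False
print(to_bare_doi("https://doi.org/10.1000/xyz123"))  # 10.1000/xyz123
```

A value that passes this local check can still be rejected by the workflow, for example if the DOI is not registered.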

Now, the metadata role becomes active and the metadata manager has to check and approve the metadata. Note that a link was automatically added as a metadata entry for the DOI.

If all looks good, the last step of the workflow can follow.


Active role: Data Manager
Via sidebar item: Workflows tab.

At this stage, nearly everything is set, but a last confirmation by the data manager is needed to finish the workflow. This step is again done via the workflows tab.

The data manager can either put the workflow back to the add/confirm metadata stage or use the “Archive” button to finish the workflow.

This will once more present all the information about the archived data package, its final destination, and all metadata that will be added.

If the data manager agrees with all that, using the “Archive” button will finalize the workflow.

The data package can now be found at its final destination including all the metadata that was added during the workflow.
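The steps above can be summarized as a small state model. This is only an illustrative sketch built from the description on this page: the stage names are hypothetical labels, and the RDMS may represent the workflow differently internally.

```python
from enum import Enum


class Stage(Enum):
    INITIALIZE = "initialize workflow"
    PREPARE_DATA = "prepare data"
    CREATE_PACKAGE = "create data package"
    METADATA = "add and confirm metadata"
    FINAL_APPROVAL = "final approval"
    ARCHIVED = "archived"


# Role that acts at each stage, as listed in the steps above.
ACTIVE_ROLE = {
    Stage.INITIALIZE: "Project Admin",
    Stage.PREPARE_DATA: "Data Manager",
    Stage.CREATE_PACKAGE: "Data Manager",
    Stage.METADATA: "(Meta)data Manager",
    Stage.FINAL_APPROVAL: "Data Manager",
}

# Forward moves plus the two backward moves the text mentions.
ALLOWED = {
    Stage.INITIALIZE: {Stage.PREPARE_DATA},
    Stage.PREPARE_DATA: {Stage.CREATE_PACKAGE},
    Stage.CREATE_PACKAGE: {Stage.METADATA, Stage.PREPARE_DATA},
    Stage.METADATA: {Stage.FINAL_APPROVAL},
    Stage.FINAL_APPROVAL: {Stage.ARCHIVED, Stage.METADATA},
    Stage.ARCHIVED: set(),
}


def can_move(current, target):
    """True if the description above allows going from current to target."""
    return target in ALLOWED[current]


print(can_move(Stage.CREATE_PACKAGE, Stage.PREPARE_DATA))  # True
print(can_move(Stage.ARCHIVED, Stage.PREPARE_DATA))        # False
```

The model makes one point from the text explicit: once the data manager has used the “Archive” button, there is no way back to an earlier stage.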

At the end of the archiving workflow, a so-called data package is created. This is the RDMS name for a dataset with a specific structure that results from the archiving workflow.

In this section, we will have a more detailed look at the data package and explain its internal structure.

In general, the following applies:

  • The created data package is always a structured *.tar file, a standard format for bundling data that can be opened with many different tools.
  • Inside the tar, there is one subfolder for the selected and archived data and a separate folder that holds the metadata available for the included data in *.json format. This metadata folder is only created if, during the archiving workflow (step 2), the option to include RDMS metadata in the data package was selected. Otherwise this folder does not exist.

In our case where we selected metadata to be included and where we had only one data folder selected to be included in the archive, our archive has in the end the following structure after being downloaded and extracted locally:

# This is the general structure of the created data package after being extracted

archive1740649820/                            # This is the name (version tag) of the archive that we specified during the workflow
├── 2025_2_27_10_51_11_889000000              # Subfolder that contains the selected (meta)data
│   └── Some_project_data                     # This is the folder from which we started the workflow. Below is its content (not completely shown)
│       └── LA-187-1
└── RUGRDMS_METADATA                          # As we selected in the example to have metadata included, we get this folder as well
    └── 1Some_project_data.metadata.json      # This is the available metadata for the "Some_project_data" folder in json format.
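A downloaded data package can also be inspected without extracting it. The following sketch uses Python's standard tarfile module to separate data entries from metadata entries. The folder name RUGRDMS_METADATA is taken from the listing above; the helper names and the small demo tar, which only mimics that layout with placeholder content, are assumptions for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path

METADATA_FOLDER = "RUGRDMS_METADATA"  # folder name taken from the listing above


def summarize_package(tar_path):
    """Split the member names of a data package into data and metadata entries."""
    with tarfile.open(tar_path) as package:
        names = package.getnames()
    metadata = [n for n in names if METADATA_FOLDER in n.split("/")]
    data = [n for n in names if METADATA_FOLDER not in n.split("/")]
    return {"data": data, "metadata": metadata}


def build_demo_package(tar_path):
    """Write a tiny stand-in tar that mimics the layout above (demo only)."""
    members = {
        "archive1740649820/2025_2_27_10_51_11_889000000/Some_project_data/LA-187-1": b"demo",
        "archive1740649820/RUGRDMS_METADATA/1Some_project_data.metadata.json": b"[]",
    }
    with tarfile.open(tar_path, "w") as package:
        for name, payload in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            package.addfile(info, io.BytesIO(payload))


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_package.tar"
    build_demo_package(demo)
    summary = summarize_package(demo)
    print(summary["metadata"])
```

For a real data package, call `summarize_package` with the path of the downloaded tar file; an empty "metadata" list then indicates that RDMS metadata was not included during the workflow.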

If we have a look at the json file with the metadata, we see that it contains info about the metadata that was available for the selected data. The following is a snippet of that file that shows how this info is exported and included in the data package.

[
  {
    "l_header": "# DO NOT EDIT. Automatically generated for archiving.",
    "l_className": "rugirodsrest.RugIRODSRestArchiveMetaToStore",
    "l_toplevel_path": "/devrugZone/home/Projects/Example_Project_1/Some_project_data",
    "l_objectType": "NORMAL",
    "l_objectFullPath": "/devrugZone/home/Projects/Example_Project_1/Some_project_data",
    "l_symlink_destination": "",
    "l_metaDataList": [
      {
        "metadataDomain": "COLLECTION",
        "domainObjectId": "619037",
        "domainObjectUniqueName": "/devrugZone/home/Projects/Example_Project_1/Some_project_data",
        "avuId": 620497,
        "size": 0,
        "createdAt": "Feb 26, 2025 3:28:02 PM",
        "modifiedAt": "Feb 26, 2025 4:27:06 PM",
        "avuAttribute": "Origin",
        "avuValue": "RDMS",
        "avuUnit": "",
        "count": 1,
        "lastResult": true,
        "totalRecords": 0
      },
      {
        "metadataDomain": "COLLECTION",
        "domainObjectId": "619037",
        "domainObjectUniqueName": "/devrugZone/home/Projects/Example_Project_1/Some_project_data",
        "avuId": 290732,
        "size": 0,
        "createdAt": "Feb 26, 2025 3:28:02 PM",
        "modifiedAt": "Feb 26, 2025 4:27:06 PM",
        "avuAttribute": "Type",
        "avuValue": "Testing",
        "avuUnit": "",
        "count": 2,
        "lastResult": true,
        "totalRecords": 0
      }
    ]
  },
[...]
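The exported file can be read with any JSON library. As a minimal sketch, the following Python code collects the attribute, value and unit of every metadata entry per object. The key names are taken from the snippet above; the sample is a shortened stand-in with most fields omitted, and the `collect_avus` helper is hypothetical.

```python
import json

# Shortened sample in the shape of the snippet above (most fields omitted).
SAMPLE = """
[
  {
    "l_objectFullPath": "/devrugZone/home/Projects/Example_Project_1/Some_project_data",
    "l_metaDataList": [
      {"avuAttribute": "Origin", "avuValue": "RDMS", "avuUnit": ""},
      {"avuAttribute": "Type", "avuValue": "Testing", "avuUnit": ""}
    ]
  }
]
"""


def collect_avus(entries):
    """Map each object path to its list of (attribute, value, unit) triples."""
    result = {}
    for entry in entries:
        triples = [
            (avu["avuAttribute"], avu["avuValue"], avu["avuUnit"])
            for avu in entry.get("l_metaDataList", [])
        ]
        result[entry["l_objectFullPath"]] = triples
    return result


for path, triples in collect_avus(json.loads(SAMPLE)).items():
    print(path)
    for attribute, value, unit in triples:
        print(f"  {attribute} = {value} {unit}".rstrip())
```

To use this on a real export, load the *.metadata.json file from the RUGRDMS_METADATA folder with `json.load` instead of parsing the sample string.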

This section explains our suggestions on how you can set up the roles in the project to more efficiently spread the tasks of the workflow among project participants. It will also give some more information about best practices in the specific context of the project archiving workflow.

In general, this is how the roles could be assigned in a project:

  • Project Admin: This role should be taken by the project lead. This is the person that manages the project (permissions, roles, etc.) and is also the only one that can start the workflow. Other than that, this role does not need to take additional steps in the workflow.
  • Data Manager: As this role verifies that all data that should be archived is included during the workflow, it makes sense to assign this role to the person who is most familiar with the data. In the case of a simple research project, that could be the main researcher who produced the data. In cases where multiple people are familiar with different parts of the data, we recommend assigning the data manager role to each person, so they can individually verify the integrity of their part of the data. Since it is possible to assign multiple data managers, we also suggest discussing beforehand who will have the final say in the workflow, as only one approval is needed to move to the next step. The data manager can also add metadata in the following step of the workflow, but cannot approve the metadata.
  • Metadata Manager: The main role of the metadata manager is to confirm that the metadata associated with the archive is correct and complete. We suggest assigning this role to a person in the project who has knowledge of the data but has not been involved in previous workflow steps. If no such person exists, then a data manager is also suited for this role. If you had multiple data managers in previous steps, we suggest appointing a data manager who did not have the final say over the dataset to this role. If your project/research group has staff who take care of data management, we suggest assigning them the metadata manager role.

As already mentioned above, multiple roles can be assigned to the same user. If a user is both data and metadata manager, then the whole workflow, except the initialization, can be done by that single user. This is also a valid possibility, but we suggest you make use of the “checks and balances” that the archiving workflow introduces by assigning roles to different users, where possible.