| dcc:pdpsol:publishinghsd [2026/03/30 08:30] – marlon | dcc:pdpsol:publishinghsd [2026/04/13 11:22] (current) – [Example dataset: Corpus PINO] change data sharing to data transfer agreement marlon |
|---|
| {{indexmenu_n>5}} | {{indexmenu_n>5}} |
| ====== Archiving and Publishing Human Subject Data ====== | ===== Archiving and Publishing Human Subject Data ===== |
| |
| ===== Introduction ===== | ==== Introduction ==== |
| Your project nears its completion. It is time to prepare your data for archiving and publishing in accordance with [[https://www.rug.nl/digital-competence-centre/research-data/fair-data-open-science|the FAIR principles]], to make your data //as open "as possible and as closed as necessary"//. When research involves human participants, there is a tension between protecting the privacy of your participants and meeting expectations to archive and publish data so others can verify and reuse your work. Navigating this playing field requires careful planning and thoughtful decisions, putting safeguards in place that protect participants, while still allowing responsible access for future research. You can use the sections below to guide you in this process. | Your project nears its completion. It is time to prepare your data for archiving and publishing in accordance with [[https://www.rug.nl/digital-competence-centre/research-data/fair-data-open-science|the FAIR principles]], to make your data //"as open as possible and as closed as necessary"//. When research involves human participants, there is a tension between protecting the privacy of your participants and meeting expectations to archive and publish data so others can verify and reuse your work. Navigating this playing field requires careful planning and thoughtful decisions, putting safeguards in place that protect participants while still allowing responsible access for future research. You can use the sections below to guide you in this process. |
| | |
| ===== What needs to be archived and what can be published? ===== | ==== What needs to be archived and what can be published? ==== |
| |
| Check whether you can select data, with two goals of archiving in mind: | Check whether you can select data, with two goals of archiving in mind: |
| * Select and organize the data and other materials that are potentially valuable for further research by you, your team, or fellow researchers. | * Select and organize the data and other materials that are potentially valuable for further research by you, your team, or fellow researchers. |
| |
| ===== De-identifying data before archiving or publishing ==== | ==== De-identifying data before archiving or publishing ==== |
| Often it is not necessary to keep all collected data for the purpose of validating your findings or for researchers to reuse your data. | Often, it is not necessary to keep all collected data for the purpose of validating your findings or for researchers to reuse your data. |
| * Limit the (personal) data and materials you archive to the ones that you need for verification of your research. Follow the procedures in the [[datadesctruction|destruction protocol(s)]] that you designed. Add these protocol(s) to your data package, publication package or archive. (e.g. anonymised consent forms can be archived, while consent forms containing personal data should be de-identified or destroyed in accordance with the UG protocol) | * Limit the (personal) data and materials you archive to the ones that you need for verification of your research. Follow the procedures in the [[datadesctruction|destruction protocol(s)]] that you designed. Add these protocol(s) to your data package, publication package or archive. (e.g. anonymised consent forms can be archived, while consent forms containing personal data should be de-identified or destroyed in accordance with the UG protocol) |
| * Determine whether it is possible to [[de-identification|de-identify]] before publishing, while also keeping in mind the usability of your dataset. | * Determine whether it is possible to [[de-identification|de-identify]] before publishing, while also keeping in mind the usability of your dataset. |
| |
| ===== Publishing de-identified or anonymized data ===== | ==== Publishing de-identified or anonymized data ==== |
| FAIR data does not necessarily mean that all your data and materials need to openly available. Even after de-identification, there can be [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/make-your-data-available-under-restricted-access|good reasons to restrict access to your data]]. The objective is to have data as open as possible, and as closed and protected as necessary. | FAIR data does not necessarily mean that all your data and materials need to be openly available. Even after de-identification, there can be [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/make-your-data-available-under-restricted-access|good reasons to restrict access to your data]]. The objective is to have data as open as possible, and as closed and protected as necessary. |
| |
| Consider applying a **‘layered’ approach** to your (de-identified) files by scoring your files in terms of sensitivity. | Consider applying a **‘layered’ approach** to your (de-identified) files by scoring your files in terms of sensitivity. |
| |
| ====Level 1: contains no personal data ==== | === Level 1: contains no personal data === |
| Publish your [[de-identification|(anonymized)]] dataset and supporting materials in a recognized data repository such as [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]], on the condition that __**no**__ other [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/make-your-data-available-under-restricted-access|reasons for restricting access]] apply. Allow for reuse by adding a license (for instance, [[https://www.rug.nl/library/open-access/how-to-publish-open-access/creative-commons-licenses|a Creative Commons license]]) and use the persistent identifier (e.g., [[https://www.rug.nl/library/publish/isbn-doi|DOI]]) for data citation. | Publish your [[de-identification|(anonymized)]] dataset and supporting materials in a recognized data repository such as [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]], on the condition that __**no**__ other [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/make-your-data-available-under-restricted-access|reasons for restricting access]] apply. Allow for reuse by adding a license (for instance, [[https://www.rug.nl/library/open-access/how-to-publish-open-access/creative-commons-licenses|a Creative Commons license]]) and use the persistent identifier (e.g., [[https://www.rug.nl/library/publish/isbn-doi|DOI]]) for data citation. |
| |
| ====Level 2: contains personal data in de-identified form (not anonymized)==== | === Level 2: contains personal data in de-identified form (not anonymized) === |
| Publish your [[de-identification|de-identified dataset]] and supporting materials on [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]], under restricted access. Determine the terms of use for external parties that would like to reuse your data. Make sure that these terms of access align with the informed consent. | |
| | Publish your [[de-identification|de-identified dataset]] and supporting materials on [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]], under restricted access. Determine the terms of use for external parties that would like to reuse your data. [[https://www.rug.nl/library/open-access/how-to-publish-open-access/creative-commons-licenses|Creative Commons licenses]] are not suitable for data containing personal data with access restrictions. Make sure that these terms of use align with the informed consent. |
| | |
| | === Level 3: contains sensitive personal data === |
| |
| ====Level 3: contains sensitive personal data ==== | |
| When your data still contains highly sensitive information, do not publish this data openly or with access controls in a data repository. Instead, [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/|archive your data]] in accordance with the [[https://www.rug.nl/digital-competence-centre/research-data/policies|research data policy of your faculty or institute]]. The [[https://www.rug.nl/digital-competence-centre/contact/|UG DCC]] can assist in developing a procedure for making these sensitive data available for reuse under well-defined conditions. Make sure that these conditions are in line with the informed consent. | When your data still contains highly sensitive information, do not publish this data openly or with access controls in a data repository. Instead, [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/|archive your data]] in accordance with the [[https://www.rug.nl/digital-competence-centre/research-data/policies|research data policy of your faculty or institute]]. The [[https://www.rug.nl/digital-competence-centre/contact/|UG DCC]] can assist in developing a procedure for making these sensitive data available for reuse under well-defined conditions. Make sure that these conditions are in line with the informed consent. |
| |
| ---- | ---- |
| If your dataset contains sensitive personal data, you can still publish the supporting materials on [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]]. Via your DataverseNL page, you can also inform researchers that want to reuse your data about the procedure to request the data. | If your dataset contains sensitive personal data, you can still publish the supporting materials on [[https://www.rug.nl/digital-competence-centre/research-data/archive-and-publish/dataversenl|DataverseNL]]. Via your DataverseNL page, you can also inform researchers who want to reuse your data about the procedure to request the data. |
| |
| ==== Example dataset: Corpus PINO ==== | ==== Example dataset: Corpus PINO ==== |
| <color #7092be>→</color>[[https://doi.org/10.34894/R1WHEA |Corpus PINO: A spoken language resource for multiple simultaneous comparisons. (Cristiano et al., 2024)]] | <color #7092be>→</color>[[https://doi.org/10.34894/R1WHEA |Corpus PINO: A spoken language resource for multiple simultaneous comparisons. (Cristiano et al., 2024)]] |
| ---- | |
| //“Corpus PINO is a resource designed for research on different styles of spoken Italian and Neapolitan dialect. The corpus consists of anonymized audio recordings and ELAN time-aligned orthographic transcriptions involving fifty participants (stratified by age, gender, and education level). …. PINO is a contribution to the preservation of the local cultural heritage and of a minority language, i.e., an italo-romance dialect. It attests the lives, memories, opinions, traditions, practices, attitudes of fifty members of this community.”// | //“Corpus PINO is a resource designed for research on different styles of spoken Italian and Neapolitan dialect. The corpus consists of anonymized audio recordings and ELAN time-aligned orthographic transcriptions involving fifty participants (stratified by age, gender, and education level). …. PINO is a contribution to the preservation of the local cultural heritage and of a minority language, i.e., an Italo-Romance dialect. It attests the lives, memories, opinions, traditions, practices, and attitudes of fifty members of this community.”// |
| ---- | ---- |
| ===Score the sensitivity of your data and supporting materials === | ===Score the sensitivity of your data and supporting materials === |
| |
| ===Define the terms of use=== | ===Define the terms of use=== |
| [[https://www.rug.nl/library/open-access/how-to-publish-open-access/creative-commons-licenses|Creative Commons licenses]] are not suitable for data containing personal data with access restrictions. Instead, custom terms of use have to be set which will largely depend on the consent given by the participants and the degree of de-identification. As such, the custom terms of use have to reflect what is allowed according to the informed consent. | Given that [[https://www.rug.nl/library/open-access/how-to-publish-open-access/creative-commons-licenses|Creative Commons licenses]] are not suitable for datasets containing personal data with access restrictions, the custom terms of use largely depend on the consent given by the participants and the degree of de-identification: |
| |
| **Corpus PINO terms of use:** //"This data can be accessed and reused by researchers affiliated with universities or no-profit, non-commercial organizations in the fields of linguistics, semiotics, sociology, anthropology, and affiliated fields. Due to their increased re-identification potential, the audio files in the corpus shall be facilitated to linguistic and relevant discipline-specific research where analyzing the audio content is pertinent. In these cases, the signing of a data transfer agreement is necessary."// | **Corpus PINO terms of use:** //"This data can be accessed and reused by researchers affiliated with universities or non-profit, non-commercial organizations in the fields of linguistics, semiotics, sociology, anthropology, and affiliated fields. Due to their increased re-identification potential, the audio files in the corpus shall be facilitated for linguistic and relevant discipline-specific research where analyzing the audio content is pertinent. In these cases, the signing of a data transfer agreement is necessary."// |
| |
| ---- | ---- |
| |
| === Data sharing agreement === | === Data transfer agreement === |
| When an external party requests level 3 or in some cases level 2 data, a data transfer agreement needs to be signed. A data transfer agreement is a legal contract that defines the specific purposes for which the data may be used by the requesting party. As such it is the most comprehensive specification of terms of use. The data sharing agreement also describes the rights and obligations of both parties involved and sets out the measures for data protection. The UG has its own model data transfer agreement that can be tailored for the dataset of your research project. | When an external party requests level 3 or, in some cases, level 2 data, a data transfer agreement needs to be signed. A data transfer agreement is a legal contract that defines the specific purposes for which the data may be used by the requesting party. As such, it is the most comprehensive specification of terms of use. The data transfer agreement also describes the rights and obligations of both parties involved and sets out the measures for data protection. The UG has its own model data transfer agreement that can be tailored for the dataset of your research project. |
| |
| <color #7092be>→</color>[[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/protocols-agreements|Refer to the DCC website for more information on legal agreements]] | <color #7092be>→</color>[[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/protocols-agreements|Refer to the DCC website for more information on legal agreements]] |