{{indexmenu_n>2}}
===== De-identification, anonymization and pseudonymization =====
==== Introduction ====
De-identification is the masking, manipulation or removal of personal data with the aim of making individuals in a dataset harder to identify. It is especially important when you want to share, publish or archive your dataset, but it can also help protect your participants' privacy in case of a [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/data-leak|data leak]] during your research. At each phase of your research, determine whether you can de-identify your dataset, and weigh this against the usability of the data.

**Warning:** de-identification does not equal anonymization. Even if all [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/essential-concepts|direct identifiers]] and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data, because a combination of attributes (e.g., height, occupation and location of data collection) may single out an individual.
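
To illustrate this warning, here is a minimal sketch in Python (not an official tool). It assumes a small, invented dataset in which ''name'' is a direct identifier and height, occupation and location are quasi-identifiers. The hypothetical function ''pseudonymize'' replaces each name with a random code and returns a separate key table; the hypothetical function ''unique_combinations'' then lists the attribute combinations that occur only once.

```python
import secrets
from collections import Counter

# Invented example records (hypothetical): "name" is a direct identifier,
# the other columns are quasi-identifiers.
records = [
    {"name": "A. Jansen", "height_cm": 182, "occupation": "nurse", "location": "Groningen"},
    {"name": "B. de Vries", "height_cm": 182, "occupation": "nurse", "location": "Groningen"},
    {"name": "C. Bakker", "height_cm": 175, "occupation": "teacher", "location": "Groningen"},
    {"name": "D. Visser", "height_cm": 175, "occupation": "teacher", "location": "Groningen"},
    {"name": "E. Smit", "height_cm": 201, "occupation": "judge", "location": "Leeuwarden"},
]


def pseudonymize(rows, column):
    """Replace each distinct value in `column` with a random code.

    Returns the pseudonymized rows and the key table (original value -> code).
    The key table must be stored separately from the data, or destroyed.
    """
    key_table = {}
    result = []
    for row in rows:
        value = row[column]
        if value not in key_table:
            code = secrets.token_hex(4)
            while code in key_table.values():  # avoid (unlikely) duplicate codes
                code = secrets.token_hex(4)
            key_table[value] = code
        result.append({**row, column: key_table[value]})
    return result, key_table


def unique_combinations(rows, columns):
    """Return the combinations of values in `columns` that occur exactly once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


pseudonymized, key_table = pseudonymize(records, "name")
quasi_identifiers = ("height_cm", "occupation", "location")

# Records with a unique combination can still be singled out,
# even though their names have been replaced.
print(unique_combinations(pseudonymized, quasi_identifiers))
# [(201, 'judge', 'Leeuwarden')]
```

In this invented example, the judge from Leeuwarden is the only record with that combination of height, occupation and location, so replacing the name alone does not prevent re-identification. Counting how many records share each combination is the idea behind k-anonymity: a combination that occurs only once is a group of size 1.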
----
**Table 1:** De-identification matrix adapted from [[https://lcrdm.nl/wp-content/uploads/2023/03/LCRDM-Risk-management-for-research-data-about-people.pdf|LCRDM (2019)]]. This matrix is an example of what de-identification and anonymization could look like in research. The identifiability of your data depends largely on the context of your research and only partly on the variables you collected. For example, the occupation "judge" makes a participant living in Leeuwarden more identifiable than one living in Amsterdam, because more judges live in Amsterdam.
----
{{:dcc:pdpsol:de-identification:de-identification_matrix.png?direct&800|}}

==== General de-identification techniques ====
Use the de-identification techniques outlined below to reduce the identifiability of your dataset. Be aware that these techniques often reduce the analytical value of your data, so always document how you transformed it.

You can apply these techniques during different phases of your research:

  * After data collection, to protect participants when analyzing their data
  * Before sharing data with collaborators or other third parties
  * Before archiving data
  * Before publishing data (with access restrictions)
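
As an illustration of transforming data while documenting each step, the sketch below applies generalization (replacing exact values with ranges), chosen here only as an example technique. The records, column names and the ''generalize'' function are hypothetical, and the log format is an assumption; adapt it to your own documentation practice (e.g. a README or codebook).

```python
import json
from datetime import date

# Invented example records (hypothetical); names were already pseudonymized.
records = [
    {"participant": "P001", "age": 23, "height_cm": 182},
    {"participant": "P002", "age": 37, "height_cm": 201},
    {"participant": "P003", "age": 41, "height_cm": 168},
]


def generalize(rows, column, width, log):
    """Replace exact integer values in `column` with ranges of size `width`.

    For example, with width 10 the value 23 becomes "20-29". A description of
    the transformation is appended to `log`; the input rows are not modified.
    """
    result = []
    for row in rows:
        lower = (row[column] // width) * width
        result.append({**row, column: f"{lower}-{lower + width - 1}"})
    log.append({
        "date": date.today().isoformat(),
        "column": column,
        "technique": "generalization",
        "details": f"exact values replaced by ranges of width {width}",
    })
    return result


transformation_log = []
deidentified = generalize(records, "age", 10, transformation_log)
deidentified = generalize(deidentified, "height_cm", 10, transformation_log)

print(deidentified)
# Keep this log with the rest of your data documentation.
print(json.dumps(transformation_log, indent=2))
```

Wider ranges lower the risk of singling out individuals but also remove detail you may need for analysis, which is why the chosen range width is recorded in the log.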