Differences

This shows you the differences between two versions of the page.

--- dcc:pdpsol:de-identification [2026/04/30 11:37] – marlon
+++ dcc:pdpsol:de-identification [2026/06/25 11:05] (current) – marlon
@@ Line 1: / Line 1: @@
 {{indexmenu_n>2}}
-===== De-identification, Anonymization and Pseudonymization =====
+===== De-identification, anonymization and pseudonymization =====
 ==== Introduction ====
 De-identification is the masking, manipulation or removal of personal data with the aim of making individuals in a dataset less easy to identify. It is especially important when you want to share, publish or archive your dataset, but it can also help protect your participants' privacy in case of a [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/data-leak|data leak]] during your research. During the different phases of your research, you should determine whether it is possible to de-identify your dataset while also keeping in mind its usability.
@@ Line 18: / Line 18: @@
 **Warning:** de-identification does not equal anonymization. Even if all [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/essential-concepts|direct identifiers]] and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data because, in combination, certain attributes (e.g., combination of height, job occupation and location of data collection) may single out an individual.
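To make the warning above concrete, the following minimal sketch counts how many records share each combination of indirect identifiers and returns the records whose combination is unique. The function name `singled_out`, the field names and the toy data are hypothetical and serve only as an illustration. This is one simple way to spot records at risk, not a complete re-identification risk assessment.

```python
from collections import Counter


def singled_out(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values is unique."""
    def key(record):
        return tuple(record[name] for name in quasi_identifiers)

    # Count how often each combination of values occurs in the dataset.
    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] == 1]


# Hypothetical toy data: no names or IDs, only indirect identifiers.
participants = [
    {"height_cm": 180, "occupation": "teacher", "location": "Groningen"},
    {"height_cm": 180, "occupation": "teacher", "location": "Groningen"},
    {"height_cm": 201, "occupation": "judge", "location": "Leeuwarden"},
]

at_risk = singled_out(participants, ["height_cm", "occupation", "location"])
print(at_risk)
```

Records returned by such a check are candidates for further generalization or suppression before the dataset is shared.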
 ----
-**Table 1:** De-identification matrix adapted from [[https://lcrdm.nl/wp-content/uploads/2023/03/LCRDM-Risk-management-for-research-data-about-people.pdf|LCRDM (2019)]]. This matrix is an example of what de-identification and anonymization could look like in research. The identifiability of your data largely depends on the context of your research and only partly on the variables you collected. For example, the variable judge could be more identifiable in Leeuwarden than in Amsterdam, because more judges live in Amsterdam.
+**Table 1:** De-identification matrix adapted from [[https://lcrdm.nl/wp-content/uploads/2023/03/LCRDM-Risk-management-for-research-data-about-people.pdf|LCRDM (2019)]]. This matrix is an example of what de-identification and anonymization could look like in research. The identifiability of your data largely depends on the context of your research and only partly on the variables you collected. For example, the variable judge could be more identifiable for a person living in Leeuwarden than for a person living in Amsterdam, because more judges live in Amsterdam.
 ----
-{{:dcc:pdpsol:de-identification:de-identification_matrix.png?direct&800|}}
+{{:dcc:pdpsol:de-identification:de-identification_matrix2.png?direct&1600|}}
 ==== General de-identification techniques ====
-There are several techniques that can help you make your dataset less identifiable. You can apply these techniques during different phases of your research:
+Use the de-identification techniques outlined below to reduce the identifiability of your dataset. Be aware that these techniques often affect its analytical value. Therefore, always make sure to document the way you transformed your data.
-  * After data collection to protect participants when analyzing their data
+You can apply these techniques during different phases of your research:
+  * After data collection, to protect participants when analyzing their data
   * Before sharing data with collaborators or other third parties
   * Before archiving data
   * Before publishing data (with access restrictions)
-Be aware that these techniques often affect its analytical value. Therefore, always make sure to document the way you transformed your data.
@@ Line 91: / Line 93: @@
 ==== Research-specific de-identification techniques ====
-=== Video data ===
+=== Videos or images ===
-Researchers use video to record real-world behaviour, interactions, or experiments in detail, for example, tracking how people move, communicate, or perform tasks over time. It is important to de-identify this type of data, because videos can easily reveal faces, voices, or surroundings, and leaving those visible can reveal participants’ identities.
+Researchers use videos or images to record real-world behaviour, interactions, or experiments in detail, for example to track how people move, communicate, or perform tasks over time. It is important to de-identify this type of data, because it can easily capture faces, voices, or surroundings, and leaving those visible can reveal participants’ identities.
+++++ (Click) Face and body masking | You can use video editing software, such as [[https://www.adobe.com/nl/products/premiere.html|Adobe Premiere]], to distort or obscure identifiable information in videos. For images, you can use tools like Paint or [[https://www.adobe.com/nl/products/photoshop.html|Adobe Photoshop]] to blur or pixelate personally identifiable information. ++++
-++++ (Click) Face and body masking |[[https://github.com/MaskAnyone/MaskAnyone|MaskAnyone]] is a de-identification toolbox for videos that allows you to remove personal identifiable information from videos, while at the same time preserving utility. It provides a variety of algorithms that allow you to de-identify or even anonymize videos (video & audio).
-++++
 ++++ (Click) Metadata de-identification |
 Even after de-identifying video data so it's unrecognizable to people or machines, metadata, such as timestamps or location tags, can still indirectly reveal participants’ identities.
@@ Line 102: / Line 104: @@
   * Network identifiers (e.g. IP addresses)
   * Device or user IDs (e.g. serial numbers or account IDs)
+To manually remove metadata from a Windows file:
+  - Open **File Explorer**.
+  - Select the file(s) and right-click.
+  - Select **Properties**.
+  - Navigate to the **Details** tab.
+  - Click **Remove Properties and Personal Information**.
+  - Choose to remove **all possible properties** or **a self-defined set of properties**.
+----
+{{:dcc:pdpsol:de-identification:metadata_de-identification.png?800|}}
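If your recording software also exports metadata as a separate structured file (for example JSON), you could remove the identifying fields with a script instead of by hand. The sketch below assumes such an export has already been loaded into a Python dictionary. The function name `scrub_metadata` and all field names are hypothetical and need to be adapted to your own files. It does not alter metadata embedded in the media file itself.

```python
# Hypothetical field names; adapt them to what your software actually exports.
SENSITIVE_KEYS = {"timestamp", "gps", "ip_address", "device_serial", "account_id"}


def scrub_metadata(metadata, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a (possibly nested) metadata structure without sensitive keys."""
    if isinstance(metadata, dict):
        return {
            key: scrub_metadata(value, sensitive_keys)
            for key, value in metadata.items()
            if key not in sensitive_keys
        }
    if isinstance(metadata, list):
        return [scrub_metadata(item, sensitive_keys) for item in metadata]
    # Plain values (strings, numbers, ...) are kept unchanged.
    return metadata


# Hypothetical example record.
sidecar = {
    "duration_s": 312,
    "ip_address": "192.0.2.10",
    "device": {"model": "cam-x", "device_serial": "SN123"},
}
cleaned = scrub_metadata(sidecar)
print(cleaned)
```

The function returns a new structure and leaves the original untouched, so you can compare both versions when documenting the transformation.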
 ++++
@@ Line 130: / Line 143: @@
   * Network identifiers (e.g. IP addresses)
   * Device or user IDs (e.g. serial numbers or account IDs)
+To manually remove metadata from a Windows file:
+  - Open **File Explorer**.
+  - Select the file(s) and right-click.
+  - Select **Properties**.
+  - Navigate to the **Details** tab.
+  - Click **Remove Properties and Personal Information**.
+  - Choose to remove **all possible properties** or **a self-defined set of properties**.
+----
+{{:dcc:pdpsol:de-identification:metadata_de-identification.png?800|}}
 ++++
 ----
 [[dcc:pdpsol:start | → Go back to the Privacy & Data protection home page]]