Differences

This shows you the differences between two versions of the page.

dcc:pdpsol:de-identification [2026/04/30 11:38] marlondcc:pdpsol:de-identification [2026/06/25 11:05] (current) marlon
Line 1: Line 1:
 {{indexmenu_n>2}}
-===== De-identification, Anonymization and Pseudonymization =====
 +===== De-identification, anonymization and pseudonymization =====
 ==== Introduction ====
 De-identification is the masking, manipulation or removal of personal data with the aim of making individuals in a dataset less easy to identify. It is especially important when you want to share, publish or archive your dataset, but it can also help protect your participants' privacy in case of a [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/data-leak|data leak]] during your research. During the different phases of your research, you should determine whether it is possible to de-identify your dataset while also keeping in mind its usability.
Line 20: Line 20:
 **Table 1:** De-identification matrix adapted from [[https://lcrdm.nl/wp-content/uploads/2023/03/LCRDM-Risk-management-for-research-data-about-people.pdf|LCRDM (2019)]]. This matrix is an example of what de-identification and anonymization could look like in research. The identifiability of your data largely depends on the context of your research and only partly on the variables you collected. For example, the variable judge could be more identifiable for a person living in Leeuwarden than for a person living in Amsterdam, because more judges live in Amsterdam.
 ----
-{{:dcc:pdpsol:de-identification:de-identification_matrix.png?direct&800|}}
 +{{:dcc:pdpsol:de-identification:de-identification_matrix2.png?direct&1600|}}
  
    
 ==== General de-identification techniques ====
-There are several techniques that can help you make your dataset less identifiable. You can apply these techniques during different phases of your research:
 +Use the de-identification techniques outlined below to reduce the identifiability of your dataset. Be aware that these techniques often affect its analytical value. Therefore, always make sure to document the way you transformed your data.
 
-  * After data collection to protect participants when analyzing their data
 +You can apply these techniques during different phases of your research:
 +
 +  * After data collection, to protect participants when analyzing their data
   * Before sharing data with collaborators or other third parties
   * Before archiving data
   * Before publishing data (with access restrictions)
 
-Be aware that these techniques often affect its analytical value. Therefore, always make sure to document the way you transformed your data.
 +
  
  
Line 91: Line 93:
  
 ==== Research specific de-identification techniques ====
-=== Video data ===
 +=== Videos or images ===
-Researchers use video to record real-world behaviour, interactions, or experiments in detail, for example, tracking how people move, communicate, or perform tasks over time. It is important to de-identify this type of data, because videos can easily reveal faces, voices, or surroundings, and leaving those visible can reveal participants’ identities.
 +Researchers use videos or images to record real-world behaviour, interactions, or experiments in detail, for example, tracking how people move, communicate, or perform tasks over time. It is important to de-identify this type of data, because they can easily reveal faces, voices, or surroundings, and leaving those visible can reveal participants’ identities.
 + 
 +++++ (Click) Face and body masking | You can use video editing software, such as [[https://www.adobe.com/nl/products/premiere.html|Adobe Premiere]], to distort or obscure identifiable information in videos. For images, tools like Paint or [[https://www.adobe.com/nl/products/photoshop.html|Adobe Photoshop]] can be used to blur or pixelate personally identifiable information. ++++
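To illustrate what pixelating a region means, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows of integer pixel values; the function name ''pixelate'' and its parameters are hypothetical and are not part of any tool mentioned on this page. Real images would normally be processed with an image editor or imaging library.

```python
def pixelate(image, top, left, height, width, block=4):
    """Return a copy of `image` in which one rectangle is pixelated.

    `image` is a non-empty list of equally long rows of integers
    (grayscale values). Every `block` x `block` tile inside the
    rectangle is replaced by the integer mean of its pixels.
    """
    result = [row[:] for row in image]          # leave the input untouched
    bottom = min(top + height, len(image))      # clip rectangle to the image
    right = min(left + width, len(image[0]))
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            values = [image[y][x] for y in ys for x in xs]
            mean = sum(values) // len(values)
            for y in ys:
                for x in xs:
                    result[y][x] = mean
    return result


# Hypothetical 4x4 image; the top-left 2x2 area stands in for a face.
image = [
    [10, 20, 0, 0],
    [30, 40, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
masked = pixelate(image, top=0, left=0, height=2, width=2, block=2)
```

A larger ''block'' value removes more detail. Whether the result is sufficiently unrecognizable still has to be judged in the context of your dataset.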
  
-++++ (Click) Face and body masking |[[https://github.com/MaskAnyone/MaskAnyone|MaskAnyone]] is a de-identification toolbox for videos that allows you to remove personal identifiable information from videos, while at the same time preserving utility. It provides a variety of algorithms that allow you to de-identify or even anonymize videos (video & audio).  
-++++  
 ++++ (Click) Metadata de-identification |
 Even after de-identifying video data so it's unrecognizable to people or machines, metadata, such as timestamps or location tags, can still indirectly reveal participants’ identities.
Line 102: Line 104:
   * Network identifiers (e.g. IP addresses)
   * Device or user IDs (e.g. serial numbers or account IDs)
 +
 +To manually remove metadata from a Windows file:
 +
 +  - Open **File Explorer**.
 +  - Select the file(s) and right-click.
 +  - Select **Properties**.
 +  - Navigate to the **Details** tab.
 +  - Click **Remove Properties and Personal Information**.
 +  - Choose to remove **all possible properties** or **a self-defined set of properties**.
 +----
 +{{:dcc:pdpsol:de-identification:metadata_de-identification.png?800|}}
  
 ++++
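When metadata has been exported to a table or log, identifiers such as IP addresses and device IDs can also be reduced programmatically. The following is a minimal sketch using only the Python standard library. The field names ''ip_address'' and ''device_id'', the prefix lengths and the helper names are assumptions made for illustration and do not come from this page.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(address, ipv4_prefix=24, ipv6_prefix=48):
    """Zero the host part of an IP address (assumed prefix lengths)."""
    ip = ipaddress.ip_address(address)
    prefix = ipv4_prefix if ip.version == 4 else ipv6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize_id(identifier, secret_key):
    """Replace an ID with a keyed hash (HMAC-SHA256, shortened to 16 hex chars)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify_metadata(record, secret_key):
    """Return a copy of a metadata record with known identifier fields reduced."""
    cleaned = dict(record)
    if "ip_address" in cleaned:
        cleaned["ip_address"] = truncate_ip(cleaned["ip_address"])
    if "device_id" in cleaned:
        cleaned["device_id"] = pseudonymize_id(cleaned["device_id"], secret_key)
    return cleaned


# Hypothetical record; 192.0.2.0/24 is a documentation-only address range.
record = {"ip_address": "192.0.2.57", "device_id": "SN-12345", "duration_s": 61}
cleaned = deidentify_metadata(record, secret_key=b"store-this-key-separately")
```

Truncated addresses and keyed hashes lower the risk of identification but do not by themselves make data anonymous: as long as the key exists, the hashed IDs are pseudonymized rather than anonymized. Store the key separately from the dataset, and document the transformation.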
Line 130: Line 143:
   * Network identifiers (e.g. IP addresses)
   * Device or user IDs (e.g. serial numbers or account IDs)
 +
 +To manually remove metadata from a Windows file:
 +
 +  - Open **File Explorer**.
 +  - Select the file(s) and right-click.
 +  - Select **Properties**.
 +  - Navigate to the **Details** tab.
 +  - Click **Remove Properties and Personal Information**.
 +  - Choose to remove **all possible properties** or **a self-defined set of properties**.
 +----
 +{{:dcc:pdpsol:de-identification:metadata_de-identification.png?800|}}
 +
 ++++
 
 ----
 [[dcc:pdpsol:start | → Go back to the Privacy & Data protection home page]]