Differences

This shows you the differences between two versions of the page.

dcc:pdpsol:de-identification [2026/02/17 09:49] marlondcc:pdpsol:de-identification [2026/02/17 10:09] (current) marlon
Line 5: Line 5:
  
 ==== Anonymization versus pseudonymization ==== ==== Anonymization versus pseudonymization ====
 +
 === Pseudonymization === === Pseudonymization ===
-Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In general, the names and/or contact details of data subjects are stored with this pseudonym in a so-called keyfile. The keyfile enables the re-identification of individuals in the dataset. Keyfiles are stored separately from the rest of the data and access should be restricted. In contrast to an anonymized dataset, a pseudonymized dataset in principle still allows for the re-identification of data subjects. +Pseudonymization is a de-identification procedure that is often implemented during data collection. During pseudonymization, personally identifiable information is replaced by a unique alias or code (pseudonym). In general, the names and/or contact details of data subjects are stored with this pseudonym in a so-called keyfile. The keyfile enables the re-identification of individuals in the dataset. Keyfiles are stored separately from the rest of the data and access should be restricted. In contrast to an anonymized dataset, a pseudonymized dataset in principle still allows for the re-identification of data subjects. 
  
-[[pseudonymization|Refer to our page on pseudonymization for practical advice on its implementation.]] +[[pseudonymization|→ Refer to our page on pseudonymization for practical advice on its implementation.]] 
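
The keyfile approach described above can be illustrated with a minimal Python sketch. The function name ''pseudonymize'', the field names and the sample records are all hypothetical and only serve the illustration; this is one possible approach under those assumptions, not the procedure prescribed on the pseudonymization page.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace the identifying field with a random pseudonym.

    Returns (pseudonymized_records, keyfile_rows). The keyfile rows map each
    pseudonym back to the original value and should be stored separately
    from the research data, with restricted access.
    """
    key = {}  # original value -> pseudonym
    pseudonymized = []
    for rec in records:
        original = rec[id_field]
        if original not in key:
            pseudonym = "P" + secrets.token_hex(4)
            # Regenerate in the unlikely event of a collision.
            while pseudonym in key.values():
                pseudonym = "P" + secrets.token_hex(4)
            key[original] = pseudonym
        # Copy every field except the direct identifier.
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["pseudonym"] = key[original]
        pseudonymized.append(clean)
    keyfile = [{"pseudonym": p, id_field: o} for o, p in key.items()]
    return pseudonymized, keyfile


# Hypothetical sample data: the same person appears twice.
records = [
    {"name": "Participant One", "score": 7},
    {"name": "Participant Two", "score": 9},
    {"name": "Participant One", "score": 8},
]
data, keyfile = pseudonymize(records)
```

Here ''data'' no longer contains names, while ''keyfile'' holds the only link between pseudonyms and names. The pseudonyms are random rather than derived from the name, so they cannot be recomputed without the keyfile.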
  
 === Anonymization === === Anonymization ===
 Anonymization is a de-identification procedure during which “personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” ([[https://www.iso.org/standard/63553.html|ISO 25237:2017 Health informatics -- Pseudonymization]]. ISO. 2017. p. 7.). In contrast to a pseudonymized dataset, an anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data.   Anonymization is a de-identification procedure during which “personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” ([[https://www.iso.org/standard/63553.html|ISO 25237:2017 Health informatics -- Pseudonymization]]. ISO. 2017. p. 7.). In contrast to a pseudonymized dataset, an anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data.  
  
-**Warning:** de-identification does not equal anonymization. Although all direct identifiers and your pseudonymization key have been removed or replaced, it might still be possible to re-identify some data subjects in your data because, in combination, certain attributes (e.g., height, occupation and location of data collection) may single out an individual.+**Warning:** de-identification does not equal anonymization. Although all direct identifiers and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data because, in combination, certain attributes (e.g., height, occupation and location of data collection) may single out an individual.
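
To make the warning concrete, the following Python sketch lists the records whose combination of indirectly identifying attributes occurs only once in a dataset. The function name ''singled_out'', the column names and the sample values are hypothetical. A uniqueness count like this is only a rough first check and does not by itself show that a dataset is anonymous.

```python
from collections import Counter


def singled_out(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values is unique."""
    def combo(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]


# Hypothetical de-identified records: no names, but three indirect identifiers.
survey = [
    {"height_cm": 180, "occupation": "nurse", "location": "Site A"},
    {"height_cm": 180, "occupation": "nurse", "location": "Site A"},
    {"height_cm": 201, "occupation": "dentist", "location": "Site A"},
]
at_risk = singled_out(survey, ["height_cm", "occupation", "location"])
```

In this sample, ''at_risk'' contains only the third record: its combination of height, occupation and location appears once, so that person could be singled out even though no direct identifier is present.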
  
 ==== De-identification techniques ==== ==== De-identification techniques ====
Line 29: Line 30:
   * Make numerical values less precise.   * Make numerical values less precise.
   * Replace identifiable text with ‘[redacted]’.   * Replace identifiable text with ‘[redacted]’.
-Masking is typically partial, i.e. applied only to some characters in the attribute. For example, in the case of a postal code: change 9746DC into 97****+Masking is typically partial, i.e. applied only to some characters in the attribute. For example, in the case of a postal code: change 9746DC into 97∗∗∗∗
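
A minimal Python sketch of partial masking, assuming the first two characters are kept and the rest is replaced. The function name ''mask'' and its parameters are hypothetical.

```python
def mask(value, keep=2, mask_char="*"):
    """Keep the first `keep` characters and replace the remainder with mask_char."""
    return value[:keep] + mask_char * max(len(value) - keep, 0)


masked = mask("9746DC")  # "97****"
```

How many characters to keep is a judgment call: the fewer that remain, the lower the risk of re-identification, but the less useful the attribute is for analysis.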
  
 === Aggregation & generalization === === Aggregation & generalization ===