| dcc:pdpsol:de-identification [2026/02/17 09:49] – marlon | dcc:pdpsol:de-identification [2026/02/17 10:09] (current) – marlon |
|---|
| |
| ==== Anonymization versus pseudonymization ==== | ==== Anonymization versus pseudonymization ==== |
| | |
| === Pseudonymization === | === Pseudonymization === |
| Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In general, the names and/or contact details of data subjects are stored with this pseudonym in a so-called keyfile. The keyfile enables the re-identification of individuals in the dataset. Keyfiles are stored separately from the rest of the data, and access to them should be restricted. In contrast to an anonymized dataset, a pseudonymized dataset in principle still allows for the re-identification of data subjects. | Pseudonymization is a de-identification procedure that is often implemented during data collection. During pseudonymization, personally identifiable information is replaced by a unique alias or code (pseudonym). In general, the names and/or contact details of data subjects are stored with this pseudonym in a so-called keyfile. The keyfile enables the re-identification of individuals in the dataset. Keyfiles are stored separately from the rest of the data, and access to them should be restricted. In contrast to an anonymized dataset, a pseudonymized dataset in principle still allows for the re-identification of data subjects. |
| |
| [[pseudonymization|Refer to our page on pseudonymization for practical advice on its implementation.]] | [[pseudonymization|→ Refer to our page on pseudonymization for practical advice on its implementation.]] |
| |
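
As an illustration of the keyfile idea described above, here is a minimal sketch in Python. The function name `pseudonymize`, the `name` field, and the `P-` prefix are hypothetical choices made for this example, not part of any prescribed procedure. The sketch replaces a direct identifier with a random code and returns the mapping separately, so that it can be stored apart from the data.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace the direct identifier in each record with a random pseudonym.

    Returns (pseudonymized_records, keyfile), where keyfile maps each
    pseudonym back to the original identifier.
    """
    keyfile = {}   # pseudonym -> original identifier (store separately!)
    assigned = {}  # original identifier -> pseudonym (same subject, same code)
    pseudonymized = []
    for record in records:
        identifier = record[id_field]
        if identifier not in assigned:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in keyfile:  # regenerate on the rare collision
                pseudonym = "P-" + secrets.token_hex(4)
            assigned[identifier] = pseudonym
            keyfile[pseudonym] = identifier
        # Copy every field except the direct identifier.
        clean = {k: v for k, v in record.items() if k != id_field}
        clean["pseudonym"] = assigned[identifier]
        pseudonymized.append(clean)
    return pseudonymized, keyfile


records = [
    {"name": "Alice", "height": 170},
    {"name": "Bob", "height": 180},
    {"name": "Alice", "height": 171},
]
data, keyfile = pseudonymize(records)
```

In this sketch, `data` no longer contains names, while `keyfile` still allows re-identification, which is why the resulting dataset is pseudonymized rather than anonymized.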
| === Anonymization === | === Anonymization === |
| Anonymization is a de-identification procedure during which “personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” ([[https://www.iso.org/standard/63553.html|ISO 25237:2017 Health informatics -- Pseudonymization]]. ISO. 2017. p. 7.). In contrast to a pseudonymized dataset, an anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data. | Anonymization is a de-identification procedure during which “personal data is altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party.” ([[https://www.iso.org/standard/63553.html|ISO 25237:2017 Health informatics -- Pseudonymization]]. ISO. 2017. p. 7.). In contrast to a pseudonymized dataset, an anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data. |
| |
| **Warning:** de-identification does not equal anonymization. Even after all direct identifiers and your pseudonymization key have been removed or replaced, it might still be possible to re-identify some data subjects in your data because certain attributes, taken in combination (e.g., height, occupation and location of data collection), may single out an individual. | **Warning:** de-identification does not equal anonymization. Even after all direct identifiers and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data because certain attributes, taken in combination (e.g., height, occupation and location of data collection), may single out an individual. |
| |
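
To make the warning above concrete, the following sketch counts how often each combination of indirect identifiers occurs and reports the records whose combination is unique. The function name `singled_out` and the example fields are assumptions made for illustration; a real assessment would also have to consider information available outside the dataset.

```python
from collections import Counter


def singled_out(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values is unique."""
    def combo(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]


records = [
    {"height": 170, "occupation": "nurse", "location": "Groningen"},
    {"height": 170, "occupation": "nurse", "location": "Groningen"},
    {"height": 205, "occupation": "pilot", "location": "Groningen"},
]
at_risk = singled_out(records, ["height", "occupation", "location"])
```

Here `at_risk` contains only the third record: no name is present, yet the combination of attributes identifies a single person in the dataset.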
| ==== De-identification techniques ==== | ==== De-identification techniques ==== |
| * Make numerical values less precise. | * Make numerical values less precise. |
| * Replace identifiable text with ‘[redacted]’. | * Replace identifiable text with ‘[redacted]’. |
| Masking is typically partial, i.e. applied only to some characters in the attribute. For example, in the case of a postal code: change 9746DC into 97****. | Masking is typically partial, i.e. applied only to some characters in the attribute. For example, in the case of a postal code: change 9746DC into 97∗∗∗∗. |
| |
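
Two of the techniques listed above, partial masking and making numerical values less precise, can be sketched as follows. The helper names `mask` and `coarsen` are hypothetical, and the bin width of 10 is an arbitrary choice for the example.

```python
def mask(value, keep=2, mask_char="*"):
    """Keep the first `keep` characters and mask the remainder."""
    return value[:keep] + mask_char * max(len(value) - keep, 0)


def coarsen(value, step=10):
    """Reduce precision by mapping a number to the lower edge of its bin."""
    return (value // step) * step


masked_postcode = mask("9746DC")   # "97****"
coarse_height = coarsen(183)       # 180
```

How many characters to keep, or how wide to make the bins, depends on how many data subjects share the resulting value; the coarser the value, the lower the risk of singling someone out, at the cost of analytical detail.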
| === Aggregation & generalization === | === Aggregation & generalization === |