| dcc:pdpsol:pseudonymization [2026/04/29 12:43] – marlon | dcc:pdpsol:pseudonymization [2026/05/13 13:25] (current) – alba |
|---|---|
| {{indexmenu_n>3}} | {{indexmenu_n>3}} |
| ====== Pseudonymization protocols ====== | ====== Pseudonymization procedures ====== |
| ===== Introduction ===== | ===== Introduction ===== |
| Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In some situations, the researcher maintains the link between the unique code and the data subject in a keyfile, while in other projects, this connection is not necessary. | Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In some situations, the researcher maintains the link between the unique code and the data subject in a keyfile, while in other projects, this connection is not necessary. |
| |
| First, assign each data subject in your dataset a pseudonymization ID. That ID must be unique, non-informative, and non-derivable from personal data. In Excel, you can generate a randomized list of IDs by entering the following formula in the formula bar: <color #ed1c24>=SORTBY(SEQUENCE(N), RANDARRAY(N))</color>. Here, <color #ed1c24>SEQUENCE(N)</color> generates the numbers from 1 to <color #ed1c24>N</color>, and <color #ed1c24>RANDARRAY(N)</color> generates <color #ed1c24>N</color> random values that <color #ed1c24>SORTBY</color> uses to shuffle their order. Choose an <color #ed1c24>N</color> larger than your sample size (e.g., for 100 data subjects, <color #ed1c24>N</color> = 1000) and assign the first values of the resulting list to your data subjects. Remove all direct identifiers (e.g. names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 1). | First, assign each data subject in your dataset a pseudonymization ID. That ID must be unique, non-informative, and non-derivable from personal data. In Excel, you can generate a randomized list of IDs by entering the following formula in the formula bar: <color #ed1c24>=SORTBY(SEQUENCE(N), RANDARRAY(N))</color>. Here, <color #ed1c24>SEQUENCE(N)</color> generates the numbers from 1 to <color #ed1c24>N</color>, and <color #ed1c24>RANDARRAY(N)</color> generates <color #ed1c24>N</color> random values that <color #ed1c24>SORTBY</color> uses to shuffle their order. Choose an <color #ed1c24>N</color> larger than your sample size (e.g., for 100 data subjects, <color #ed1c24>N</color> = 1000) and assign the first values of the resulting list to your data subjects. Remove all direct identifiers (e.g. names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 1). |
| |
| **Warning:** pseudonymization does not equal anonymization. An anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data. Even if all [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/essential-concepts|direct identifiers]] and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data because, in combination, certain attributes (e.g., combination of height, job occupation and location of data collection) may single out an individual. | |
| ---- | ---- |
| |
| |
| ==== Managing the keyfile ==== | ==== Managing the keyfile ==== |
| You must store the keyfile securely in your [[https://www.rug.nl/digital-competence-centre/it-solutions/store-and-archive/university-workplace-storage|X-Drive or Y-Drive]], separate from the research data. Strictly limit access to the keyfile, for example, by using password-protected, encrypted, or stored on a secure server. This ensures that, even if the research dataset is shared or accessed by unauthorized parties, re-identification is only possible if the keyfile is also compromised. Keep track of who is authorized to access the keyfile, and if necessary (e.g. longitudinal research), ensure that there are two people who have access to the key in case one becomes unavailable. | You must store the keyfile securely in your [[https://www.rug.nl/digital-competence-centre/it-solutions/store-and-archive/university-workplace-storage|X-Drive or Y-Drive]], separate from the research data. Strictly limit access to the keyfile, for example by adding an extra layer of encryption (e.g. Excel's encryption functionality or [[..:itsol:veracrypt:|VeraCrypt]]). This ensures that, even if the research dataset is shared or accessed by unauthorized parties, re-identification is only possible if the keyfile is also compromised. Keep track of who is authorized to access the keyfile and, if necessary (e.g. in longitudinal research), ensure that two people have access to the key in case one becomes unavailable. |
| | |
| Determine whether it is necessary to maintain a link between your research data and the individuals involved. Do not retain the keyfile longer than necessary for your research (e.g., [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|exercising data subjects’ rights]] or informing data subjects about the research). | |
| |
| **Warning:** Pseudonymization does not equal anonymization. An anonymized dataset does not allow for the re-identification of data subjects and is therefore no longer considered personal data. Even if all [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/essential-concepts|direct identifiers]] and your pseudonymization key have been replaced or removed, it might still be possible to re-identify some data subjects in your data because, in combination, certain attributes (e.g., combination of height, job occupation and location of data collection) may single out an individual. | Only maintain a link between your research data and the individuals involved if necessary. Do not retain the keyfile longer than your research requires, e.g., for [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|exercising data subjects’ rights]] or informing data subjects about the research. |
| |
| ===== Double coding ===== | ===== Double coding ===== |