Differences

This shows you the differences between two versions of the page.

--- dcc:pdpsol:pseudonymization [2026/01/15 12:20] – marlon
+++ dcc:pdpsol:pseudonymization [2026/01/15 12:47] (current) – marlon
@@ Line 3: / Line 3: @@
 //“Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In some situations, the researcher maintains the link between the unique code and the data subject in a keyfile, while in other projects this connection is not necessary.”//
-Make sure that you describe the process of pseudonymization in a protocol. Determine whether it is necessary to maintain a link between your research data and the individuals involved. You might want to keep a link during longitudinal research and while [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|participants can exercise their rights]].
+How you pseudonymize your data depends on your research setup. It is good practice to describe the pseudonymization process for your project either in your data management plan or in a separate Pseudonymization Protocol. You can use the examples below as a guide.
-You can use the examples below to support you in writing a protocol.
 ===== Pseudonymization without a keyfile =====
@@ Line 11: / Line 9: @@
 This is the simplest form of pseudonymization, where direct identifiers are either removed or replaced by a pseudonymization ID. It can be applied to direct identifiers in a single file or used to connect multiple files.
-First, assign each data subject in your dataset a unique pseudonymization ID. That ID must be unique, non-informative, and non-derivable from personal data. In Excel, you can generate a randomized list of IDs by entering the following formula in the formula bar:  <color #ed1c24>=SORTBY(SEQUENCE(N), RANDARRAY(N))</color>. Here, <color #ed1c24>SEQUENCE(N)</color> generates numbers from 1 to N, and RANDARRAY(N) randomizes their order. Choose an <color #ed1c24>N</color> larger than your sample size (e.g., for 100 data subjects, <color #ed1c24>N</color> = 1000). Remove all direct identifiers (e.g. names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 2).
+First, assign each data subject in your dataset a unique pseudonymization ID. That ID must be non-informative and non-derivable from personal data. In Excel, you can generate a randomized list of IDs by entering the following formula in the formula bar: <color #ed1c24>=SORTBY(SEQUENCE(N), RANDARRAY(N))</color>. Here, <color #ed1c24>SEQUENCE(N)</color> generates the numbers from 1 to <color #ed1c24>N</color>, and <color #ed1c24>RANDARRAY(N)</color> supplies the random values by which they are sorted, which shuffles their order. Choose an <color #ed1c24>N</color> larger than your sample size (e.g., for 100 data subjects, <color #ed1c24>N</color> = 1000). Because <color #ed1c24>RANDARRAY</color> recalculates whenever the sheet changes, copy the result and paste it as values so that the IDs stay fixed. Remove all direct identifiers (e.g., names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 2).
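For researchers who work outside Excel, the same idea can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name ''generate_pseudonym_ids'' and its parameters are hypothetical, and the pool size of 1000 for 100 data subjects simply mirrors the example above.

```python
import secrets


def generate_pseudonym_ids(n_subjects, pool_size):
    """Draw n_subjects distinct IDs at random from the range 1..pool_size.

    Hypothetical helper: the IDs carry no information about the data
    subject and cannot be derived from personal data.
    """
    if pool_size < n_subjects:
        raise ValueError("pool_size must be at least n_subjects")
    # SystemRandom uses the operating system's randomness source,
    # so the result cannot be reproduced from a seed.
    rng = secrets.SystemRandom()
    # sample() draws without replacement, so every ID is unique.
    return rng.sample(range(1, pool_size + 1), n_subjects)


# Assumed example sizes: 100 data subjects, IDs drawn from 1..1000.
ids = generate_pseudonym_ids(n_subjects=100, pool_size=1000)
```

Unlike the Excel formula, the list does not change after it has been generated, but it still needs to be saved if the same IDs are to be reused across related files.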
 **Warning:** Pseudonymization does not equal anonymization. Although all direct identifiers have been removed or replaced, it might still be possible to re-identify some data subjects because certain attributes, in combination (e.g., height, occupation, and location of data collection), may single out an individual.
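One way to get a rough impression of this risk is to count how many data subjects share each combination of indirect identifiers. The sketch below is only an illustration under assumed names: the function ''singled_out'', the column names, and the three rows of fictional data are all hypothetical, and a unique combination is an indication of risk rather than proof of identifiability.

```python
from collections import Counter


def singled_out(rows, quasi_identifiers):
    """Return the rows whose combination of quasi-identifier values is unique."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    # Count how often each combination of values occurs in the dataset.
    counts = Counter(key(row) for row in rows)
    # A combination that occurs exactly once singles out one data subject.
    return [row for row in rows if counts[key(row)] == 1]


# Fictional example data; "pid" is the pseudonymization ID.
rows = [
    {"pid": 417, "height_cm": 180, "occupation": "nurse", "location": "Groningen"},
    {"pid": 52, "height_cm": 180, "occupation": "nurse", "location": "Groningen"},
    {"pid": 903, "height_cm": 201, "occupation": "pilot", "location": "Groningen"},
]
at_risk = singled_out(rows, ["height_cm", "occupation", "location"])
```

In this fictional dataset only the record with ''pid'' 903 is returned, because its combination of height, occupation, and location occurs once.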
-===== Simple pseudonymization with a keyfile =====
+===== Pseudonymization with a keyfile =====
-In this approach, you create pseudonymization IDs and securely store them in a separate keyfile. The keyfile contains the mapping between direct identifiers and pseudonymization IDs. The research dataset itself will only include the pseudonymization IDs, while the keyfile is kept apart and protected.
+In this approach, you create pseudonymization IDs and securely store them together with contact details of your participants or other sensitive data in a keyfile. The dataset itself will only include the pseudonymization IDs, while the keyfile is kept apart and protected.
+==== Creating a keyfile ====
+First, create a pseudonymization keyfile (e.g., in Excel or another UG-approved tool available in the University Workplace) that assigns each data subject a unique pseudonymization ID. To create the pseudonymization IDs, you can follow the same steps as described in //Pseudonymization without a keyfile//. Store direct identifiers such as names, email addresses, or phone numbers only in this keyfile. In the research dataset, remove all direct identifiers or replace them with the pseudonymization IDs.
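The split between keyfile and research dataset can be sketched as follows. This is a minimal illustration under stated assumptions: the function ''split_keyfile'', the field names, and the two fictional participants are hypothetical, and storing the resulting keyfile securely is outside the scope of the sketch.

```python
import secrets


def split_keyfile(records, direct_identifiers, pool_size):
    """Split records into a keyfile and a pseudonymized research dataset.

    The keyfile holds the pseudonymization ID plus the direct identifiers;
    the dataset holds the pseudonymization ID plus all remaining fields.
    """
    rng = secrets.SystemRandom()
    # One distinct random ID per record, drawn from 1..pool_size.
    ids = rng.sample(range(1, pool_size + 1), len(records))
    keyfile, dataset = [], []
    for pid, record in zip(ids, records):
        key_row = {"pid": pid}
        data_row = {"pid": pid}
        for field, value in record.items():
            if field in direct_identifiers:
                key_row[field] = value   # kept only in the keyfile
            else:
                data_row[field] = value  # kept only in the research dataset
        keyfile.append(key_row)
        dataset.append(data_row)
    return keyfile, dataset


# Fictional example records.
records = [
    {"name": "A. Example", "email": "a@example.org", "score": 12},
    {"name": "B. Example", "email": "b@example.org", "score": 17},
]
keyfile, dataset = split_keyfile(records, ["name", "email"], pool_size=1000)
```

After the split, the two outputs share only the ''pid'' column, so the research dataset can be linked back to a person only by someone who also has the keyfile.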
-First, create a pseudonymization keyfile (e.g., in Excel or another UG approved tool available in the University Workplace) that assigns each data subject a unique pseudonymization ID (See example 1). Direct identifiers such as names, email addresses, or phone numbers should be stored only in this keyfile and removed from the research dataset. In the research dataset, remove all direct identifiers or replace them with the pseudonymization IDs. You must store the keyfile securely in your [[https://www.rug.nl/digital-competence-centre/it-solutions/store-and-archive/university-workplace-storage|X-Drive or Y-Drive]], separate from the research data. Strictly limit access to the keyfile, for example, by using password-protected, encrypted, or stored on a secure server. This ensures that, even if the research dataset is shared or accessed by unauthorized parties, re-identification is only possible if the keyfile is also compromised.
+==== Managing the keyfile ====
+You must store the keyfile securely in your [[https://www.rug.nl/digital-competence-centre/it-solutions/store-and-archive/university-workplace-storage|X-Drive or Y-Drive]], separate from the research data. Strictly limit access to the keyfile, for example, by password-protecting or encrypting it, or by storing it on a secure server. This ensures that, even if the research dataset is shared or accessed by unauthorized parties, re-identification is only possible if the keyfile is also compromised. Keep track of who is authorized to access the keyfile and, if necessary (e.g., in longitudinal research), ensure that two people have access to the keyfile in case one becomes unavailable.
-Keep track of who is authorized to access the keyfile, and if necessary (e.g. longitudinal research), ensure that there are two people who have access to the key in case one becomes unavailable. Do not retain the keyfile longer than necessary for your research (e.g., exercising data subjects’ rights or informing data subjects about the research).
+Determine whether it is necessary to maintain a link between your research data and the individuals involved. Do not retain the keyfile longer than your research requires, for example for [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|exercising data subjects’ rights]] or for informing data subjects about the research.
 **Warning:** As with pseudonymization without a keyfile, pseudonymization does not equal anonymization. Even without the keyfile, data subjects may still be identifiable if certain attributes in the dataset can be combined to single out an individual (e.g., the combination of age, occupation, and location).