Differences

This shows you the differences between two versions of the page.

dcc:pdpsol:pseudonymization [2026/04/10 09:35] alba
dcc:pdpsol:pseudonymization [2026/04/10 09:41] (current) alba
Line 12: Line 12:
 ===== Pseudonymization without a keyfile =====
  
-This is the simplest form of pseudonymization, where direct identifiers are either removed or replaced by a pseudonymization ID. It can be applied to direct identifiers in a single file or used to connect multiple files.
+This is the simplest form of pseudonymization, in which direct identifiers are either removed or replaced with a pseudonymization ID. It can be applied to direct identifiers in a single file or used to connect multiple files.
  
 First, assign each data subject in your dataset a pseudonymization ID. The ID must be unique, non-informative, and non-derivable from personal data. In Excel, you can generate a randomized list of IDs by entering the following formula in the formula bar: <color #ed1c24>=SORTBY(SEQUENCE(N), RANDARRAY(N))</color>. Here, <color #ed1c24>SEQUENCE(N)</color> generates the numbers from 1 to <color #ed1c24>N</color>, and <color #ed1c24>RANDARRAY(N)</color> generates <color #ed1c24>N</color> random values that <color #ed1c24>SORTBY</color> uses as the sort key, which shuffles the sequence. Choose an <color #ed1c24>N</color> larger than your sample size (e.g., for 100 data subjects, <color #ed1c24>N</color> = 1000) and assign one ID from the shuffled list to each data subject. Because <color #ed1c24>RANDARRAY</color> recalculates whenever the worksheet changes, copy the generated list and paste it as values before assigning the IDs. Remove all direct identifiers (e.g., names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 1).
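For readers who work outside Excel, the same idea can be sketched in Python. This is only an illustration, not part of the documented procedure: the function name ''generate_pseudonym_ids'' and its parameters are hypothetical, and the numbers mirror the example above (100 data subjects, IDs drawn from 1 to 1000).

```python
import secrets


def generate_pseudonym_ids(n_subjects, pool_size):
    """Draw n_subjects distinct IDs at random from 1..pool_size.

    Hypothetical helper: it plays the role of shuffling SEQUENCE(N)
    and keeping one ID per data subject.
    """
    if pool_size < n_subjects:
        raise ValueError("pool_size must be at least n_subjects")
    # SystemRandom draws from the operating system's randomness source,
    # so the IDs cannot be reproduced from a seed.
    rng = secrets.SystemRandom()
    # sample() picks without replacement, so every ID is unique.
    return rng.sample(range(1, pool_size + 1), n_subjects)


# Example: 100 data subjects, IDs drawn from a pool of 1000.
ids = generate_pseudonym_ids(n_subjects=100, pool_size=1000)
```

Unlike the Excel formula, the resulting list does not change on its own, but it still has to be stored if the same IDs are to be reused across related files.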
Line 31: Line 31:
  
 ==== Creating a keyfile ====
-First, create a pseudonymization keyfile (e.g., in Excel or another UG approved tool available in the University Workplace) that assigns each data subject a unique pseudonymization ID. For the creation of the pseudonymization IDs, you can follow the same steps as described in //Pseudonymization without a keyfile//. Direct identifiers such as names, email addresses, or phone numbers should be stored only in this keyfile (Figure 2) and removed from the research dataset. In the research dataset, remove all direct identifiers or replace them with the pseudonymization IDs.
+First, create a pseudonymization keyfile (e.g., in Excel or another UG-approved tool available in the University Workplace) that assigns each data subject a unique pseudonymization ID. For the creation of the pseudonymization IDs, you can follow the same steps as described in //Pseudonymization without a keyfile//. Direct identifiers such as names, email addresses, or phone numbers should be stored only in this keyfile (Figure 2) and removed from the research dataset. In the research dataset, remove all direct identifiers or replace them with the pseudonymization IDs.
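The split between keyfile and research dataset might look roughly like the Python sketch below. Everything in it is assumed for illustration: the function ''split_into_keyfile_and_dataset'', the column name ''pseudonym_id'', and the sample records are hypothetical and not taken from the page.

```python
def split_into_keyfile_and_dataset(records, identifier_fields, ids):
    """Separate direct identifiers (keyfile) from research data (dataset).

    Both outputs carry the pseudonymization ID, so the keyfile can be
    used to link a dataset row back to a data subject.
    """
    if len(records) != len(ids):
        raise ValueError("need exactly one pseudonymization ID per record")
    keyfile, dataset = [], []
    for record, pid in zip(records, ids):
        key_row = {"pseudonym_id": pid}
        data_row = {"pseudonym_id": pid}
        for field, value in record.items():
            # Direct identifiers go only into the keyfile.
            target = key_row if field in identifier_fields else data_row
            target[field] = value
        keyfile.append(key_row)
        dataset.append(data_row)
    return keyfile, dataset


# Hypothetical example records and pre-generated IDs.
records = [
    {"name": "A. Example", "email": "a@example.org", "score": 7},
    {"name": "B. Sample", "email": "b@example.org", "score": 4},
]
keyfile, dataset = split_into_keyfile_and_dataset(
    records, identifier_fields={"name", "email"}, ids=[412, 87]
)
```

Here ''dataset'' holds only the ID and the research variables, while ''keyfile'' holds the ID together with the direct identifiers.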
  
 ----