

Pseudonymization

“Pseudonymization is a de-identification procedure during which personally identifiable information is replaced by a unique alias or code (pseudonym). In some situations, the researcher maintains the link between the unique code and the data subject in a keyfile, while in other projects this connection is not necessary.”

Describe your pseudonymization process in a protocol. In it, decide whether you need to maintain a link between your research data and the individuals involved. Keeping a link can be useful, for example, during longitudinal research, or for as long as participants can still exercise their rights over their data.

The examples below can help you write this protocol.

The simplest form of pseudonymization removes direct identifiers or replaces them with a pseudonymization ID. It can be applied to the direct identifiers in a single file or used to link multiple files.

First, assign each data subject in your dataset a pseudonymization ID. The ID must be unique, non-informative, and not derivable from personal data. In Excel (Microsoft 365 or Excel 2021 and later), you can generate a randomized list of IDs by entering the following formula in a cell, with N replaced by a number: =SORTBY(SEQUENCE(N), RANDARRAY(N)). Here, SEQUENCE(N) generates the numbers 1 to N, and RANDARRAY(N) shuffles their order. Choose an N larger than your sample size (e.g., N = 1000 for 100 data subjects). Because RANDARRAY recalculates whenever the workbook changes, copy the generated list and paste it as values so that the IDs stay fixed.

Next, remove all direct identifiers (e.g., names, email addresses, phone numbers) or replace them with these IDs. Use the same ID consistently across related files (e.g., transcripts, survey responses, and other data) to preserve linkability while protecting the identity of your data subjects (Figure 2).
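If you prepare your data in Python rather than Excel, the same steps could look roughly like the sketch below. It is an illustration, not a prescribed tool: the function names (`generate_ids`, `build_id_map`, `pseudonymize`), the field names (`email`, `name`, `answer`, `text`, `pid`) and the example rows are all hypothetical. `random.SystemRandom().sample` draws distinct IDs from 1 to N without replacement, which is equivalent to taking the first rows of the shuffled Excel list. The resulting `id_map` links each person to their ID, so it acts as the keyfile mentioned in the definition above: store it separately from the pseudonymized data with restricted access, or delete it if no link needs to be kept.

```python
import random


def generate_ids(n_subjects, n_pool):
    """Draw n_subjects distinct IDs from 1..n_pool in random order.

    Same idea as =SORTBY(SEQUENCE(N), RANDARRAY(N)) in Excel with
    N = n_pool, keeping only as many IDs as there are data subjects.
    """
    if n_pool < n_subjects:
        raise ValueError("n_pool must be at least the number of data subjects")
    # SystemRandom draws from the operating system's randomness source,
    # so the IDs carry no information about the data subjects.
    return random.SystemRandom().sample(range(1, n_pool + 1), n_subjects)


def build_id_map(identifiers, n_pool):
    """Map each distinct direct identifier to one random ID.

    The returned dict links IDs to people, so it plays the role of a
    keyfile: store it apart from the data, or delete it if no link is needed.
    """
    distinct = list(dict.fromkeys(identifiers))  # de-duplicate, keep order
    return dict(zip(distinct, generate_ids(len(distinct), n_pool)))


def pseudonymize(rows, identifier_field, id_map, drop_fields=()):
    """Replace identifier_field with its ID ("pid") and drop other direct identifiers."""
    result = []
    for row in rows:
        new_row = {key: value for key, value in row.items()
                   if key != identifier_field and key not in drop_fields}
        new_row["pid"] = id_map[row[identifier_field]]
        result.append(new_row)
    return result


# Hypothetical example: two related files that share the e-mail address.
survey = [
    {"email": "a@example.org", "name": "Participant A", "answer": 4},
    {"email": "b@example.org", "name": "Participant B", "answer": 2},
]
transcripts = [
    {"email": "b@example.org", "text": "Interview transcript"},
]

id_map = build_id_map([row["email"] for row in survey + transcripts], n_pool=1000)
survey_p = pseudonymize(survey, "email", id_map, drop_fields=("name",))
transcripts_p = pseudonymize(transcripts, "email", id_map)
# Participant B now has the same "pid" in both files, and no names or
# e-mail addresses remain in survey_p or transcripts_p.
```

Building one `id_map` from all related files before replacing anything is what keeps the IDs consistent across files, mirroring the advice to reuse the same ID for transcripts, survey responses, and other data.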

Warning: Pseudonymization does not equal anonymization. Even after all direct identifiers have been removed or replaced, some data subjects may still be re-identifiable, because a combination of attributes (e.g., height, occupation, and location of data collection) can single out an individual.
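One way to spot such risky combinations before sharing data is to count how many records share each combination of indirect attributes; a combination that occurs fewer than k times points to a data subject who may be singled out (the idea behind k-anonymity). The sketch below is an illustration under assumptions: the function name `rare_combinations`, the field names (`height_cm`, `occupation`, `location`), the example rows, and the default threshold k = 2 are hypothetical, and an empty result does not by itself make a dataset anonymous.

```python
from collections import Counter


def rare_combinations(rows, quasi_identifiers, k=2):
    """Return attribute combinations shared by fewer than k records.

    A combination returned here may single out an individual, even
    though no direct identifier is present in the data.
    """
    counts = Counter(tuple(row[f] for f in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical pseudonymized records: no names or e-mail addresses remain.
rows = [
    {"pid": 412, "height_cm": 180, "occupation": "nurse", "location": "Site A"},
    {"pid": 87, "height_cm": 180, "occupation": "nurse", "location": "Site A"},
    {"pid": 903, "height_cm": 198, "occupation": "pilot", "location": "Site B"},
]
risky = rare_combinations(rows, ["height_cm", "occupation", "location"])
print(risky)  # {(198, 'pilot', 'Site B'): 1}
```

Records flagged this way can then be handled in the protocol, for example by generalizing an attribute (height ranges instead of exact values) or removing it.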
