Data Minimization
Introduction
Data minimization is one of the data protection principles that form the basis of the GDPR. It states that personal data should be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” (GDPR Art. 5(1)(c)). Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need the data for a current or specific future purpose, you are allowed to collect them.
When designing your research, consider which personal data you need to answer your research questions, what level of detail is required, and which data may be collected automatically as a side effect of your chosen method. The practices introduced below will help you implement data minimization in your own research.
General data minimization practices
Data minimization through generalization
In all types of research, it is important to consider the level of detail of the variables you select.
Collecting demographics about your research participants is important in order to investigate whether certain groups are represented in your sample or behave differently, and to correct for bias. However, this does not require highly detailed data. For example, you could collect age (or age group) instead of birth date, and record education in categories. Make sure that you use categories that are compatible with the sources you would like to compare them with. Statistics Netherlands has published classifications for several variables, such as education and occupation.
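Ideally you ask participants for their age group directly. If a source already contains birth dates, the generalization could be applied before analysis and the birth date then discarded. The following is a minimal sketch of that step; the band boundaries, labels, and function names are illustrative assumptions, not an official classification, and should be replaced by the categories of the source you want to compare with.

```python
from datetime import date

# Hypothetical age bands (inclusive lower and upper bounds).
# Replace these with the bands used by your reference statistics.
AGE_BANDS = [
    (0, 17, "0-17"),
    (18, 24, "18-24"),
    (25, 44, "25-44"),
    (45, 64, "45-64"),
    (65, 150, "65+"),
]


def age_on(birth_date: date, reference_date: date) -> int:
    """Return the age in completed years on the reference date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age: int) -> str:
    """Map an age in years to the label of its band."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the defined bands")


# Keep only the generalized value; do not store the birth date afterwards.
group = age_group(age_on(date(1990, 6, 15), date(2024, 6, 14)))
print(group)  # 25-44
```

Storing only the band label means that the dataset no longer contains the exact birth date, while group comparisons remain possible.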
The same applies when you use such a variable as an independent variable in your research. When you collect location data, it is often unnecessary to know someone’s exact address or neighborhood in order to answer a research question. For example, if the goal is to compare happiness across different regions of a country, broader categories such as rural versus urban areas may be sufficient. In some situations, however, more detailed, highly granular data are necessary. For example, if the research is about neighborhood connections, detailed location data would be needed.
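When a method delivers precise coordinates automatically (for example, from a mobile device), one possible way to reduce the granularity is to round them before storage. The sketch below assumes coordinates in decimal degrees; the function name and the choice of one decimal are illustrative assumptions, and the appropriate precision depends on your research question.

```python
def coarsen_coordinates(latitude: float, longitude: float, decimals: int = 1) -> tuple[float, float]:
    """Round decimal-degree coordinates to reduce their precision.

    One decimal of latitude corresponds to roughly 11 km, so the result
    identifies a broad area rather than an address.
    """
    if decimals < 0:
        raise ValueError("decimals must be zero or positive")
    return round(latitude, decimals), round(longitude, decimals)


# Store only the coarsened pair, not the original coordinates.
print(coarsen_coordinates(52.3676, 4.9041))  # (52.4, 4.9)
```

Rounding is only one option. If the analysis needs a category such as rural versus urban, mapping each location to that category and discarding the coordinates removes even more detail.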
Take into account the effort of research participation
Besides considering which personal data you need for your research, be mindful of the effort and strain that participation may place on data subjects. Limit the collection of personal data to what you need, but also respect participants’ time and effort: avoid designing studies that require participants to take part multiple times because each research question is defined too narrowly. This is particularly important when working with vulnerable or hard-to-reach groups. In such cases, it is advisable to design studies that can address several relevant questions at once, thereby maximizing the value of participants’ contributions while minimizing their strain.