Differences

This shows you the differences between two versions of the page.

dcc:pdpsol:dataminimization [2026/01/29 13:44] – [Data collection specific data minimization practices] marlon
dcc:pdpsol:dataminimization [2026/01/29 14:46] (current) marlon
Line 1: Line 1:
 ====== Data Minimization ======
 ===== Introduction =====
-Data minimization is one of the data protection principles that form the basis of the GDPR. It states that the processing of personal data should be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” (GDPR art. 5 (1c)). Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need these data for the current or specific future purposes you are allowed to collect these data.
+Data minimization is one of the data protection principles that form the basis of the GDPR. It states that the processing of personal data should be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” ([[https://gdpr.eu/article-5-how-to-process-personal-data/|GDPR art. 5 (1c)]]). Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need these data for the current or specific future purposes you are allowed to collect these data.
  
-When designing your research, it is important to consider the personal data required to answer your research questions, as well as the level of detail needed and any data that may be collected automatically due to your chosen method. The data minimisation practices introduced below will help you to implement data minimisation in your own research.
+When designing your research, it is important to consider the personal data required to answer your research questions, as well as the level of detail needed and any data that may be collected automatically due to your chosen method. The data minimization practices introduced below will help you to implement data minimization in your own research.
  
  
Line 10: Line 10:
 In all types of research, it is important to consider the level of detail of the variables you selected.
  
-Collecting demographics about your research participants is important in order to investigate whether certain groups are represented in your sample or behave differently, and to correct for bias. However, to investigate this, it is not necessary to collect highly detailed data. This means that you could collect age (or age group) instead of birth date, and categorize education in groups. Make sure that you use categories that are compatible with the sources that you would like to compare them with. Statistics Netherlands published several classifications variables, such as education and occupation.
+Collecting **demographics** about your research participants is important in order to investigate whether certain groups are represented in your sample or behave differently, and to correct for bias. However, to investigate this, it is not necessary to collect highly detailed data. This means that you could collect age (or age group) instead of birth date, and categorize education in groups. Make sure that you use categories that are compatible with the sources that you would like to compare them with. Statistics Netherlands published [[https://www.cbs.nl/nl-nl/onze-diensten/methoden/classificaties|several classifications variables]], such as education and occupation.
  
-This concept is also relevant if you use certain variables as an independent variable in your research. When you want to collect location data, it is often unnecessary to know someone’s exact address or neighborhood in order to answer a research question. For example, if the goal is to compare happiness within different regions in a country, broader categories such as rural versus urban areas may be sufficient. However, in some situations, it might be necessary to collect more detailed or high granular data. For example, if the research is about neighborhood connections, detailed location data would be necessary.
+This concept is also relevant if you use certain variables as an **independent variable** in your research. When you want to collect location data, it is often unnecessary to know someone’s exact address or neighborhood in order to answer a research question. For example, if the goal is to compare happiness within different regions in a country, broader categories such as rural versus urban areas may be sufficient. However, in some situations, it might be necessary to collect more detailed or high granular data. For example, if the research is about neighborhood connections, detailed location data would be necessary.
  
 ==== Take into account the effort of research participation ====
Line 19: Line 19:
  
 ===== Research specific data minimization practices =====
-====Interviews, focus groups or observations (accordion item)====
+====Interviews, focus groups or observations ====
-**Type of data**
+===Type of data===
 Some data can reveal more information about an individual than others. Only use an extensive or detailed data collection method, if you also use this type of data to answer your research question.
 +  * **Video**: Observational research, facial expressions, movement patterns
 +  * **Audio**: Focus groups, open interviews, speech analysis
 +  * **Text**: Structured interviews
  
-**Table 1: minimizing data collection via your research collection method**
+===Contact information===
 +Be aware that through online calendar invitations or online interviews personal data about data subjects might be visible to others. [[..:itsol:kaltura:gmeet|Enhance the security of your (online) interviews]] by setting appointments in ‘private’ mode, and share video call-links by email.
  
 +===Metadata===
 +Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research. 
  
-**Contact information**
-Be aware that through online calendar invitations or online interviews personal data about data subjects might be visible to others. Enhance the security of your (online) interviews by setting appointments in ‘private’ mode, and share video call-links by email.
-
-**Metadata**
-Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. Comparitech shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research.
-Online survey or questionnaire research (accordion item)
-Type of data
+==== Online survey or questionnaire research ====
+===Type of data===
 Participants often share more information than necessary when asked open-ended questions. If possible, provide predefined options instead. For example, asking “Where are you from?” may result in participants revealing their home address or city when only their country of residence is required. Providing examples or limiting responses to a question reduces the amount of personal data collected.
  
-**Contact information**
+===Contact information===
-Do not collect contact information if you do not plan to contact your participants after you collected the data (e.g. in case of recruitment via social media, posters or third parties). The UG approved survey tool Qualtrics provides the option to use an anonymous link to prevent the collection of name and e-mail address of your participants.
+Do not collect contact information if you do not plan to contact your participants after you collected the data (e.g. in case of recruitment via social media, posters or third parties). The [[https://www.rug.nl/digital-competence-centre/it-solutions/collect-and-annotate/qualtrics-surveys?lang=en|UG approved survey tool Qualtrics]] provides the option to use an [[https://www.qualtrics.com/support/survey-platform/distributions-module/web-distribution/anonymous-link/|anonymous link]] to prevent the collection of name and e-mail address of your participants.
  
-**Metadata**
+===Metadata===
-Online (survey) tools sometimes automatically register personal data, such as IP addresses. Check whether it is necessary and possible to turn off automatic data collection in your online data collection tool. Counterintuitively, when using an anonymous link, Qualtrics still automatically registers IP addresses, which can reveal someone’s location and identity. If you are not using these IP addresses for your research, make sure to enable Anonymize Responses in the survey options as well.
+Online (survey) tools sometimes automatically register personal data, such as IP addresses. Check whether it is necessary and possible to turn off automatic data collection in your online data collection tool. Counterintuitively, when using an anonymous link, Qualtrics still automatically registers IP addresses, which can reveal someone’s location and identity. If you are not using these IP addresses for your research, make sure to enable [[https://www.qualtrics.com/support/survey-platform/survey-module/survey-options/survey-protection/#AnonymizingResponses|Anonymize Responses]] in the survey options as well.
  
 ==== Social media data ====
-**Type of data**
+===Type of data===
  
 +===Contact information===
 +If you are [[https://www.rug.nl/digital-competence-centre/guides-faq/checklist-social-media-data.pdf|scraping or manually collecting data from social media platforms]], you might not directly collect contact information. However, posts are often accompanied by social media ID and post ID. This information is very easy to trace back to an individual. If you do not need this information for current or future research (e.g. connect to other datasets), delete these IDs from your dataset or consider pseudonymization.
  
-**Contact information**
+===Metadata===
-If you are scraping or manually collecting data from social media platforms, you might not directly collect contact information. However, posts are often accompanied by social media ID and post ID. This information is very easy to trace back to an individual. If you do not need this information for current or future research (e.g. connect to other datasets), delete these IDs from your dataset or consider pseudonymization.
+Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. google timeline) but can also result in privacy risks in the context of research.
  
-**Metadata**
+[[dcc:pdpsol:start | → Go back to the Privacy & Data protection home page]]
-Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. Comparitech shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. google timeline) but can also result in privacy risks in the context of research.
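
Note (not part of either revision): the social media paragraph in the diff above suggests deleting social media IDs and post IDs or pseudonymizing them. As a rough illustration only, one possible way to pseudonymize such IDs is a keyed hash. The field names ''user_id'' and ''post_id'', the function names, and the 16-character truncation are all assumptions made for this sketch and do not come from the page. The secret key would have to be stored separately from the dataset, since anyone holding it can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymize_id(raw_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for one ID using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so records can
    still be linked to each other without storing the original ID.
    """
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256)
    # Truncation length is an arbitrary choice for readability.
    return digest.hexdigest()[:16]


def pseudonymize_records(records, secret_key, id_fields=("user_id", "post_id")):
    """Return copies of the records with the listed ID fields replaced.

    `id_fields` uses hypothetical column names; adapt them to the dataset.
    """
    cleaned = []
    for record in records:
        row = dict(record)  # copy, so the input records are left untouched
        for field in id_fields:
            if field in row:
                row[field] = pseudonymize_id(str(row[field]), secret_key)
        cleaned.append(row)
    return cleaned
```

If linking records to each other or to other datasets is not needed at all, dropping the ID columns entirely is the simpler option and matches the first suggestion in the paragraph.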