Differences

This shows you the differences between two versions of the page.

dcc:pdpsol:dataminimization [2026/03/23 13:51] marlon
dcc:pdpsol:dataminimization [2026/04/29 13:52] (current) – add comment solveig about contact information in surverys marlon
Line 3: Line 3:
  
 ===== Introduction ===== ===== Introduction =====
-Data minimization is one of the data protection principles that form the basis of the GDPR. It states that the processing of personal data should be //“adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”// ([[https://gdpr.eu/article-5-how-to-process-personal-data/|GDPR art. 5 (1c)]]). Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need these data for the current or specific future purposes you are allowed to collect these data.+**Data minimization is one of the data protection principles that form the basis of the GDPR. It states that the processing of personal data should be //“adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”// ([[https://gdpr.eu/article-5-how-to-process-personal-data/|GDPR art. 5 (1c)]]). Data minimization does not mean that you cannot collect personal data at all. If you can explain why you need these data for the current or specific future purposes you are allowed to collect these data. 
 +**
  
 When designing your research, it is important to consider the personal data required to answer your research questions, as well as the level of detail needed and any data that may be collected automatically due to your chosen method. The data minimization practices introduced below will help you to implement data minimization in your own research. When designing your research, it is important to consider the personal data required to answer your research questions, as well as the level of detail needed and any data that may be collected automatically due to your chosen method. The data minimization practices introduced below will help you to implement data minimization in your own research.
Line 12: Line 13:
 In all types of research, it is important to consider the level of detail of the variables you selected.  In all types of research, it is important to consider the level of detail of the variables you selected. 
  
-Collecting **demographics** about your research participants is important in order to investigate whether certain groups are represented in your sample or behave differently, and to correct for bias. However, to investigate this, it is not necessary to collect highly detailed data. This means that you could collect age (or age group) instead of birth date, and categorize education in groups. Make sure that you use categories that are compatible with the sources that you would like to compare them with. Statistics Netherlands published [[https://www.cbs.nl/nl-nl/onze-diensten/methoden/classificaties|several classifications variables]], such as education and occupation. +Collecting **demographics** about your research participants is important to investigate whether certain groups are represented in your sample or behave differently, and to correct for bias. However, to investigate this, it is not necessary to collect highly detailed data. This means that you could collect age (or age group) instead of birth date, and categorize education in groups. Make sure that you use categories that are compatible with the sources that you would like to compare them with. Statistics Netherlands published [[https://www.cbs.nl/nl-nl/onze-diensten/methoden/classificaties|several classifications variables]], such as education and occupation. 
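The idea of recording an age group rather than a birth date can be sketched in a few lines of Python. This is only an illustration: the age bands and the function names `age_in_years` and `age_group` are assumptions made for this example and should be replaced by the categories of the source you want to compare with.

```python
from datetime import date

# Hypothetical age bands (assumption); align them with the classification
# used by the source you want to compare your sample with.
AGE_BANDS = [(0, 17), (18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def age_in_years(birth_date: date, reference: date) -> int:
    """Completed years between birth_date and the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth_date.month, birth_date.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)


def age_group(age: int) -> str:
    """Map an age in years to a coarse band label."""
    if age < 0:
        raise ValueError("age cannot be negative")
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+"
```

Where possible, ask participants for the age group directly, so that the birth date is never collected. The `age_in_years` helper is only relevant when a birth date already exists in a source you are allowed to use; in that case only the derived group would be stored.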
  
-This concept is also relevant if you use certain variables as an **independent variable** in your research. For example, if you want to collect location data, it is often unnecessary to know someone’s exact address or neighborhood in order to answer a research question. For example, if the goal is to compare happiness within different regions in a country, broader categories such as rural versus urban areas may be sufficient. However, in some situations, it might be necessary to collect more detailed or high granular data. For example, if the research is about neighborhood connections, detailed location data would be necessary. +This concept is also relevant if you use certain variables as an **independent variable** in your research. For example, if you want to collect location data, it is often unnecessary to know someone’s exact address or neighbourhood to answer a research question. For example, if the goal is to compare happiness within different regions in a country, broader categories such as rural versus urban areas may be sufficient. However, in some situations, it might be necessary to collect more detailed or high-granularity data. For example, if the research is about neighbourhood connections, detailed location data would be necessary. 
  
-==== Take into account the effort of research participation ==== +=== Take into account the effort of research participation === 
-Although it is important to consider what personal data you need for your research, it is also important to be mindful of the effort and strain participation may place on data subjects. This means you should limit the collection of personal data to what you need for your research. However, you should also respect participants’ time and effort, and avoid designing studies that require participants to take part multiple times due to narrowly defined research questions. This is particularly important when working with vulnerable or hard-to-reach groups. In such cases, it is advisable to design studies that can address several relevant questions at once, thereby maximizing the value of participants’ contributions while minimizing their strain. +Although it is important to consider what personal data you need for your research, it is also important to be mindful of the effort and strain participation may place on your participants. This means you should limit the collection of personal data to what you need for your research. However, you should also respect participants’ time and effort, and avoid designing studies that require participants to take part multiple times due to narrowly defined research questions. This is particularly important when working with vulnerable or hard-to-reach groups. In such cases, it is advisable to design studies that can address several relevant questions at once, thereby maximizing the value of participants’ contributions while minimizing their strain. 
  
 ==== Use consistent file naming and version control ====  ==== Use consistent file naming and version control ==== 
 Organize your data consistently by using a file naming strategy and good folder structure. The [[https://dmeg.cessda.eu/Data-Management-Expert-Guide/2.-Organise-Document/File-naming-and-folder-structure|practical guidelines of CESSDA]] can guide you in designing your file naming and folder structure strategy, but at least keep in mind the following points: Organize your data consistently by using a file naming strategy and good folder structure. The [[https://dmeg.cessda.eu/Data-Management-Expert-Guide/2.-Organise-Document/File-naming-and-folder-structure|practical guidelines of CESSDA]] can guide you in designing your file naming and folder structure strategy, but at least keep in mind the following points:
-  * Do not include contact information or other personal data in the naming of your files.  +  * Do not include contact information, other (parts of) personal data or any information relating to your participants in the naming of your files.  
-  * Add version numbers to your file names to easily track of the different versions of processed data you are storing. +  * Add version numbers to your file names to easily keep track of the different versions of processed data you are storing. 
   * It is good practice to create a version control table to keep track of different versions. The version control table can include information on different version numbers, authors, notes, and when the file was last updated. The table can also include a summary of the differences between the current version and previous versions. The version control table can be an independent text file, or it can be included at the top of your document, scripts, or other files.    * It is good practice to create a version control table to keep track of different versions. The version control table can include information on different version numbers, authors, notes, and when the file was last updated. The table can also include a summary of the differences between the current version and previous versions. The version control table can be an independent text file, or it can be included at the top of your document, scripts, or other files. 
-  * Using a version control table in combination with consistent file naming allows you to securely delete of versions of your data with identifiable information about your participants as soon as these are no longer required for your research. +  * Using a version control table in combination with consistent file naming allows you to securely delete versions of your data with identifiable information about your participants as soon as these are no longer required for your research. 
   * Refer to the DCC website for more information on [[https://www.rug.nl/digital-competence-centre/it-solutions/it-security/backup-versioning|version control]].   * Refer to the DCC website for more information on [[https://www.rug.nl/digital-competence-centre/it-solutions/it-security/backup-versioning|version control]].
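A consistent file naming strategy with version numbers could look like the Python sketch below. The naming pattern (project, description, date, version) and the helper names `build_file_name` and `next_version` are assumptions for illustration only; the CESSDA guidelines linked above describe more complete conventions.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, version: int, on: date, ext: str) -> str:
    """Build a name such as 'wellbeing_survey-cleaned_2026-03-23_v02.csv'."""
    def slug(text: str) -> str:
        # Lowercase and collapse anything that is not a letter or digit into a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{on.isoformat()}_v{version:02d}.{ext}"


def next_version(existing_names: list[str]) -> int:
    """Return the highest _vNN suffix found in existing_names plus one."""
    versions = [
        int(match.group(1))
        for name in existing_names
        if (match := re.search(r"_v(\d+)\.\w+$", name))
    ]
    return max(versions, default=0) + 1
```

The project and description parts describe the content of the file only. Participant names, contact details or other personal data should never be passed into them.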
  
-**Table 2: Example of a version control table ** +**Table 1: Example of a version control table **
 ---- ----
 {{:dcc:pdpsol:dataminimization:version_control2.png?direct&600|}} {{:dcc:pdpsol:dataminimization:version_control2.png?direct&600|}}
  
  
- +===== Research-specific data minimization practices =====
- +
-===== Research specific data minimization practices =====+
 ====Interviews, focus groups or observations ==== ====Interviews, focus groups or observations ====
 ===Type of data=== ===Type of data===
-Some data can reveal more information about an individual than others. Only use an extensive or detailed data collection method, if you also use this type of data to answer your research question. +Some data can reveal more information about an individual than others. Only use an extensive or detailed data collection method if you also use this type of data to answer your research question. 
-  * **Video**: Observational research, facial expressions, movement patterns +  * **Video**: Observational research focusing on human interactions, facial expressions, movement patterns, etc.  
-  * **Audio**: Focus groups, open interviews, speech analysis +  * **Audio**: Unstructured qualitative research where precise content and possibly tone and pitch are important (e.g. focus groups and open interviews), but also research conducting speech analysis.
-  * **Text**: Structured interviews+  * **Text**: Structured qualitative research focusing on content (e.g. interviews, observations)
  
 ===Contact information=== ===Contact information===
Line 49: Line 47:
 Informed consent can reveal personal information about your participants. Minimize the amount of personal data on your consent form and plan to handle consent registration with care. Follow the practical guidelines on the DCC website about [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|informed consent]] to guide you in the process and keep in mind the data minimization tips below: Informed consent can reveal personal information about your participants. Minimize the amount of personal data on your consent form and plan to handle consent registration with care. Follow the practical guidelines on the DCC website about [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|informed consent]] to guide you in the process and keep in mind the data minimization tips below:
    
-++++ Informed consent on paper | +++++ (Click) Informed consent on paper | 
-If you are conducting interviews or experiments, it is common practice to ask for consent on paper. Make sure to follow the faculty and university guidelines with regard to the design of your consent form.+If you are conducting interviews or experiments, it is common practice to ask for consent on paper. Make sure to follow your [[https://www.rug.nl/research/research-support-portal/before/ethics/|faculty guidelines]] or [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|university guidelines]] about the design of your consent procedure.
  
 When asking for consent, ensure you collect only the personal data that is necessary: When asking for consent, ensure you collect only the personal data that is necessary:
   * If your objective is to collect anonymous data, do not ask for names and signatures, and do not use pseudonymization IDs in consent forms.   * If your objective is to collect anonymous data, do not ask for names and signatures, and do not use pseudonymization IDs in consent forms.
-  * If your objective is to collect (pseudonymized) personal data, do not ask for names, signatures on the consent form. Instead, use a pseudonymization ID in consent forms to prevent direct identification. Ensure this pseudonymization ID corresponds with name and/or contact details in a keyfile. At the relevant time in the project, remove the link between the consent form and the research data and the participant’s identity reported on the keyfile. For example, when you've started to analyze the data and the participants can no longer request their data to be removed (right to withdraw consent), as stated in the consent form. After the link between the pseudonymization ID and the identity of the participant have been removed, the consent forms can be considered anonymous.+  * If your objective is to collect (pseudonymized) personal data, do not ask for names or signatures on the consent form. Instead, use a [[pseudonymization|pseudonymization ID]] in consent forms to prevent direct identification. Ensure this pseudonymization ID corresponds with the name and/or contact details in a keyfile. At the relevant time in the project, remove the link between the consent form and the research data and the participant’s identity reported on the keyfile. For example, when you've started to analyze the data and the participants can no longer request their data to be removed (right to withdraw consent), as stated in the consent form. After the link between the pseudonymization ID and the identity of the participant has been removed, the consent forms can be considered anonymous.
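The keyfile approach described in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `assign_pseudonym_ids` and `remove_link`, the `P-` prefix and the in-memory dictionary are hypothetical, and a real keyfile would be stored separately from the consent forms and the research data with restricted access.

```python
import secrets


def assign_pseudonym_ids(names: list[str], prefix: str = "P") -> dict[str, str]:
    """Return a keyfile mapping: random pseudonymization ID -> participant name."""
    keyfile: dict[str, str] = {}
    for name in names:
        pid = f"{prefix}-{secrets.token_hex(4)}"
        while pid in keyfile:  # regenerate in the unlikely case of a collision
            pid = f"{prefix}-{secrets.token_hex(4)}"
        keyfile[pid] = name
    return keyfile


def remove_link(keyfile: dict[str, str], pid: str) -> None:
    """Delete a keyfile entry so the ID no longer points to a person."""
    keyfile.pop(pid, None)
```

Random IDs are used here rather than IDs derived from a name or date of birth, because a derived ID could be traced back to the participant without the keyfile. Only the ID appears on the consent form; once `remove_link` has been applied at the moment stated in the consent form, the ID can no longer be connected to the participant.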
  
 After you finish your research: After you finish your research:
Line 62: Line 60:
 ++++ ++++
  
-++++ Informed consent on audio | +++++ (Click) Informed consent on audio | 
-If you are conducting interviews, it is sometimes necessary to ask consent during the interview itself. Make sure to follow the faculty and university guidelines with regard to the design of your consent procedure.+If you are conducting interviews, it is sometimes necessary to ask consent during the interview itself. Make sure to follow your [[https://www.rug.nl/research/research-support-portal/before/ethics/|faculty guidelines]] or [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|university guidelines]] about the design of your consent procedure.
  
   * Be aware that audio or video recordings of informed consent cannot be fully anonymized without altering their content;   * Be aware that audio or video recordings of informed consent cannot be fully anonymized without altering their content;
Line 72: Line 70:
 ++++ ++++
 ===Metadata=== ===Metadata===
-Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research. +Photo, video or audio files might contain a timestamp, date and, depending on the equipment and settings, also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research. 
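As an illustration of removing such metadata, the Python sketch below drops the EXIF block from a JPEG photo using only the standard library. It assumes a regular JPEG file in which the EXIF data sits in an APP1 segment before the image data. The function name `strip_exif` is hypothetical, and other metadata blocks (for example XMP) and other file types are not handled, so a dedicated, well-tested tool is likely the better choice for real projects.

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of the JPEG bytes without APP1 'Exif' segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: everything after this is image data
            break
        # The segment length is big-endian and includes its own two bytes
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]  # copy the image data unchanged
    return bytes(out)
```

The function works on the raw bytes of a file, so it could be applied to the result of `Path("photo.jpg").read_bytes()` and the output written to a new file. Check the result with a metadata viewer before deleting the original.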
  
 ---- ----
Line 81: Line 79:
  
 ===Contact information=== ===Contact information===
-Do not collect contact information if you do not plan to contact your participants after you collected the data (e.g. in case of recruitment via social media, posters or third parties). The [[https://www.rug.nl/digital-competence-centre/it-solutions/collect-and-annotate/qualtrics-surveys?lang=en|UG approved survey tool Qualtrics]] provides the option to use an [[https://www.qualtrics.com/support/survey-platform/distributions-module/web-distribution/anonymous-link/|anonymous link]] to prevent the collection of name and e-mail address of your participants. +Do not collect contact information if you do not plan to contact your participants after you have collected the data (e.g. in case of recruitment via social media, posters or third parties). The [[https://www.rug.nl/digital-competence-centre/it-solutions/collect-and-annotate/qualtrics-surveys?lang=en|UG approved survey tool Qualtrics]] provides the option to use an [[https://www.qualtrics.com/support/survey-platform/distributions-module/web-distribution/anonymous-link/|anonymous link]] to prevent the collection of name and e-mail address of your participants. If you would like to contact participants to share results or for another purpose that doesn’t require linking identities to their responses, set up a separate survey to collect contact information. You can provide a link to this second survey at the end of the original one. This approach ensures that all research data are anonymous from the start, while still allowing you to maintain a list of contact details. This only works if there is no need to connect contact information to individual responses.
  
 === Informed Consent === === Informed Consent ===
 Informed consent can reveal personal information about your participants. Minimize the amount of personal data on your consent form and plan to handle consent registration with care. Follow the practical guidelines on the DCC website about [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|informed consent]] to guide you in the process, and keep in mind the data minimization tips below: Informed consent can reveal personal information about your participants. Minimize the amount of personal data on your consent form and plan to handle consent registration with care. Follow the practical guidelines on the DCC website about [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|informed consent]] to guide you in the process, and keep in mind the data minimization tips below:
  
-++++ Informed consent via an online platform | +++++ (Click) Informed consent via an online platform | 
-If you are conducting questionnaire research via an online platform (e.g., Qualtrics), you can ask consent via a question in the platform itself. Make sure to follow the faculty and university guidelines with regard to the design of your consent form. Participants’ progression to the next page can be considered as consent.+If you are conducting questionnaire research via an online platform (e.g., Qualtrics), you can ask consent via a question in the platform itself. Make sure to follow your [[https://www.rug.nl/research/research-support-portal/before/ethics/|faculty guidelines]] or [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/informed-consent|university guidelines]] about the design of your consent procedure. Participants’ progression to the next page can be considered as consent.
  
 When asking for consent, ensure you collect only the personal data that is necessary: When asking for consent, ensure you collect only the personal data that is necessary:
  
   * If your objective is to collect anonymous or de-identified data, do not ask for names or other contact details for consent registration purposes.   * If your objective is to collect anonymous or de-identified data, do not ask for names or other contact details for consent registration purposes.
-  * If your objective is to collect identifiable or sensitive personal data, use a pseudonymization ID to prevent direct identification. At the relevant time in the project, remove the link between the consent and the participant’s identity reported in your keyfile. For example, when you've started to analyze the data and the participants can no longer request their data to be removed (right to withdraw consent), as stated in the consent form, or after you connected these data to other data (e.g. interview data).+  * If your objective is to collect identifiable or sensitive personal data, use a [[pseudonymization|pseudonymization ID]] to prevent direct identification. At the relevant time in the project, remove the link between the consent and the participant’s identity reported in your keyfile. For example, when you've started to analyze the data and the participants can no longer request their data to be removed (right to withdraw consent), as stated in the consent form, or after you connected these data to other data (e.g. interview data).
 ++++ ++++
  
Line 103: Line 101:
 ===Data collection method=== ===Data collection method===
 As a researcher, you can reduce the amount of personal data you collect when conducting social media research by carefully selecting your data collection method. Here are three common research approaches, with practical tips for each: As a researcher, you can reduce the amount of personal data you collect when conducting social media research by carefully selecting your data collection method. Here are three common research approaches, with practical tips for each:
-  * **Social media data scraping** is the automated collection of user-generated content and metadata from platforms like X (Twitter) and Youtube for systematic analysis. Make sure you limit the variables you collect during scraping and define clear filters to your range (e.g. keywords and date range). Consider taking a sample and not scraping all the data that falls within this range. +  * **Social media data scraping** is the automated collection of user-generated content and metadata from platforms like X (formerly Twitter) and YouTube for systematic analysis. Make sure you limit the variables you collect during scraping and define clear filters to limit your range (e.g. keywords and date range). Consider taking a sample and not scraping all the data that falls within this range. 
   * **[[https://datadonation.eu/data-donation/|Data donation]]** allows a researcher to collect digital trace data, by asking their participants to request and share their Data Download Packages (DDPs), which they can request by exercising their [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|privacy right to access and data portability]]. Although these packages can contain a lot of sensitive data, researchers at Scientific institutions in the Netherlands can use the software [[https://datadonation.eu/software/port/|Port]] which helps to set up a [[https://d3i-infra.github.io/data-donation-task/|data donation task]]. This limits the amount of data that will be donated to the data that is necessary for the research project, because participants do not donate the full DDP they received from the Social Media Platform.    * **[[https://datadonation.eu/data-donation/|Data donation]]** allows a researcher to collect digital trace data, by asking their participants to request and share their Data Download Packages (DDPs), which they can request by exercising their [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/gdpr-research/rights-of-human-data-subjects-in-scientific-research|privacy right to access and data portability]]. Although these packages can contain a lot of sensitive data, researchers at Scientific institutions in the Netherlands can use the software [[https://datadonation.eu/software/port/|Port]] which helps to set up a [[https://d3i-infra.github.io/data-donation-task/|data donation task]]. This limits the amount of data that will be donated to the data that is necessary for the research project, because participants do not donate the full DDP they received from the Social Media Platform. 
-  * **Manual data collection and observation** make it possible to already anonymously or pseudonymously collect certain data. You can determine what data you collect and are less dependent on API or Data Download Packages (DDPs). +  * **Manual data collection and observation** make it possible to carefully design your data collection and easily prevent the collection of identifiable data. You can determine what data you collect and are less dependent on APIs or Data Download Packages (DDPs). Examples of good practices: 1) Make sure not to collect any usernames, or store them separately from the rest of your data ([[pseudonymization|pseudonymization]]). 2) [[de-identification|De-identify]] other personally identifiable information that is not necessary for your research purpose during data collection.  
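The scraping advice above (limit the variables, filter on keywords and date range, take a sample) can be sketched in Python as follows. The field names `text` and `posted_on`, the function name `minimize_posts` and the input format (a list of dictionaries) are assumptions for illustration and do not correspond to the API of any particular platform.

```python
import random
from datetime import date

# Hypothetical variable names (assumption): only these fields are kept.
KEEP_FIELDS = ("text", "posted_on")


def minimize_posts(posts, keywords, start, end, sample_size, seed=0):
    """Filter posts by keyword and date range, sample, and keep only needed fields."""
    in_scope = [
        post for post in posts
        if start <= post["posted_on"] <= end
        and any(keyword.lower() in post["text"].lower() for keyword in keywords)
    ]
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    sampled = rng.sample(in_scope, min(sample_size, len(in_scope)))
    # Everything that is not explicitly listed is dropped, e.g. usernames and post IDs
    return [{field: post[field] for field in KEEP_FIELDS} for post in sampled]
```

Listing the fields to keep, rather than the fields to remove, means that any variable the platform adds later is excluded by default. Where the platform allows it, apply the keyword and date filters in the request itself, so that out-of-scope data are never retrieved in the first place.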
  
  
 ===Contact information=== ===Contact information===
-If you are [[https://www.rug.nl/digital-competence-centre/guides-faq/checklist-social-media-data.pdf|scraping or manually collecting data from social media platforms]], you might not directly collect contact information. However, posts are often accompanied by social media ID and post ID. This information is easy to trace back to an individual. If you do not need this information for current or future research (e.g. connect to other datasets), delete these IDs from your dataset or consider pseudonymization.+If you are [[https://www.rug.nl/digital-competence-centre/research-data/collect-process-and-store/using-social-media-data| collecting data from social media platforms]], you might not directly collect contact information. However, posts are often accompanied by social media ID and post ID. This information is easy to trace back to an individual. If you do not need this information for current or future research (e.g. connect to other datasets), delete these IDs from your dataset or consider [[pseudonymization|pseudonymization]].
  
 ===Metadata=== ===Metadata===
-Photo, video or audio files might contain a timestamp, date and depending on the equipment and settings also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. google timeline) but can also result in privacy risks in the context of research.+Photo, video or audio files might contain a timestamp, date and, depending on the equipment and settings, also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research.
  
 +----
 [[dcc:pdpsol:start | → Go back to the Privacy & Data protection home page]] [[dcc:pdpsol:start | → Go back to the Privacy & Data protection home page]]