Differences

dcc:pdpsol:dataminimization [2026/05/13 10:06] – [Social media data] remove data donation temporarily because we need approval for the tool marlon
dcc:pdpsol:dataminimization [2026/06/11 08:47] (current) alba
Line 1: Line 1:
 {{indexmenu_n>1}}
-====== Data Minimization ======
+====== Data minimization ======
 
 ===== Introduction =====
Line 18: Line 18:
 === Take into account the effort of research participation ===
 Although it is important to consider what personal data you need for your research, it is also important to be mindful of the effort and strain participation may place on your participants. This means you should limit the collection of personal data to what you need for your research. However, you should also respect participants’ time and effort, and avoid designing studies that require participants to take part multiple times due to narrowly defined research questions. This is particularly important when working with vulnerable or hard-to-reach groups. In such cases, it is advisable to design studies that can address several relevant questions at once, thereby maximizing the value of participants’ contributions while minimizing their strain.
 +
 +A research ethics committee can provide feedback on your study design. When working with participants, an ethics review is usually necessary. Always make sure you are aware of the [[https://www.rug.nl/digital-competence-centre/privacy-and-data-protection/data-protection/ethics-for-data-protection|research ethics procedures]] at your faculty. 
  
 ==== Use consistent file naming and version control ====
Line 37: Line 39:
 Some data can reveal more information about an individual than others. Only use an extensive or detailed data collection method if you also use this type of data to answer your research question.
   * **Video**: Observational research focusing on human interactions, facial expressions, movement patterns, etc.
-  * **Audio**: Unstructered qualitative research where precize content and possibly tone and pitch are important (e.g., focus groups and open interviews), but also research conducting speech analysis.
+  * **Audio**: Unstructured qualitative research where precise content and possibly tone and pitch are important (e.g., focus groups and open interviews), but also research conducting speech analysis.
-  * **Text**: Structured qualitative research focusing on content (e.g. interviews, oberservations)
+  * **Text**: Structured qualitative research focusing on content (e.g. interviews, observations)
  
 ===Contact information===
Line 70: Line 72:
 ===Metadata===
 Photo, video or audio files might contain a timestamp, date and, depending on the equipment and settings, also location. Check whether you can prevent the collection of these data or remove these metadata as soon as possible after collection. [[https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/|Comparitech]] shows an example of EXIF metadata stored with a photo, including the GPS coordinates where the photo was taken and a timestamp of when the photo was taken. These metadata, included in smartphones or digital cameras, can help catalogue photos (e.g. Google Timeline) but can also result in privacy risks in the context of research.
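
To illustrate what removing these metadata can look like in practice, here is a minimal sketch in Python that uses only the standard library. It assumes the photos are well-formed JPEG files and relies on the fact that EXIF data (including GPS coordinates and timestamps) is stored in APP1 segments. The function name ''strip_app1_segments'' is hypothetical and not part of any UG tool. The sketch does not cover other metadata containers (e.g. IPTC data in APP13 segments), other image formats, or video and audio files, so always check the result with a metadata viewer.

```python
import struct


def strip_app1_segments(jpeg_bytes: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its APP1 (EXIF/XMP) segments.

    Simplified sketch: assumes every segment before the image data has a
    two-byte length field, which holds for typical camera and phone output.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")

    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image data follows, copy it as-is.
            out += jpeg_bytes[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("invalid segment length")
        if marker != 0xE1:
            # Keep every segment except APP1, which holds the EXIF block.
            out += jpeg_bytes[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

To use it, read a photo in binary mode, pass the bytes to the function, and write the result to a new file, so that the original stays untouched until you have verified the cleaned copy.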
 +
 +=== Digital traces ===
 +Be aware that bringing a device to an interview can, by itself, generate digital traces. If your phone is on, it may record GPS coordinates, Wi-Fi network connections, or mobile tower signals (depending on settings). If two phones are connected to the same tower for a period of time, this can indicate that a meeting took place. Apps or operating systems may collect data in the background (e.g. weather apps, social media or system updates). IP addresses, MAC addresses of nearby devices, or Bluetooth connections can also unintentionally reveal information about the environment or participant. All this information can later be used to determine where the interview took place and with whom. 
 +
 +If you plan on doing interviews with participants, and the topic is (highly) sensitive (e.g. because it is about illegal activities), then you can minimize digital traces during interviews by leaving your phone behind and using a dedicated, offline recording device instead. If you must bring a phone, enable airplane mode and remove the battery if possible. Consider using [[https://myuniversity.rug.nl/infonet/medewerkers/actueel/news/251107-secure-laptop-phone-travel-risk-countries?lang=nl|burner phones]] provided by the University of Groningen.
  
 ----
Line 78: Line 85:
  
 ===Contact information===
-Do not collect contact information if you do not plan to contact your participants after you have collected the data (e.g. in case of recruitment via social media, posters or third parties). The [[https://www.rug.nl/digital-competence-centre/it-solutions/collect-and-annotate/qualtrics-surveys?lang=en|UG approved survey tool Qualtrics]] provides the option to use an [[https://www.qualtrics.com/support/survey-platform/distributions-module/web-distribution/anonymous-link/|anonymous link]] to prevent the collection of name and e-mail address of your participants. If you would like to contact participants to share results or for another purpose that doesn’t require linking identities to their responses, set up a separate survey to collect contact information. You can provide a link to this second survey at the end of the original one. This approach ensures that all research data anonymous from the start, while still allowing you to maintain a list of contact details. This only works if there is no need to connect contact information to individual responses.
+Do not collect contact information if you do not plan to contact your participants after you have collected the data (e.g. in case of recruitment via social media, posters or third parties). The [[https://www.rug.nl/digital-competence-centre/it-solutions/collect-and-annotate/qualtrics-surveys?lang=en|UG approved survey tool Qualtrics]] provides the option to use an [[https://www.qualtrics.com/support/survey-platform/distributions-module/web-distribution/anonymous-link/|anonymous link]] to prevent the collection of name and e-mail address of your participants. If you would like to contact participants to share results or for another purpose that doesn’t require linking identities to their responses, set up a separate survey to collect contact information. You can provide a link to this second survey at the end of the original one. This approach ensures that all research data is anonymous from the start, while still allowing you to maintain a list of contact details. This only works if there is no need to connect contact information to individual responses.
  
 === Informed Consent ===
Line 101: Line 108:
 As a researcher, you can reduce the amount of personal data you collect when conducting social media research by carefully selecting your data collection method. Here are two common research approaches, with practical tips for each:
   * **Social media data scraping** is the automated collection of user-generated content and metadata from platforms like X (Formerly Twitter) and YouTube for systematic analysis. Make sure you limit the variables you collect during scraping and define clear filters to your range (e.g. keywords and date range). Consider taking a sample and not scraping all the data that falls within this range.
-  * **Manual data collection and observation** make it possible to carefully design your data collection and easily prevent the collection of identifiable data. You can determine what data you collect and are less dependent on API. Examples of good practices: 1) Make sure not to collect any usernamesor store them seperately from the rest of your data ([[pseudonymization|pseudonymization]]). 2) [[de-identification|De-identify]] other personal identifiable information that is not necessary for your research purpose during data collection.
+  * **Manual data collection and observation** make it possible to carefully design your data collection and easily prevent the collection of identifiable data. You can determine what data you collect and are less dependent on API. Examples of good practices: 1) Make sure not to collect any usernames or store them separately from the rest of your data ([[pseudonymization|pseudonymization]]). 2) [[de-identification|De-identify]] other personal identifiable information that is not necessary for your research purpose during data collection.
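
The tips above (limit the variables, filter by keywords and date range, take a sample, and keep usernames separate from the research data) could be combined along the following lines. This is only a sketch under stated assumptions: the record fields ''username'', ''text'' and ''posted_on'', the function name ''minimize_posts'' and the ''KEEP_FIELDS'' setting are all hypothetical, and real platform exports will look different.

```python
import random
import secrets
from datetime import date

# Hypothetical: the only variables the research question needs.
KEEP_FIELDS = ("text", "posted_on")


def minimize_posts(posts, keywords, start, end, sample_size, seed=0):
    """Filter, sample and pseudonymize collected posts.

    Returns the research data (without usernames) and a key table that maps
    each username to its pseudonym. Store the key table separately.
    """
    # 1) Keep only posts inside the date range that mention a keyword.
    in_scope = [
        post for post in posts
        if start <= post["posted_on"] <= end
        and any(word.lower() in post["text"].lower() for word in keywords)
    ]

    # 2) Take a random sample instead of keeping everything in scope.
    rng = random.Random(seed)
    sampled = rng.sample(in_scope, min(sample_size, len(in_scope)))

    # 3) Drop unneeded variables and replace usernames with random pseudonyms.
    key_table = {}
    research_data = []
    for post in sampled:
        pseudonym = key_table.setdefault(
            post["username"], "P-" + secrets.token_hex(4)
        )
        row = {field: post[field] for field in KEEP_FIELDS}
        row["author"] = pseudonym
        research_data.append(row)
    return research_data, key_table
```

The pseudonyms are random rather than derived from the username, so they cannot be reversed without the key table. If you never need to link posts by the same author, you can leave out the ''author'' column and the key table altogether.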