Differences
dcc:itsol:whisper:datamanage: current revision 2025/09/10 13:44 by alba (Minor text changes); previous revision 2025/09/09 13:17 by giulio (Removed VRW reference)
Because you are handling data containing the **voices of (multiple) people**, your data is considered one of the most sensitive kinds of data. For this reason, data uploaded to HPC for transcription should not be left sitting on HPC. As soon as you know that the transcription has finished, you should **download the results from HPC to your UG work environment and remove all traces of the data from HPC**.
There are **three main areas** that you need to clear to secure your data:
  * The **audio** that you provided as **input**
  * The **transcripts** that Whisper created
Once all three of these locations/
Last but not least, and independently of the sensitivity of your data, **HPC is a computing cluster** and is therefore only intended for the **short-term storage of mutable data**. To ensure proper performance,
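Before deleting anything from HPC, it can help to confirm that the download to your UG work environment is complete and intact. The Python sketch below is one possible way to do that, not an official tool: it assumes you can run Python both on HPC and in your UG work environment, and the function names `build_manifest` and `missing_or_changed` are hypothetical. It records a SHA-256 checksum for every file in a folder, so that a manifest made on HPC can be compared with one made from the downloaded copy.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest.

    Hypothetical helper: run it once on HPC and once on the downloaded copy.
    """
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large audio files do not fill memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def missing_or_changed(hpc: dict[str, str], local: dict[str, str]) -> list[str]:
    """List files from the HPC manifest that have no identical local copy."""
    return sorted(name for name, digest in hpc.items() if local.get(name) != digest)
```

For example, you could build a manifest of your results folder on HPC, save it with `json.dump`, download it together with the results, build a second manifest from the local copy, and only remove the data from HPC once `missing_or_changed` returns an empty list.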
==== Input Audio ====
The files contained in the folder ''
Before removing the audio files, we advise you to first check whether the transcripts are acceptable. Should you need to run the transcription again with a modified script (e.g., to force a language
If the transcripts are as expected, however, remove the audio promptly. Please consider doing a brief check of the transcripts,
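The check-then-remove advice for the input audio could be scripted roughly as follows. This is a hypothetical Python sketch: it assumes the audio and the transcripts sit in two separate folders and that each transcript shares its audio file's base name with a `.txt` extension (adjust `suffix` to match what your transcription script actually writes). It only flags audio whose transcript exists and is non-empty, so it does not replace actually reading the transcripts, and by default it only prints what it would delete.

```python
from pathlib import Path


def audio_with_transcript(audio_dir: Path, transcript_dir: Path, suffix: str = ".txt") -> list[Path]:
    """Return audio files whose transcript (same base name + suffix) exists and is non-empty."""
    ready = []
    for audio in sorted(p for p in audio_dir.iterdir() if p.is_file()):
        transcript = transcript_dir / (audio.stem + suffix)
        if transcript.is_file() and transcript.stat().st_size > 0:
            ready.append(audio)
    return ready


def remove_audio(files: list[Path], dry_run: bool = True) -> None:
    """Print each file; delete it only when dry_run is False."""
    for audio in files:
        print(("would remove " if dry_run else "removing ") + str(audio))
        if not dry_run:
            audio.unlink()
```

Running `remove_audio(audio_with_transcript(...))` first as a dry run lets you review the list before calling it again with `dry_run=False`. Audio without a usable transcript is left in place, in case the transcription has to be repeated.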
  * ''
This file is created by HPC when you launch a job, and it is tagged with the ''
**Note**: In case you are curious, SLURM stands for //Simple Linux Utility for Resource Management//.
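By default, `sbatch` writes a job's output to a file named `slurm-<jobid>.out` (for job arrays, `slurm-<arrayjobid>_<taskid>.out`), typically in the directory you submitted the job from, unless your job script sets a different `--output` name. The hypothetical Python helper below is a sketch that assumes this default naming: it lists such files in a folder, optionally only those of a single job ID, so you can review them before deleting.

```python
import re
from pathlib import Path
from typing import Optional

# Default sbatch output names: slurm-<jobid>.out, or slurm-<arrayjobid>_<taskid>.out for arrays.
SLURM_OUT = re.compile(r"slurm-(\d+)(?:_\d+)?\.out")


def slurm_logs(directory: Path, job_id: Optional[str] = None) -> list[Path]:
    """Return SLURM output files in directory, optionally only those of one job ID."""
    found = []
    for path in sorted(directory.iterdir()):
        match = SLURM_OUT.fullmatch(path.name)
        if match and path.is_file() and (job_id is None or match.group(1) == job_id):
            found.append(path)
    return found
```

After checking the listed files, each one can be removed with `path.unlink()`. If your job script uses a custom `--output` pattern, the regular expression would need to be adapted to it.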
[[dcc: