Building the scripts
We will run Whisper through a script to make the tool easier to use. Follow the steps here to set up the script and run it; read the next section of the guide to learn more about the content of the script file itself.
To run the script, you first have to create it. Open your text editor of choice and copy the code below into a new file. Save the file with the name whisper_runall.sh.
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000
module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
The example below uses the vi text editor available on the HPC cluster to create the script. Follow the instructions in the figures to create the script using this specific text editor:
- Double-check that the content of the script is correct. If it is, it should look exactly like the picture below.
- Finally, to save the file and exit from vi, first press esc on your keyboard. Then type :wq directly on your keyboard. The input should be displayed at the bottom of the terminal, as shown in the figure. Press enter to run the command. The colon tells the editor that a command is coming, the w stands for “write”, and the q stands for “quit”.
Content of the batch script
The batch script you created is the starting point for all your jobs relating to Whisper. Below is a brief explanation of the different lines present in the file. Please read the next steps carefully if you wish to modify the content of the script. For convenience, always run the script through sbatch rather than running the steps separately by hand.
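As a rough line-by-line reading of the general script, the sketch below repeats it with a comment above each line. It is wrapped in a heredoc so that pasting it into a terminal writes the annotated copy to a file; the file name whisper_runall_annotated.sh is a hypothetical one chosen here so that your original script is not overwritten. The comments reflect the usual meaning of these Slurm options; the note that Whisper is installed in the virtual environment is an assumption based on the environment's name.

```shell
cat > whisper_runall_annotated.sh <<'EOF'
#!/bin/bash
# The first line above tells the system to run this file with bash.
# Maximum run time of the job (hours:minutes:seconds); here 8 hours.
#SBATCH --time=08:00:00
# Request one GPU on the node that runs the job.
#SBATCH --gpus-per-node=1
# Request memory; without a unit Slurm reads the number as megabytes (about 16 GB).
#SBATCH --mem=16000
# Load the cluster's PyTorch module (PyTorch 1.12.1 built for CUDA 11.7).
module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
# Activate the Python virtual environment (assumed to contain Whisper).
source $HOME/.envs/whisper/bin/activate
# Transcribe every file in whisper_audio with the large-v2 model
# and write the results to whisper_output.
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
EOF
```

The lines between the two EOF markers are the script itself; the comments do not change what the job does.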
Specialized scripts
The script described above is a general-purpose script. It relies on Whisper to make most of the decisions regarding the transcription. If you need to be stricter about what the program is allowed to do, you might want to use one of the scripts listed below.
It is good practice to create a separate script for each task instead of modifying the same script every time your needs change. That way, you do not have to edit the script again when you want to repeat a task you have already set up. This keeps your work organized and is less prone to errors.
Forced English
This script forces Whisper to transcribe the audio as English. Use this script if the automatic language detection picks the wrong language (e.g. strongly accented English being recognized as Welsh instead of English). The same concept works for other supported languages, for example Dutch. To change which language is forced, simply replace the string English after the --language option with the desired language.
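The script itself is not reproduced in this section, so the following is only a sketch. It assumes the forced-language script is the general script with the --language option added to the whisper command. It is written as a heredoc so that pasting it into a terminal creates the file directly; alternatively, copy the lines between the EOF markers into your text editor.

```shell
cat > whisper_forcedEnglish.sh <<'EOF'
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000
module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
# --language fixes the spoken language instead of letting Whisper detect it.
whisper $HOME/whisper_audio/* --model large-v2 --language English --output_dir $HOME/whisper_output/
EOF
```

To force Dutch instead, the only change in this sketch would be --language Dutch.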
When you save the script, you can call it whisper_forcedEnglish.sh. If you forced a different language, we advise you to name it accordingly. To execute it, simply type sbatch whisper_forcedEnglish.sh into the terminal and follow the same steps as for the general script (see here).
Translate instead of transcribe
Whisper is also capable of translating any supported language into English. To let the program know that you want a translation instead of a transcription, you need to specify which --task the program should perform. The script below is already edited to perform a translation. Keep in mind that the output will then contain only the English translation; the original-language text will not appear in the output files. If you need the original for comparison, you can either run the general script on the audio first, or run a forced language script (see above) before you run the translation.
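The translation script is not reproduced in this section, so the following is only a sketch. It assumes the script is the general script with --task translate added to the whisper command. As a heredoc, it creates the file when pasted into a terminal; the lines between the EOF markers can also be copied into your text editor.

```shell
cat > whisper_translate.sh <<'EOF'
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000
module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
# --task translate makes Whisper output an English translation
# instead of a transcription in the spoken language.
whisper $HOME/whisper_audio/* --model large-v2 --task translate --output_dir $HOME/whisper_output/
EOF
```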
When you save the script, you can call it whisper_translate.sh. To execute it, simply type sbatch whisper_translate.sh into the terminal and follow the same steps as for the general script (see here).
Note: Regardless of whether you run the transcription or the translation first, the output files will have exactly the same names. To keep the second operation (translation or transcription) from overwriting the results of the first, rename the output files before you run the second operation.
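One possible way to protect the first set of results is sketched below. Instead of renaming each file by hand, it moves every file in the output folder into a labelled subfolder, which has the same effect of getting them out of the way. The function name archive_outputs and the labels are hypothetical choices made for this example; they are not part of Whisper or of the cluster setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of Whisper): move every regular file in an
# output folder into a labelled subfolder so the next job cannot overwrite it.
archive_outputs() {
    local dir="$1" label="$2" file
    mkdir -p "$dir/$label"
    for file in "$dir"/*; do
        # Skip anything that is not a regular file, such as earlier subfolders.
        [ -f "$file" ] || continue
        # -n refuses to overwrite a file that is already in the subfolder.
        mv -n -- "$file" "$dir/$label/"
    done
}

# Example call, to be run after the first job has finished:
# archive_outputs "$HOME/whisper_output" transcription
```

After the second job has finished, calling the function again with a different label (for example translation) would keep both sets of results side by side.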