
Building the scripts

We will run Whisper through a script to make the tool easier to use. Follow the steps here to set up the script and run it. Read the next section of the guide to learn more about the content of the script file itself.

Before you can run the script, you have to create it. Open your text editor of choice, copy the highlighted code below into a new file, and save the file as whisper_runall.sh.

Note: The PyTorch module needed to install Whisper has changed because of an update to Whisper's dependencies. The screenshots still show the previous version of the module. Please make sure to use the module version given in the text.

#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/

The example below uses the vi text editor available on the HPC cluster to create the script. Follow the instructions in the figures to create the script with this text editor:

Type or copy vi whisper_runall.sh into the terminal, then press “enter”. This creates a new, empty file that you can edit. The terminal display changes when vi starts.
Press the “i” key on your keyboard to enable editing of the file. Check the figure to see whether the editor is in the correct mode.
Click on the terminal with the mouse wheel to paste the content of the script into the file. If the message shown in the figure appears, click “OK” to complete the pasting. The content of the script can be found at the top of this section.
Double-check that the content of the script is correct. If it is, it should look exactly like the picture below.
Note: The colors displayed are also important, because they show that the editor recognizes the words in the text as script commands.
Finally, to save the file and exit vi, first press esc on your keyboard. Then type :wq. The input should be displayed at the bottom of the terminal, as shown in the figure. Press enter to run the command. The colon tells the editor that a command is coming, the w stands for “write”, and the q stands for “quit”.
If you want to make sure that the script has been saved, type ls into the terminal, then press “enter”. If the script has been saved, it should now appear in the list of files in your home directory.
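
If you prefer not to use an editor, the same file can also be written directly from the terminal. The sketch below is an alternative to the vi steps above, not part of the original instructions. It assumes that you are in your home directory and that no file named whisper_runall.sh exists there yet, because an existing file would be overwritten. The script content is copied unchanged from the top of this section.

```shell
#!/bin/bash
# Write the general script to a file without opening an editor.
# The quoted 'EOF' keeps $HOME unexpanded, so it is written literally
# and is only resolved later, when the job runs.
cat > whisper_runall.sh <<'EOF'
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
EOF

# Check the file for shell syntax errors without running it.
bash -n whisper_runall.sh && echo "whisper_runall.sh has no syntax errors"
```

The syntax check only confirms that the file is a valid shell script. It does not check that the module or the virtual environment exists on the cluster.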

Content of the batch script

The batch script you created is the starting point for all your jobs relating to Whisper. Below is a brief explanation of the different lines present in the file. Please read the next steps carefully if you wish to modify the content of the script. For convenience, always run the script through sbatch rather than running the steps separately by hand.
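
As a minimal sketch of what running the script through sbatch can look like, the helper below checks that the script file exists and that sbatch is available before submitting. The function name submit_whisper is hypothetical and only used for illustration; sbatch itself is the standard Slurm submission command.

```shell
#!/bin/bash
# submit_whisper is a hypothetical helper name, not part of Whisper or Slurm.
# Usage: submit_whisper [SCRIPT]   (defaults to whisper_runall.sh)
submit_whisper() {
    local script="${1:-whisper_runall.sh}"

    # Refuse to submit a script that does not exist.
    if [ ! -f "$script" ]; then
        echo "Script not found: $script" >&2
        return 1
    fi

    # sbatch is only available on machines with Slurm installed.
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on the cluster" >&2
        return 1
    fi

    # Hand the script to the scheduler; Slurm reads the #SBATCH lines itself.
    sbatch "$script"
}
```

After defining the function, submit_whisper whisper_runall.sh submits the job. Running squeue -u "$USER" afterwards lists your queued and running jobs.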


#!/bin/bash

This first line is used to tell the cluster what it should use to interpret/run the script. Do not change it!

The next three lines specify certain parameters for the batch script:

#SBATCH --time=08:00:00

This line specifies the maximum time your job will run on the cluster. The format is hh:mm:ss. The example asks for a maximum of 8 hours, which is plenty of time to cover about 15-20 hours of interviews. Should you run into longer processing times, this is the parameter you want to change.
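
If you need a longer time limit, one option is to derive a second script from the first instead of editing the original. The sketch below assumes a hypothetical helper named set_time_limit, and the file names in the usage note are only examples. It accepts only the hh:mm:ss format described above.

```shell
#!/bin/bash
# set_time_limit is a hypothetical helper name used for illustration.
# Usage: set_time_limit SOURCE_SCRIPT NEW_SCRIPT HH:MM:SS
set_time_limit() {
    local src="$1" dst="$2" limit="$3"

    # Accept only the hh:mm:ss format used in this guide.
    if ! [[ "$limit" =~ ^[0-9]{2,}:[0-5][0-9]:[0-5][0-9]$ ]]; then
        echo "Time limit must look like hh:mm:ss, got: $limit" >&2
        return 1
    fi

    # Copy the script, replacing only the --time line.
    # The source script itself is left unchanged.
    sed "s/^#SBATCH --time=.*/#SBATCH --time=$limit/" "$src" > "$dst"
}
```

For example, set_time_limit whisper_runall.sh whisper_runall_12h.sh 12:00:00 creates a copy of the script that asks for a maximum of 12 hours.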

#SBATCH --gpus-per-node=1

This line tells the cluster that the script is asking for 1 GPU to be allocated to this job. For Whisper, 1 GPU is more than enough to run the transcription, so please do not modify this parameter.

#SBATCH --mem=16000

This line specifies the amount of memory (RAM) requested for this job. Slurm reads a value without a unit as megabytes, so the default value of 16000 asks for roughly 16 GB of RAM to be allocated.

The next two lines make sure that the virtual environment and the dependencies that Whisper needs to run are correctly loaded:

module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1

This line loads the program packages that Whisper needs to run. Please be sure to not modify it, otherwise the script is not going to load the correct dependencies.

source $HOME/.envs/whisper/bin/activate

This line activates the virtual environment for Whisper. As it is part of the script, you won't have to deactivate the environment once the script is launched. Once again, leave this part of the script unchanged.

Finally, the last line is the actual command to run Whisper:

whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/

If you wish to modify the location of the input audio, replace $HOME/whisper_audio/* with the path to your own folder. Please remember to add an * at the end of the path: the shell expands it to every file in that folder, so Whisper processes all files present there. In the same way, modify the path after --output_dir if you wish to change the location of the output directory. Finally, if you wish to change the model used, change the value after --model. Please consult the Whisper manual before changing the model.
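
The sketch below shows one way to keep the input folder, output folder, and model in one place, so that only the first lines need to change. The variable names AUDIO_DIR, OUTPUT_DIR, and MODEL and the helper has_audio_files are assumptions made for this example and do not come from Whisper. The fragment is meant to replace only the last line of the script, after the module load and source lines.

```shell
#!/bin/bash
# Hypothetical variable names; change the defaults to your own locations.
AUDIO_DIR="${AUDIO_DIR:-$HOME/whisper_audio}"
OUTPUT_DIR="${OUTPUT_DIR:-$HOME/whisper_output}"
MODEL="${MODEL:-large-v2}"

# has_audio_files is a hypothetical helper: it succeeds if the folder
# contains at least one regular file, and fails otherwise.
has_audio_files() {
    local f
    for f in "$1"/*; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# Run Whisper only if there is something to process and the command exists.
# The quotes keep paths with spaces intact; the * stays outside the quotes
# so that the shell still expands it to every file in the folder.
if has_audio_files "$AUDIO_DIR" && command -v whisper >/dev/null 2>&1; then
    whisper "$AUDIO_DIR"/* --model "$MODEL" --output_dir "$OUTPUT_DIR"
else
    echo "Nothing to do: no files in $AUDIO_DIR or whisper is not available" >&2
fi
```

Stopping early when the folder is empty avoids waiting in the queue for a job that would fail immediately.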

Specialized scripts

The script described above is a general-purpose script. It relies on Whisper to make most of the decisions regarding the transcription. If you need to be stricter about what the program is allowed to do, you might want to use one of the scripts listed below.

It is good practice to create different scripts for different tasks, instead of modifying the same script each time your needs change. This way, you don't have to edit the script again when you want to repeat a task that you have already set up in the past. This practice keeps your work organized and is less prone to errors.

Forced English

This script forces Whisper to transcribe the audio as English. Use this script if the automatic language detection picks the wrong language (e.g. a strong English accent being recognized as Welsh instead of English). The same concept works for other supported languages, for example Dutch. To change which language is forced, simply replace English with the desired language after the --language option.

When you save the script, you can call it whisper_forcedEnglish.sh. If you forced a different language, we advise you to label it accordingly. To execute it, simply type into the terminal sbatch whisper_forcedEnglish.sh and follow the same steps as the general script (see here).


#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --language English --output_dir $HOME/whisper_output/
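
To force a different language, the forced English script can serve as a template. The sketch below is one possible way to derive a variant from it. The helper name derive_language_script and the file names in the usage note are hypothetical, and the helper assumes a single-word language name such as Dutch.

```shell
#!/bin/bash
# derive_language_script is a hypothetical helper name used for illustration.
# Usage: derive_language_script SOURCE_SCRIPT NEW_SCRIPT LANGUAGE
derive_language_script() {
    local src="$1" dst="$2" language="$3"

    # Assumption: the language is a single word made of letters only.
    if ! [[ "$language" =~ ^[A-Za-z]+$ ]]; then
        echo "Expected a single-word language name, got: $language" >&2
        return 1
    fi

    # Copy the script, swapping only the value after --language.
    # The source script itself is left unchanged.
    sed "s/--language English/--language $language/" "$src" > "$dst"
}
```

For example, derive_language_script whisper_forcedEnglish.sh whisper_forcedDutch.sh Dutch writes a second script that forces Dutch and leaves the English one as it was.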

Translate instead of transcribe

Whisper can also translate speech from any supported language into English. To tell the program that you want a translation instead of a transcription, you need to specify the task with the --task option. The script below is already edited to perform a translation. Please keep in mind that the output files will then contain only the English translation, not the original text. If you need the original for comparison, you can either run the general script on the audio first, or run a forced language script (see above) before you run the translation.

When you save the script, you can call it whisper_translate.sh. To execute it, simply type into the terminal sbatch whisper_translate.sh and follow the same steps as the general script (see here).

Note: Regardless of whether you run the transcription or the translation first, the output files will have exactly the same names. To keep the second operation (translation or transcription) from overwriting the first, rename the output files before you run the second operation. The output of your first operation will then remain untouched.
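
The renaming step can be done by hand, or with a small loop. The sketch below adds a tag to every output file name before the extension, for example turning interview.txt into interview_transcription.txt. The helper name tag_outputs and the tag are assumptions made for this example.

```shell
#!/bin/bash
# tag_outputs is a hypothetical helper name used for illustration.
# Usage: tag_outputs OUTPUT_DIR TAG
tag_outputs() {
    local dir="$1" tag="$2" f name stem ext
    for f in "$dir"/*.*; do
        # Skip anything that is not a regular file (or an unmatched pattern).
        [ -f "$f" ] || continue

        name="${f##*/}"      # file name without the folder
        stem="${name%.*}"    # name without the last extension
        ext="${name##*.}"    # last extension, for example txt or srt

        # Skip files that already carry the tag, so a second run is harmless.
        case "$stem" in
            *_"$tag") continue ;;
        esac

        # -n refuses to overwrite an existing file with the new name.
        mv -n -- "$f" "$dir/${stem}_${tag}.${ext}"
    done
}
```

For example, tag_outputs "$HOME/whisper_output" transcription marks the results of a transcription run, after which the translation script can be submitted without touching them.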