dcc:itsol:whisper:scripts [2024/08/07 13:22] – moved content from LibGuides to wiki giulio
dcc:itsol:whisper:scripts [2025/10/30 13:37] (current) – deleted page giulio
{{indexmenu_n>2}}
===== Building the scripts =====

We will **run Whisper using a script** to make the tool easier to use. Follow the steps here to set up the script and run it; the next section of this guide explains the content of the script file itself.

To run the script, you first have to create it. Open your text editor of choice, copy the highlighted code below into a new file, and save the file with the name ''whisper_runall.sh''.

<code bash>
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
</code>

**Once the file is saved, you can close the editor.**

The example below uses ''vi'', a text editor available on the HPC cluster, to create the script. Follow these steps and compare your screen with the figures:

  * Type or copy ''vi whisper_runall.sh'' into the terminal, then press "enter". This opens a new, empty file with that name; the file is only written to disk when you save it. The terminal display changes when ''vi'' starts. {{ :dcc:itsol:whisper:script_1.png?direct&800 | }}
  * Press the "i" key to switch ''vi'' into insert mode, which lets you edit the file. Check the figure to confirm that the editor is in the correct mode. {{ :dcc:itsol:whisper:script_2.png?direct&800 | }}
  * Click on the terminal with the mouse wheel (middle-click) to paste the content of the script into the file. If the message shown in the figure appears, click "OK" to complete the pasting. The content of the script is at the top of this section. {{ :dcc:itsol:whisper:script_3.png?direct&800 | }}
  * Double-check that the content of the script is correct. It should look exactly like the picture below.
  * **Note**: The colors also matter: they show that the editor recognizes the text as script commands. {{ :dcc:itsol:whisper:script_4.png?direct&800 | }}
  * Finally, save the file and exit ''vi''. First press ''esc'' to leave insert mode, then type '':wq''. Your input appears at the bottom of the terminal, as shown in the figure. Press "enter" to run the command. The colon tells the editor that a command follows, ''w'' stands for "write", and ''q'' stands for "quit". {{ :dcc:itsol:whisper:script_5.png?direct&800 | }}
  * To make sure that the script has been saved, type ''ls'' into the terminal, then press "enter". If the script has been saved, it should now appear in the list of files in your home directory. {{ :dcc:itsol:whisper:script_6.png?direct&800 | }}
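
If you prefer not to use an interactive editor, the same file can also be written directly from the terminal with a here-document. This is a sketch of an alternative to the ''vi'' steps above, assuming you run it from your home directory; it writes exactly the script shown at the top of this section and then checks it.

```bash
# Sketch: create whisper_runall.sh from the terminal instead of an editor.
# Run from your home directory. ">" overwrites any existing file of that name.
# The quotes around 'EOF' keep $HOME unexpanded in the file, so it is
# expanded only when the job runs.
cat > whisper_runall.sh <<'EOF'
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
EOF

# Check the result: "bash -n" only parses the script (no command is run),
# and "ls -l" confirms the file exists.
bash -n whisper_runall.sh && ls -l whisper_runall.sh
```

The ''bash -n'' check catches typos that break the shell syntax, but it cannot tell whether the module name or paths are correct.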

==== Content of the batch script ====

The **batch script** you created **is the starting point for all your jobs relating to Whisper**. Below is a brief explanation of each line in the file; read it carefully if you want to modify the script. Also make sure to always run the script through ''sbatch'' rather than running the steps separately by hand: ''sbatch'' submits it to the scheduler as a job, which is what makes the ''#SBATCH'' resource requests take effect.

++++ Click to display |
  * ''#!/bin/bash''

This first line (the "shebang") tells the system to run the script with the ''bash'' shell. **Do not change it!**

\\

The next three lines are ''#SBATCH'' directives, which tell the Slurm scheduler what resources to reserve for the job:

  * ''#SBATCH --time=08:00:00''

This line sets the maximum time your job may run on the cluster, in the format ''hh:mm:ss''. The example asks for at most 8 hours, which is enough to process about 15-20 hours of interviews. If your job is stopped because it reaches this limit, this is the parameter to increase.
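
If you prefer to compute the value rather than type it, the sketch below uses a hypothetical helper, ''hours_to_slurm_time'' (not part of Slurm), that formats a whole number of hours as ''hh:mm:ss''. Options given on the ''sbatch'' command line override the matching ''#SBATCH'' line in the script, so you can also raise the limit for a single submission without editing the file.

```bash
# Sketch (hypothetical helper): format a whole number of hours as the
# hh:mm:ss value used by "#SBATCH --time".
hours_to_slurm_time() {
  local hours="$1"
  # Accept only positive whole numbers of hours (e.g. 8, 12).
  if [[ ! "$hours" =~ ^[0-9]+$ ]] || (( 10#$hours == 0 )); then
    echo "usage: hours_to_slurm_time HOURS (positive whole number)" >&2
    return 1
  fi
  # 10# forces base 10, so inputs like "08" are not read as octal.
  printf '%02d:00:00\n' "$((10#$hours))"
}

# Example: request 12 hours for this submission only; the command-line
# option overrides the "#SBATCH --time" line inside the script.
#   sbatch --time="$(hours_to_slurm_time 12)" whisper_runall.sh
```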

  * ''#SBATCH --gpus-per-node=1''

This line requests 1 GPU for the job. One GPU is more than enough for Whisper to run the transcription, so please do not modify this parameter.

  * ''#SBATCH --mem=16000''

This line sets the amount of memory (RAM) requested for the job. Slurm reads the value in megabytes unless a unit is given, so the default script asks for 16,000 MB, roughly 16 GB.

\\

The next two lines load the dependencies that Whisper needs and activate its virtual environment:

  * ''module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0''

This line loads the PyTorch module (version 1.12.1, built with CUDA 11.7.0) that Whisper needs to run. Do not modify it, otherwise the script will not load the correct dependencies.

  * ''source $HOME/.envs/whisper/bin/activate''

This line activates the virtual environment in which Whisper is installed. Because the activation happens inside the job, it only lasts as long as the job, so you do not need to deactivate the environment afterwards. Again, leave this line unchanged.
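
If the virtual environment is missing, the ''whisper'' command will most likely not be found and the job will fail after waiting in the queue. The sketch below uses a hypothetical helper, ''check_whisper_env'' (not provided by the cluster), to check before submitting; it only tests that the ''activate'' script used above exists, not that Whisper itself is installed correctly.

```bash
# Sketch (hypothetical helper): check that the virtual environment
# activated by whisper_runall.sh exists before submitting the job.
check_whisper_env() {
  # Default to the path used in the script; pass another path to override.
  local venv="${1:-$HOME/.envs/whisper}"
  if [ -f "$venv/bin/activate" ]; then
    echo "found: $venv"
  else
    echo "missing: $venv/bin/activate" >&2
    return 1
  fi
}
```

For example, ''check_whisper_env && sbatch whisper_runall.sh'' submits the job only if the check passes.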

Finally, the last line is the command that actually runs Whisper:

  * ''whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/''

To read the input audio from a different location, replace ''$HOME/whisper_audio/*'' with the ''PATH'' of your folder. Keep the ''*'' at the end of the path: the shell expands it into the list of all files in that folder, so Whisper processes every one of them. In the same way, change the path after ''--output_dir'' to write the output to a different directory. Finally, to use a different Whisper model, change the value after ''--model''; consult the Whisper documentation (or run ''whisper --help'') before changing the model.
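
Because the ''*'' is expanded by the shell, it is worth checking what it matches before submitting. The sketch below uses a hypothetical helper, ''list_whisper_inputs'', that prints the regular files in a folder (by default ''$HOME/whisper_audio''). Like the ''*'' in the script, it skips hidden files whose names start with a dot. Note that in the job script the ''*'' matches everything in the folder, subfolders included, so it is safest to keep only audio files there; and if the folder is empty, bash passes the pattern itself unchanged, so Whisper finds no audio.

```bash
# Sketch (hypothetical helper): print the regular files that a pattern such
# as $HOME/whisper_audio/* would pass to Whisper. The function body runs in
# a subshell, so the nullglob setting does not leak into your session.
list_whisper_inputs() (
  # With nullglob, an empty folder yields no matches instead of a literal "*".
  shopt -s nullglob
  for f in "${1:-$HOME/whisper_audio}"/*; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
)
```

For example, ''list_whisper_inputs'' checks the default folder, and ''list_whisper_inputs /path/to/other/folder'' checks another one before you edit the path in the script.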

++++

[[dcc:itsol:whisper:running| → Move to the next step]]