| dcc:itsol:whisper:setup [2025/10/20 13:16] – Updated explanation on what folders to create giulio | dcc:itsol:whisper:setup [2025/10/30 13:39] (current) – ran spell- and grammar check giulio |
|---|
| {{ :dcc:itsol:whisper:hb_portal_1.png?direct&900 | }} | {{ :dcc:itsol:whisper:hb_portal_1.png?direct&900 | }} |
| |
| Once you have an HPC account, you can navigate to the ''Files'' tab in the top menu bar and select ''/scratch/p-number''. Before we can run the script for the transcription, we need to make sure that folders are set up correctly in your HPC environment. | Once you have an HPC account, you can navigate to the ''Files'' tab in the top menu bar and select ''/scratch/p-number''. Before we can run the script for the transcription, we need to make sure that the folders are set up correctly in your HPC environment. |
| |
| {{ :dcc:itsol:whisper:hb_portal_2a.png?direct&900 | }} | {{ :dcc:itsol:whisper:hb_portal_2a.png?direct&900 | }} |
| {{ :dcc:itsol:whisper:hb_portal_3.png?direct&900 | }} | {{ :dcc:itsol:whisper:hb_portal_3.png?direct&900 | }} |
| |
| Once inside the Whisper main folder, you need to create two subfolders. Once again, use ''New Directory'' to create each new folder. Call them "input" and "output", respectively, taking care to use lower case letters. Do not worry if your do not see any ''.sh'' file or ''slurm'' file in your file view. They will come later. | Once inside the Whisper main folder, you need to create two subfolders. Once again, use ''New Directory'' to create each new folder. Call them "input" and "output", respectively, taking care to use lowercase letters. Do not worry if you do not see any ''.sh'' file or ''slurm'' file in your file view. They will come later. |
| |
| {{ :dcc:itsol:whisper:hb_portal_4.png?direct&900 | }} | {{ :dcc:itsol:whisper:hb_portal_4.png?direct&900 | }} |
| |
| ==== Building the virtual environment and installing Whisper ==== | ==== Upload your data before launching the job ==== |
| |
| **Note: This step is only needed the first time you set up Whisper.** After you have installed the program for the first time, you can skip directly to the next part of the guide to run the program. | Now that the folder structure is ready, you can upload the audio file(s) you wish to transcribe. Click on the "input" folder to open it, then click ''Upload'' and follow the instructions to transfer your audio to the HPC environment. The example here contains a single file, but Whisper can transcribe multiple files in the same job, so feel free to upload as many audio files as needed. The only limitation is that the maximum runtime of Whisper, as it is currently set up, covers about 20 hours of interviews. If you need to transcribe more than that, please split the data into separate batches and launch a separate job for each batch. |
| |
| When logged into your session in the terminal, you will have a prompt where you can enter commands. In order to run Whisper, you will need to create the proper environment in your HPC session. To do so, copy the grey-highlighted lines below one by one into your terminal and run them separately by pressing enter. | {{ :dcc:itsol:whisper:hb_portal_4a.png?direct&900 | }} |
| |
| **Note**: To copy text into the terminal, ''ctrl+V'' will not work. Use either the right mouse click, then select paste from the drop-down menu, or, if you have a mouse wheel, click on the terminal with the mouse wheel to paste the text directly after you copied it. | [[dcc:itsol:whisper:running| → Move to the next step]] |
| | |
| **Steps to follow to install whisper**: | |
| | |
| * First, you need to load a module that Whisper will need to run. To do so, copy-paste the line highlighted in grey below into the terminal, as shown in the figure. | |
| * <code> module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1</code> {{ :dcc:itsol:whisper:insta_4a.png?direct&600 | }} | |
| * Then you need to create the virtual environment where you will install whisper. Copy-paste the line below into the terminal. | |
| * <code> python3 -m venv $HOME/.envs/whisper </code> {{ :dcc:itsol:whisper:insta_5a.png?direct&600 | }} | |
| * Now, activate the newly created environment by copy-pasting the line below. | |
| * <code> source $HOME/.envs/whisper/bin/activate </code> {{ :dcc:itsol:whisper:insta_6a.png?direct&600 | }} | |
| * Before you install Whisper, you need to make sure to have the latest version of some programs. Copy the two lines below separately into the terminal, as shown in the figures: | |
| * <code> pip install --upgrade pip </code> {{ :dcc:itsol:whisper:insta_7a.png?direct&600 | }} | |
| * <code> pip install --upgrade wheel </code> {{ :dcc:itsol:whisper:insta_8a.png?direct&600 | }} | |
| * Finally, you can install Whisper by running the command below: | |
| * <code> pip install git+https://github.com/openai/whisper.git </code> {{ :dcc:itsol:whisper:insta_9a.png?direct&600 | }} | |
| * If everything went well, you should see the following screen: {{ :dcc:itsol:whisper:insta_10a.png?direct&600 | }} | |
| * As a final step, type <code> deactivate </code> into the terminal, then press "enter". After this initial installation, you won't need to manually activate the whisper environment anymore. {{ :dcc:itsol:whisper:insta_11a.png?direct&600 | }} | |
| * If you wish to fully close the environment and also close the HPC session directly, type <code> exit </code> instead of <code> deactivate </code>. | |
| | |
| | |
| **Note**: The version numbers displayed in this guide for the programs you have installed and upgraded reflect the most recent versions at the time this guide was written. The numbers you will see displayed might have changed if newer versions have been released. | |
| | |
| | |
| [[dcc:itsol:whisper:scripts| → Move to the next step]] | |
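
The "input" and "output" subfolders described above are created through ''New Directory'' in the web portal. For readers who prefer the terminal, the same layout could be sketched as follows. This is a minimal sketch, not part of the original guide: the helper name ''make_whisper_folders'' is hypothetical, and the path in the example call (including the main folder name) is an assumption that you would replace with your own ''/scratch/p-number'' location.

```shell
#!/bin/sh
# Hypothetical helper (not from the guide): creates the lowercase
# "input" and "output" subfolders inside a given Whisper main folder.
make_whisper_folders() {
    main_dir="$1"
    # -p creates missing parent folders and does not fail if they already exist
    mkdir -p -- "$main_dir/input" "$main_dir/output"
}

# Example call. The path below is an assumption; substitute your own
# p-number and the name you gave the Whisper main folder:
# make_whisper_folders "/scratch/$USER/whisper"
```

Because ''mkdir -p'' leaves existing folders untouched, running the helper a second time is harmless.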