OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Whisper is a family of pre-trained models for automatic speech recognition (ASR), released in September 2022 by Alec Radford and colleagues at OpenAI. Whisper was trained on a very large corpus of labeled audio-transcript data, roughly 680,000 hours in total, which is why it performs comparably to state-of-the-art ASR systems.
There are five model sizes, four with English-only versions, offering tradeoffs between speed and accuracy. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors, including the available hardware.

Size   | Parameters | English-only | Multilingual | Required VRAM | Relative speed
tiny   | 39 M       | tiny.en      | tiny         | ~1 GB         | ~32x
base   | 74 M       | base.en      | base         | ~1 GB         | ~16x
small  | 244 M      | small.en     | small        | ~2 GB         | ~6x
medium | 769 M      | medium.en    | medium       | ~5 GB         | ~2x
large  | 1550 M     | N/A          | large        | ~10 GB        | 1x
In December 2022, OpenAI released an improved large model named large-v2, and large-v3 in November 2023.
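As a rough rule of thumb, pick the largest checkpoint that fits in your GPU memory. The sketch below encodes the approximate VRAM figures from the table above; the helper function and the exact thresholds are illustrative guidance, not part of the Whisper API.

```python
# Approximate VRAM needed per Whisper checkpoint, in GB (from the table above).
# These figures are rough guidelines, not values exposed by the whisper package.
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
}

def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM requirement fits."""
    fitting = [name for name, need in VRAM_GB.items() if need <= available_vram_gb]
    # Dict order runs from smallest to largest model, so take the last fit.
    return fitting[-1] if fitting else "tiny"

print(pick_model(16))  # a 16 GB V100 comfortably runs large-v3
print(pick_model(4))   # a 4 GB card should stay at small or below
```

For example, the 16 GB V100 used in this guide selects large-v3, while a 4 GB card falls back to small.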
Python 3.8–3.11
Windows 10, 11
Git, Conda, PyTorch
GPU support requires a CUDA®-enabled card, 4GB+ VRAM
This guide uses the Advanced GPU - V100 Plan on GPUMart, which is equipped with a dedicated NVIDIA V100 graphics card with 16GB HBM2 GPU memory and can easily run the latest large-v3 multilingual model. Since Whisper has many dependencies, the installation process is a bit long but straightforward. It consists of the following five steps.
Click here (https://git-scm.com/download/win) to download the latest 64-bit version of Git for Windows, then right click on the downloaded file and run the installer as administrator.
Miniconda is a minimal installer provided by Anaconda. Please download the latest Miniconda installer (https://docs.anaconda.com/free/miniconda/) and complete the installation.
Whisper requires Python 3.8–3.11 and a recent version of PyTorch. To isolate these experiments from other work, set up a virtual environment with conda:
> conda create -n Whisper python=3.10.11
> conda activate Whisper
Whisper requires a recent version of PyTorch (the upstream project reports using PyTorch 1.12.1 without issues; the command below installs a current build with CUDA 12.1 support):
> conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
The next step is to install Chocolatey, a package manager for Windows. Open a PowerShell terminal as administrator and, from the PS C:\> prompt, run the following command:
> Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
If you don't see any errors, Chocolatey is ready to use. Whisper also requires FFmpeg, an audio-processing tool. If FFmpeg is not already installed on your machine, install it with the following command:
> choco install ffmpeg
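To confirm that FFmpeg is reachable from the environment Whisper will run in, a quick check with Python's standard library is enough. This helper is just a convenience for this guide, not part of Whisper:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found")
else:
    print("ffmpeg missing: install it before running Whisper")
```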
Pull and install the latest commit from this repository, along with its Python dependencies:
> pip install git+https://github.com/openai/whisper.git
The following command will transcribe speech in audio files, using the medium model:
> whisper audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
> whisper chinese.mp3 --language Chinese
Adding --task translate will translate the speech into English:
> whisper chinese.mp3 --language Chinese --task translate
Specify the output format and path:
> whisper Arthur.mp3 --model large-v3 --output_format txt --output_dir .\output
To learn more about usage, please see the help:
> whisper -h
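The CLI flags shown above compose freely, so it can be convenient to drive the whisper command from a script. The sketch below assembles an argument list for use with subprocess; the helper function is ours, but the flags are the real ones listed by whisper -h:

```python
import subprocess

def build_whisper_cmd(audio, model="medium", language=None, task=None,
                      output_format=None, output_dir=None):
    """Assemble an argument list for the whisper CLI."""
    cmd = ["whisper", audio, "--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    if output_format:
        cmd += ["--output_format", output_format]
    if output_dir:
        cmd += ["--output_dir", output_dir]
    return cmd

cmd = build_whisper_cmd("chinese.mp3", model="large-v3",
                        language="Chinese", task="translate")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the transcription
```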
Transcription can also be performed within Python:
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
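Besides the full text, model.transcribe also returns timestamped segments under result["segments"], each with start, end, and text fields. As an illustration, the sketch below turns such a result into SRT subtitle text; the formatting helpers are ours, and the sample dict stands in for a real transcription:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:03,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(result: dict) -> str:
    """Convert a Whisper transcription result into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                      f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

# Stand-in for the output of `model.transcribe("audio.mp3")`:
sample = {"segments": [{"start": 0.0, "end": 3.5, "text": " Hello there."}]}
print(to_srt(sample))
```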
If you have not installed JupyterLab, install it first and then start it. The reference commands are as follows.
(Whisper) PS > conda install -c conda-forge jupyterlab
(Whisper) PS > jupyter lab
In this tutorial, we covered the basics of getting started with Whisper on Windows. Whisper provides a powerful and intuitive speech recognition solution for Windows users. By following the steps outlined in this guide, you can easily install and use Whisper on your Windows operating system and put speech recognition technology to work on your own tasks.
Express GPU Dedicated Server - P1000
Basic GPU Dedicated Server - GTX 1650
Basic GPU Dedicated Server - GTX 1660
Professional GPU Dedicated Server - RTX 2060
Basic GPU Dedicated Server - RTX 4060
Advanced GPU Dedicated Server - RTX 3060 Ti
Advanced GPU Dedicated Server - A4000
Advanced GPU Dedicated Server - V100
If you do not find a suitable GPU server plan, please leave us a message.