OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Whisper is a family of pre-trained models for automatic speech recognition (ASR), released in September 2022 by Alec Radford and colleagues at OpenAI. Whisper was trained on a very large corpus of labeled audio-transcript data, roughly 680,000 hours in total, which is why it performs comparably to state-of-the-art ASR systems.
There are five model sizes, four with English-only versions, offering tradeoffs between speed and accuracy. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors, including the available hardware.

Size   | Parameters | English-only | Multilingual | Required VRAM | Relative speed
tiny   | 39 M       | tiny.en      | tiny         | ~1 GB         | ~32x
base   | 74 M       | base.en      | base         | ~1 GB         | ~16x
small  | 244 M      | small.en     | small        | ~2 GB         | ~6x
medium | 769 M      | medium.en    | medium       | ~5 GB         | ~2x
large  | 1550 M     | N/A          | large        | ~10 GB        | 1x
In December 2022, OpenAI released an improved large model named large-v2, and large-v3 in November 2023.
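As a rough rule of thumb, pick the largest checkpoint that fits in your GPU memory. The sketch below encodes the approximate VRAM figures from the table above; the helper function and the exact thresholds are illustrative guidance, not part of the Whisper API.

```python
# Approximate VRAM needed per Whisper checkpoint, in GB (from the table above).
# These figures are rough guidelines, not values exposed by the whisper package.
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
}

def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM requirement fits."""
    fitting = [name for name, need in VRAM_GB.items() if need <= available_vram_gb]
    # Dict order runs from smallest to largest model, so take the last fit.
    return fitting[-1] if fitting else "tiny"

print(pick_model(16))  # a 16 GB V100 comfortably runs large-v3
print(pick_model(4))   # a 4 GB card should stay at small or below
```

For example, the 16 GB V100 used in this guide selects large-v3, while a 4 GB card falls back to small.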
Python 3.8–3.11
Windows 10, 11
Git, Conda, PyTorch
GPU support requires a CUDA®-enabled card, 4GB+ VRAM
This guide uses the Advanced GPU - V100 Plan on GPUMart, which is equipped with a dedicated NVIDIA V100 graphics card with 16GB HBM2 GPU memory and can easily run the latest large-v3 multilingual model. Since Whisper has many dependencies, the installation process is a bit long but straightforward. It consists of the following five steps.
Click here (https://git-scm.com/download/win) to download the latest 64-bit version of Git for Windows, then right click on the downloaded file and run the installer as administrator.
Miniconda is a minimal installer provided by Anaconda. Please download the latest Miniconda installer (https://docs.anaconda.com/free/miniconda/) and complete the installation.
Whisper requires Python 3.8–3.11 and a recent version of PyTorch. To isolate these experiments from other work, set up a virtual environment with conda:
> conda create -n Whisper python=3.10.11
> conda activate Whisper
Whisper requires a recent version of PyTorch (the upstream project reports using PyTorch 1.12.1 without issues; the command below installs a current build with CUDA 12.1 support):
> conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
The next step is to install Chocolatey, a package manager for Windows. Open a PowerShell terminal as administrator and, from the PS C:\> prompt, run the following command:
> Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
If you don't see any errors, Chocolatey is ready to use. Whisper also requires FFmpeg, an audio-processing tool. If FFmpeg is not already installed on your machine, install it with the following command:
> choco install ffmpeg
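To confirm that FFmpeg is reachable from the environment Whisper will run in, a quick check with Python's standard library is enough. This helper is just a convenience for this guide, not part of Whisper:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found")
else:
    print("ffmpeg missing: install it before running Whisper")
```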
Pull and install the latest commit from this repository, along with its Python dependencies:
> pip install git+https://github.com/openai/whisper.git
The following command will transcribe speech in audio files, using the medium model:
> whisper audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
> whisper chinese.mp3 --language Chinese
Adding --task translate will translate the speech into English:
> whisper chinese.mp3 --language Chinese --task translate
Specify the output format and path:
> whisper Arthur.mp3 --model large-v3 --output_format txt --output_dir .\output
To learn more about usage, please see the help:
> whisper -h
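The CLI flags shown above compose freely, so it can be convenient to drive the whisper command from a script. The sketch below assembles an argument list for use with subprocess; the helper function is ours, but the flags are the real ones listed by whisper -h:

```python
import subprocess

def build_whisper_cmd(audio, model="medium", language=None, task=None,
                      output_format=None, output_dir=None):
    """Assemble an argument list for the whisper CLI."""
    cmd = ["whisper", audio, "--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    if output_format:
        cmd += ["--output_format", output_format]
    if output_dir:
        cmd += ["--output_dir", output_dir]
    return cmd

cmd = build_whisper_cmd("chinese.mp3", model="large-v3",
                        language="Chinese", task="translate")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the transcription
```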
Transcription can also be performed within Python:
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
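Besides the full text, model.transcribe also returns timestamped segments under result["segments"], each with start, end, and text fields. As an illustration, the sketch below turns such a result into SRT subtitle text; the formatting helpers are ours, and the sample dict stands in for a real transcription:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:03,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(result: dict) -> str:
    """Convert a Whisper transcription result into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                      f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

# Stand-in for the output of `model.transcribe("audio.mp3")`:
sample = {"segments": [{"start": 0.0, "end": 3.5, "text": " Hello there."}]}
print(to_srt(sample))
```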
If you have not installed JupyterLab, install it first and then start it. The reference commands are as follows.
(Whisper) PS > conda install -c conda-forge jupyterlab
(Whisper) PS > jupyter lab
In this tutorial, we covered the basics of getting started with Whisper on Windows. Whisper provides a powerful and intuitive speech recognition solution for Windows users. By following the steps outlined in this guide, you can easily install and use Whisper on your Windows operating system and put speech recognition technology to work on your own tasks.
Express GPU Dedicated Server - P1000
Basic GPU Dedicated Server - GTX 1650
Basic GPU Dedicated Server - GTX 1660
Professional GPU Dedicated Server - RTX 2060
Basic GPU Dedicated Server - RTX 4060
Advanced GPU Dedicated Server - RTX 3060 Ti
Advanced GPU Dedicated Server - A4000
Advanced GPU Dedicated Server - V100
If you do not find a suitable GPU server plan, please leave us a message.