How to Install and Use ChatTTS



What is ChatTTS?

ChatTTS is a voice generation model designed for conversational scenarios, specifically the dialogue tasks of large language model (LLM) assistants, as well as applications such as conversational audio and video introductions. It supports both Chinese and English. Trained on approximately 100,000 hours of Chinese and English data, ChatTTS produces high-quality, natural-sounding speech.

ChatTTS Features

Multi-language Support
A key feature of ChatTTS is its support for both English and Chinese, which lets it serve a wide range of users.

Large Data Training
ChatTTS has been trained on approximately 100,000 hours of Chinese and English data. This extensive training results in high-quality, natural-sounding voice synthesis.

Dialog Task Compatibility
ChatTTS is well suited to the dialogue tasks typically assigned to large language models (LLMs). It can generate spoken responses for conversations and provide a more natural, fluid interaction when integrated into applications and services.

Open Source Plans
The project team plans to open source a trained base model. This will enable academic researchers and developers in the community to further study and develop the technology.

Control and Security
The team is committed to improving the controllability of the model, adding watermarks, and integrating it with LLMs. These efforts aim to make the model safer and more reliable.

Ease of Use
ChatTTS is easy to use. It takes only text as input and generates the corresponding audio, which makes it convenient for anyone with voice synthesis needs.

System Requirements

Windows 10+, Ubuntu 20.04+

Git, Python 3.9+

Audio libraries: FFmpeg or SoundFile

NVIDIA GPU with 4GB+ VRAM, CUDA 11.x or 12.x
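
Before installing anything, it can help to confirm that the prerequisites listed above are reachable from your shell. The following is a minimal sketch using only the Python standard library. The helper name check_requirements is hypothetical, and finding a tool on PATH does not prove that its version meets the minimums listed above.

```python
import shutil
import sys


def check_requirements(min_python=(3, 9)):
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    return {
        "python": sys.version_info >= min_python,
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # nvidia-smi ships with the NVIDIA driver; its absence suggests no usable GPU driver
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


for name, found in check_requirements().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A missing ffmpeg entry is not necessarily a problem if you plan to save audio with SoundFile instead.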

How to use ChatTTS?

Below are the steps to install and use ChatTTS. Note that the exact process may vary depending on the specific ChatTTS application or library you are using.

Step 1 - Download from GitHub

Download the ChatTTS code from its GitHub repository, https://github.com/2noise/ChatTTS:

git clone https://github.com/2noise/ChatTTS

Step 2 - Install Dependencies

Before you begin, make sure the necessary packages are installed. You will need torch, soundfile, and ChatTTS. If you haven't installed them yet, you can do so using pip:

pip install torch soundfile ChatTTS
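
To confirm that the installation succeeded, one option is to check that each package can be located by the interpreter you intend to use. This sketch relies only on the standard library; the helper name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names=("torch", "soundfile", "ChatTTS")):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("All packages found" if not missing else f"Missing: {', '.join(missing)}")
```

If anything is reported missing, re-run the pip command above with the same Python environment that runs your script.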

Step 3 - Import Required Libraries

Import the necessary libraries for your script. You'll need ChatTTS and soundfile.

import soundfile
import ChatTTS

Step 4 - Initialize ChatTTS

Create an instance of the ChatTTS class and load the pre-trained models.

chat = ChatTTS.Chat()
chat.load()

Step 5 - Prepare Your Text

Define the text you want to convert to speech. Replace the sample string with your own text.

texts = ["Hello, welcome to ChatTTS!",]
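
Because the texts variable is a list, a long passage can be passed as several shorter strings. The sketch below shows one way to split text at sentence boundaries. The helper name split_into_chunks and the 200-character limit are assumptions for illustration, not values recommended by the ChatTTS project.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after English punctuation followed by whitespace, or right after Chinese punctuation
    sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


texts = split_into_chunks("Hello there. How are you? Fine!", max_chars=30)
print(texts)
```

The resulting list can be used in place of the hand-written texts list in the step above.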

Step 6 - Generate Speech

Use the infer method to generate speech from the text. Set use_decoder=True to enable the decoder.

wavs = chat.infer(texts, use_decoder=True)

Step 7 - Save the Audio

Use soundfile to save the generated audio, with the sample rate set to 24,000 Hz.

soundfile.write("output1.wav", wavs[0][0], 24000)

Step 8 - Complete Script

Here's the complete script for reference:

import soundfile
import ChatTTS

# Initialize ChatTTS
chat = ChatTTS.Chat()
chat.load()

# Define the text to be converted to speech
texts = ["Hello, welcome to ChatTTS!",]

# Generate speech
wavs = chat.infer(texts, use_decoder=True)

# Save the generated audio
soundfile.write("output1.wav", wavs[0][0], 24000)
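
The script above saves only the first result. A possible extension, sketched below, writes one file per input text. The function names synthesize_to_files and output_path are hypothetical. The sketch also assumes that each item returned by infer is a NumPy array of shape (1, N) or (N,), which may differ between ChatTTS versions.

```python
def output_path(prefix, index):
    """Build a 1-based file name such as output1.wav."""
    return f"{prefix}{index + 1}.wav"


def synthesize_to_files(texts, prefix="output", sample_rate=24000):
    """Generate one WAV file per input text and return the file names."""
    # Imported here so the module can be loaded even where these packages are absent
    import numpy as np
    import soundfile
    import ChatTTS

    chat = ChatTTS.Chat()
    chat.load()
    wavs = chat.infer(texts, use_decoder=True)

    paths = []
    for i, wav in enumerate(wavs):
        # Assumption: squeeze() reduces either (1, N) or (N,) to a 1-D mono signal
        samples = np.asarray(wav).squeeze()
        path = output_path(prefix, i)
        soundfile.write(path, samples, sample_rate)
        paths.append(path)
    return paths
```

Calling synthesize_to_files(["Hello, welcome to ChatTTS!", "This is a second sentence."]) would then be expected to produce output1.wav and output2.wav in the current directory.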

Frequently Asked Questions

Have a question? Check out some of the common queries below.

What can ChatTTS be used for?



ChatTTS can be used for various applications, including but not limited to: conversational tasks for large language model assistants; dialogue speech generation; video introductions; speech synthesis for educational and training content; and any application or service that requires text-to-speech functionality.

How much VRAM do I need, and how fast is inference?



A 30-second audio clip requires at least 4GB of GPU memory. On an RTX 4090 GPU, the model generates audio corresponding to approximately 7 semantic tokens per second, and the Real-Time Factor (RTF) is around 0.3.
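
To make the RTF figure concrete: RTF is conventionally defined as processing time divided by the duration of the audio produced, so a value below 1.0 means generation is faster than playback. The sketch below applies that definition; the function name is hypothetical, and the 0.3 default is simply the figure quoted above for an RTX 4090.

```python
def generation_seconds(audio_seconds, rtf=0.3):
    """Estimate processing time from audio length, using RTF = processing time / audio time."""
    return audio_seconds * rtf


clip_length = 30  # seconds of audio to synthesize
estimate = generation_seconds(clip_length)
print(f"A {clip_length}-second clip should take roughly {estimate:.0f} seconds to generate")
```

On slower GPUs the RTF will be higher, so treat the result as a rough lower bound rather than a guarantee.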

Are there any limitations to using ChatTTS?



While ChatTTS is a powerful and versatile text-to-speech model, there are some limitations to consider. For instance, the quality of synthesized speech may vary depending on the complexity and length of the input text. Additionally, the model's performance can be influenced by the computational resources available, as generating high-quality speech in real-time may require significant processing power. Continuous updates and improvements are being made to address these limitations and enhance the model's capabilities.

Troubleshooting - No GPU found, use CPU instead



Make sure that the machine has an NVIDIA GPU, that its driver is correctly installed, and that the nvidia-smi command runs without errors.

Then install the GPU build of torch. First, uninstall the current build:
pip uninstall -y torch

If your CUDA version is 11.x, run:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

If your CUDA version is 12.x, run:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
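
After reinstalling, you can check from Python whether torch now sees the GPU. The sketch below also encodes the CUDA-to-index-URL choice described above. Both helper names are hypothetical, and the URL mapping covers only the two cases this article mentions.

```python
def torch_index_url(cuda_version):
    """Map a CUDA version string such as '12.1' to the wheel index URL used above."""
    urls = {
        "11": "https://download.pytorch.org/whl/cu118",
        "12": "https://download.pytorch.org/whl/cu121",
    }
    major = cuda_version.split(".")[0]
    if major not in urls:
        raise ValueError(f"No index URL listed here for CUDA {cuda_version}")
    return urls[major]


def describe_torch_device():
    """Report whether torch is installed and whether it can use a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; torch will run on the CPU"


print(describe_torch_device())
```

If the output still reports the CPU after reinstalling, the installed wheel is probably a CPU-only build or the driver is not visible to the process.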

Troubleshooting - RuntimeError: Couldn't find appropriate backend to handle uri output1.wav and format wav.



If you save audio with torchaudio, FFmpeg must be installed. On Windows, download FFmpeg and add it to the Path environment variable. On Linux, run:
apt update
apt install ffmpeg -y
# Sample code:
import torch
import torchaudio
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000, format='wav')

Alternatively, use the soundfile package, which is the recommended option:
pip install soundfile
# Sample code:
soundfile.write("output1.wav", wavs[0][0], 24000)
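
If neither backend can be installed, a last-resort option is the wave module from the Python standard library, which writes 16-bit PCM files without external dependencies. This is a sketch rather than part of ChatTTS: the helper name write_wav_pcm16 is hypothetical, and it assumes a one-dimensional sequence of mono float samples in the range -1.0 to 1.0. A synthetic tone stands in for model output so that the example runs on its own.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24000  # same rate the article uses for ChatTTS output


def write_wav_pcm16(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples as 16-bit PCM and return the clip length in seconds."""
    # Clip to [-1.0, 1.0] so the scaled values always fit in a signed 16-bit integer
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    pcm = [int(round(s * 32767)) for s in clipped]
    frames = struct.pack(f"<{len(pcm)}h", *pcm)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return len(pcm) / sample_rate


# One second of a 440 Hz tone as placeholder audio
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
demo_path = os.path.join(tempfile.gettempdir(), "chattts_demo_tone.wav")
seconds = write_wav_pcm16(demo_path, tone)
print(f"Wrote {seconds:.1f} s of audio to {demo_path}")
```

With real model output, the one-dimensional array passed to soundfile.write above would take the place of the tone list.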

Additional - Some Good GPU Plans for ChatTTS

Express GPU Dedicated Server - P1000

$ 64.00/mo

32GB RAM
GPU: Nvidia Quadro P1000
Eight-Core Xeon E5-2690
120GB + 960GB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Pascal
CUDA Cores: 640
GPU Memory: 4GB GDDR5
FP32 Performance: 1.894 TFLOPS

Basic GPU Dedicated Server - T1000

$ 59.50/mo

50% OFF Recurring (Was $119.00)

64GB RAM
GPU: Nvidia Quadro T1000
Eight-Core Xeon E5-2690
120GB + 960GB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Turing
CUDA Cores: 896
GPU Memory: 8GB GDDR6
FP32 Performance: 2.5 TFLOPS

Professional GPU VPS - A4000

$ 129.00/mo

32GB RAM
24 CPU Cores
320GB SSD
300Mbps Unmetered Bandwidth

Once per 2 Weeks Backup
OS: Linux / Windows 10/ Windows 11
Dedicated GPU: Quadro RTX A4000
CUDA Cores: 6,144
Tensor Cores: 192
GPU Memory: 16GB GDDR6
FP32 Performance: 19.2 TFLOPS

Basic GPU Dedicated Server - RTX 4060

$ 149.00/mo

64GB RAM
GPU: Nvidia GeForce RTX 4060
Eight-Core E5-2690
120GB SSD + 960GB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Ada Lovelace
CUDA Cores: 3072
Tensor Cores: 96
GPU Memory: 8GB GDDR6
FP32 Performance: 15.11 TFLOPS

Advanced GPU Dedicated Server - RTX 3060 Ti

$ 172.08/mo

28% OFF Recurring (Was $239.00)

128GB RAM
GPU: GeForce RTX 3060 Ti
Dual 12-Core E5-2697v2
240GB SSD + 2TB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Ampere
CUDA Cores: 4864
Tensor Cores: 152
GPU Memory: 8GB GDDR6
FP32 Performance: 16.2 TFLOPS

Advanced GPU Dedicated Server - A4000

$ 198.09/mo

29% OFF Recurring (Was $279.00)

128GB RAM
GPU: Nvidia Quadro RTX A4000
Dual 12-Core E5-2697v2
240GB SSD + 2TB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Ampere
CUDA Cores: 6144
Tensor Cores: 192
GPU Memory: 16GB GDDR6
FP32 Performance: 19.2 TFLOPS

Advanced GPU Dedicated Server - A5000

$ 269.00/mo

128GB RAM
GPU: Nvidia Quadro RTX A5000
Dual 12-Core E5-2697v2
240GB SSD + 2TB SSD
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Ampere
CUDA Cores: 8192
Tensor Cores: 256
GPU Memory: 24GB GDDR6
FP32 Performance: 27.8 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090

$ 409.00/mo

256GB RAM
GPU: GeForce RTX 4090
Dual 18-Core E5-2697v4
240GB SSD + 2TB NVMe + 8TB SATA
100Mbps-1Gbps
OS: Windows / Linux

Single GPU Specifications:
Microarchitecture: Ada Lovelace
CUDA Cores: 16,384
Tensor Cores: 512
GPU Memory: 24 GB GDDR6X
FP32 Performance: 82.6 TFLOPS
