How to Install and Use ChatTTS

ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. Let's get started with ChatTTS in just a few simple steps.

What is ChatTTS?

ChatTTS is a voice generation model designed for conversational scenarios, specifically for the dialogue tasks of large language model (LLM) assistants, as well as applications such as conversational audio and video introductions. It supports both Chinese and English. Trained on approximately 100,000 hours of Chinese and English data, ChatTTS produces high-quality, natural-sounding speech.

ChatTTS Features

Multi-language Support
One of the key features of ChatTTS is its support for multiple languages, including English and Chinese. This allows it to serve a wide range of users and overcome language barriers.

Large Data Training
ChatTTS has been trained on a large amount of data, approximately 100,000 hours of Chinese and English speech. This extensive training has resulted in high-quality and natural-sounding voice synthesis.

Dialog Task Compatibility
ChatTTS is well suited to the dialogue tasks typically assigned to large language models (LLMs). It can generate speech for conversational responses and provide a more natural and fluid interaction experience when integrated into various applications and services.

Open Source Plans
The project team plans to open source a trained base model. This will enable academic researchers and developers in the community to further study and develop the technology.

Control and Security
The team is committed to improving the controllability of the model, adding watermarks, and integrating it with LLMs. These efforts are intended to improve the safety and reliability of the model.

Ease of Use
ChatTTS provides an easy-to-use experience. It requires only text as input and generates the corresponding voice files. This simplicity makes it convenient for users with voice synthesis needs.

System Requirements

Windows 10+ or Ubuntu 20.04+

Git and Python 3.9+

Audio library: FFmpeg or SoundFile

NVIDIA GPU with 4GB+ VRAM and CUDA 11.x or 12.x
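
A quick way to see which of these prerequisites are present on a machine is a short check script. The following is a minimal sketch using only the Python standard library. The function name check_environment is illustrative, and the check only looks for executables on the PATH; it does not verify VRAM size or the CUDA version.

```python
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Report which of the listed prerequisites appear to be present."""
    return {
        # Python version of the interpreter running this script
        "python": sys.version_info >= min_python,
        # Command-line tools are detected by looking them up on the PATH
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

A missing ffmpeg entry is not necessarily a problem if you plan to save audio with soundfile instead.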

How to use ChatTTS?

Below are the steps to install and use ChatTTS. Note that the exact process may vary depending on the specific ChatTTS application or library you are using.

Step 1 - Download from GitHub

Clone the ChatTTS code from its GitHub repository, https://github.com/2noise/ChatTTS:

git clone https://github.com/2noise/ChatTTS
Step 2 - Install Dependencies

Before you begin, make sure you have the necessary packages installed. You will need torch, soundfile, and ChatTTS. If you haven't installed them yet, you can do so using pip:

pip install torch soundfile ChatTTS
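
To confirm that the installation worked before writing any synthesis code, you can ask Python whether each package can be located. This is a minimal sketch; the helper name missing_packages is illustrative, and finding a package does not guarantee that it imports cleanly or that torch was built with GPU support.

```python
import importlib.util


def missing_packages(names=("torch", "soundfile", "ChatTTS")):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, re-run the pip command above in the same Python environment you use to run the script.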
Step 3 - Import Required Libraries

Import the necessary libraries for your script. You'll need ChatTTS and soundfile.

import soundfile
import ChatTTS
Step 4 - Initialize ChatTTS

Create an instance of the ChatTTS.Chat class and load the pre-trained models.

chat = ChatTTS.Chat()
chat.load()
Step 5 - Prepare Your Text

Define the text you want to convert to speech. Replace the example string with your own text.

texts = ["Hello, welcome to ChatTTS!",]
Step 6 - Generate Speech

Use the infer method to generate speech from the text. Set use_decoder=True to enable the decoder.

wavs = chat.infer(texts, use_decoder=True)
Step 7 - Save the Audio

Use soundfile to save the generated audio. Set the sample rate to 24,000 Hz.

soundfile.write("output1.wav", wavs[0][0], 24000)
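
To confirm that a saved file has the expected sample rate and length, Python's standard wave module can read the header back. The sketch below assumes the file is PCM-encoded WAV, which should be what soundfile writes for .wav files by default. The helper names wav_info and write_silence are illustrative, and write_silence only creates a stand-in file so that the example runs without ChatTTS.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return rate, frames / rate


def write_silence(path, seconds=1, rate=24000):
    """Write mono 16-bit silence as a stand-in for a generated file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * (rate * seconds))


demo_path = os.path.join(tempfile.gettempdir(), "chattts_demo_silence.wav")
write_silence(demo_path)
demo_rate, demo_seconds = wav_info(demo_path)
print(f"sample rate: {demo_rate} Hz, duration: {demo_seconds} s")
```

To inspect the file produced in this step, call wav_info("output1.wav") instead of using the stand-in file.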
Step 8 - Complete Script

Here's the complete script for reference:

import soundfile
import ChatTTS

# Initialize ChatTTS
chat = ChatTTS.Chat()
chat.load()

# Define the text to be converted to speech
texts = ["Hello, welcome to ChatTTS!",]

# Generate speech
wavs = chat.infer(texts, use_decoder=True)

# save the generated audio
soundfile.write("output1.wav", wavs[0][0], 24000)

Frequently Asked Questions

Have a question? Check out some of the common queries below.

What can ChatTTS be used for?

ChatTTS can be used for various applications, including but not limited to:
  • Conversational tasks for large language model assistants
  • Generating dialogue speech
  • Video introductions
  • Educational and training content speech synthesis
  • Any application or service requiring text-to-speech functionality

How much VRAM do I need? How about inference speed?

For a 30-second audio clip, at least 4GB of GPU memory is required. On an RTX 4090 GPU, it can generate audio corresponding to approximately 7 semantic tokens per second. The Real-Time Factor (RTF) is around 0.3.
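
To make the RTF figure concrete, here is a small worked calculation. It assumes the usual definition of RTF (time spent generating divided by the duration of the generated audio), so a value below 1 means generation is faster than playback. The function names are illustrative, and real timings depend on your hardware and input text.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds, rtf=0.3):
    """Estimate generation time for a clip; 0.3 is the RTF quoted above."""
    return audio_seconds * rtf


clip_seconds = 30
estimate = estimated_processing_seconds(clip_seconds)
print(f"A {clip_seconds}-second clip at RTF 0.3 takes roughly {estimate:.0f} seconds to generate")
```

To measure the RTF on your own machine, time the call to chat.infer and pass that time, together with the clip duration, to real_time_factor.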

Are there any limitations to using ChatTTS?

While ChatTTS is a powerful and versatile text-to-speech model, there are some limitations to consider. For instance, the quality of synthesized speech may vary depending on the complexity and length of the input text. Additionally, the model's performance can be influenced by the computational resources available, as generating high-quality speech in real-time may require significant processing power. Continuous updates and improvements are being made to address these limitations and enhance the model's capabilities.
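
One common workaround for long inputs is to split the text into shorter chunks at sentence boundaries and synthesize each chunk separately. This is a general technique rather than something the ChatTTS project prescribes, and the sketch below is only one way to do it. The function name split_into_chunks and the 200-character default are assumptions chosen for illustration.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group sentences into chunks of at most max_chars where possible."""
    # Split after English punctuation followed by whitespace,
    # or directly after Chinese sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo_chunks = split_into_chunks("One. Two. Three.", max_chars=10)
print(demo_chunks)
```

The resulting list can be passed to chat.infer in place of texts. A single sentence longer than max_chars is kept whole rather than cut mid-sentence.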

Troubleshooting - No GPU found, use CPU instead

Make sure that the machine has an NVIDIA GPU with a correctly installed driver, and that the nvidia-smi command runs without errors.

Then install the GPU build of torch. First, uninstall the current build:
pip uninstall -y torch

If your CUDA version is 11.x, run:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

If it is 12.x, run:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
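
After reinstalling, you can check whether PyTorch actually sees the GPU. The following is a minimal sketch; the function name describe_torch_device is illustrative, and it uses only torch.cuda.is_available() and torch.cuda.get_device_name(), falling back to a message if torch is not installed.

```python
def describe_torch_device():
    """Return a short description of the device PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return f"CUDA not available (torch {torch.__version__}); inference would run on the CPU"


status = describe_torch_device()
print(status)
```

If CUDA is still reported as unavailable, the installed torch build may not match your driver's CUDA version.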

Troubleshooting - RuntimeError: Couldn't find appropriate backend to handle uri output1.wav and format wav.

If you use torchaudio, you need to install FFmpeg. On Windows, download FFmpeg and add it to your PATH environment variable. On Linux, run:
apt update
apt install ffmpeg -y
# Sample code:
import torch
import torchaudio
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000, format='wav')

Alternatively, use the soundfile package, which is the recommended option:
pip install soundfile
# Sample code:
soundfile.write("output1.wav", wavs[0][0], 24000)
Additional - Some Good GPU Plans for ChatTTS

Express GPU - P1000

$64.00/mo
  • 32GB RAM
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro P1000
  • Microarchitecture: Pascal
  • Max GPUs: 1
  • CUDA Cores: 640
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 1.894 TFLOPS

Basic GPU - T1000

$99.00/mo
  • 64GB RAM
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro T1000
  • Microarchitecture: Turing
  • Max GPUs: 1
  • CUDA Cores: 896
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 2.5 TFLOPS

Professional GPU VPS - A4000

$90.30/mo (Save 50%, was $179.00)
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Basic GPU - RTX 4060

$149.00/mo
  • 64GB RAM
  • Eight-Core E5-2690
  • 120GB SSD + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce RTX 4060
  • Microarchitecture: Ada Lovelace
  • Max GPUs: 2
  • CUDA Cores: 3072
  • Tensor Cores: 96
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 15.11 TFLOPS

Advanced GPU - RTX 3060 Ti

$179.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 3060 Ti
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Advanced GPU - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good choice for hosting AI image generator, BIM, 3D rendering, CAD, deep learning, etc.

Advanced GPU - A5000

$269.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS
  • Good alternative to RTX 3090 Ti, A10.
Daily Price: $13/day

Enterprise GPU - RTX 4090

$286.30/mo (48% off recurring, was $549.00)
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 4090
  • Microarchitecture: Ada Lovelace
  • Max GPUs: 1
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS
  • Perfect for 3D rendering/modeling, CAD/professional design, video editing, gaming, HPC, AI/deep learning.
