How to Install and Use ChatTTS

ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. Let's get started with ChatTTS in just a few simple steps.

What is ChatTTS?

ChatTTS is a voice generation model designed for conversational scenarios, specifically for the dialogue tasks of large language model (LLM) assistants, as well as applications such as conversational audio and video introductions. It supports both Chinese and English, and through the use of approximately 100,000 hours of Chinese and English data for training, ChatTTS demonstrates high quality and naturalness in speech synthesis.

ChatTTS Features

Multi-language Support
One of the key features of ChatTTS is its support for multiple languages, including English and Chinese. This allows it to serve a wide range of users and overcome language barriers.

Large Data Training
ChatTTS has been trained on a significant amount of data, approximately 100,000 hours of Chinese and English audio. This extensive training has resulted in high-quality and natural-sounding voice synthesis.

Dialog Task Compatibility
ChatTTS is well-suited for handling dialog tasks typically assigned to large language models (LLMs). It can generate responses for conversations and provide a more natural and fluid interaction experience when integrated into various applications and services.

Open Source Plans
The project team plans to open source a trained base model. This will enable academic researchers and developers in the community to further study and develop the technology.

Control and Security
The team is committed to improving the controllability of the model, adding watermarks, and integrating it with LLMs. These efforts help ensure the safety and reliability of the model.

Ease of Use
ChatTTS provides an easy-to-use experience. It requires only text as input and generates the corresponding voice files. This simplicity makes it convenient for anyone with voice synthesis needs.

System Requirements

Windows 10+ or Ubuntu 20.04+

Git, Python 3.9+

Audio libraries: FFmpeg or SoundFile

Nvidia GPU with 4GB+ VRAM, CUDA 11.x or 12.x
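The requirements above can be sanity-checked with a short script (a sketch; it only reports what is on PATH and does not verify CUDA itself):

```python
import shutil
import sys

# Pre-flight check mirroring the requirements listed above;
# adjust the thresholds to your own setup.
print("Python 3.9+:", sys.version_info >= (3, 9))
print("git found:", shutil.which("git") is not None)
print("ffmpeg found:", shutil.which("ffmpeg") is not None)
print("nvidia-smi found:", shutil.which("nvidia-smi") is not None)
```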

How to use ChatTTS?

Below are the steps to install and use ChatTTS. Note that the exact process may vary depending on the specific ChatTTS application or library you are using.

Step 1 - Download from GitHub

Download the ChatTTS code from the GitHub repo: https://github.com/2noise/chattts.

git clone https://github.com/2noise/ChatTTS
Step 2 - Install Dependencies

Before you begin, make sure you have the necessary packages installed. You will need torch, soundfile, and ChatTTS. If you haven't installed them yet, you can do so using pip:

pip install torch soundfile ChatTTS
Step 3 - Import Required Libraries

Import the necessary libraries for your script. You'll need ChatTTS and soundfile.

import soundfile
import ChatTTS
Step 4 - Initialize ChatTTS

Create an instance of the ChatTTS class and load the pre-trained models.

chat = ChatTTS.Chat()
chat.load()
Step 5 - Prepare Your Text

Define the text you want to convert to speech. Replace the example with your desired text.

texts = ["Hello, welcome to ChatTTS!",]
Step 6 - Generate Speech

Use the infer method to generate speech from the text. Set use_decoder=True to enable the decoder.

wavs = chat.infer(texts, use_decoder=True)
Step 7 - Save the Audio

Use soundfile to save the generated audio. Set the sample rate to 24,000 Hz.

soundfile.write("output1.wav", wavs[0][0], 24000)
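The indexing wavs[0][0] assumes chat.infer returns a 2D array per input text; some ChatTTS versions return 1D arrays instead. A small version-agnostic helper can smooth this over (a sketch; first_wave is a hypothetical name, and soundfile is assumed to be installed for the usage shown in the comment):

```python
import numpy as np

def first_wave(wavs):
    # Collapse a possible leading channel dimension so both (1, N) and
    # (N,) outputs from chat.infer become a plain 1D array.
    return np.asarray(wavs[0]).squeeze()

# Usage, assuming `wavs` came from chat.infer:
# import soundfile
# soundfile.write("output1.wav", first_wave(wavs), 24000)
```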
Step 8 - Complete Script

Here's the complete script for reference:

import soundfile
import ChatTTS

# Initialize ChatTTS
chat = ChatTTS.Chat()
chat.load()

# Define the text to be converted to speech
texts = ["Hello, welcome to ChatTTS!",]

# Generate speech
wavs = chat.infer(texts, use_decoder=True)

# save the generated audio
soundfile.write("output1.wav", wavs[0][0], 24000)

Frequently Asked Questions

Have a question? Check out some of the common queries below.

What can ChatTTS be used for?

ChatTTS can be used for various applications, including but not limited to:
  • Conversational tasks for large language model assistants
  • Generating dialogue speech
  • Video introductions
  • Educational and training content speech synthesis
  • Any application or service requiring text-to-speech functionality

How much VRAM do I need? How about infer speed?

For a 30-second audio clip, at least 4GB of GPU memory is required. On an RTX 4090, ChatTTS generates audio corresponding to approximately 7 semantic tokens per second, giving a Real-Time Factor (RTF) of around 0.3.
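To make the RTF figure concrete, generation time scales as RTF times audio duration (simple arithmetic on the numbers above):

```python
# Back-of-the-envelope latency from the RTF figure above:
# generation time ≈ RTF × audio duration.
rtf = 0.3
audio_seconds = 30.0
generation_seconds = rtf * audio_seconds
print(generation_seconds)  # 9.0 -> roughly 9 s to synthesize a 30 s clip
```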

Are there any limitations to using ChatTTS?

While ChatTTS is a powerful and versatile text-to-speech model, there are some limitations to consider. For instance, the quality of synthesized speech may vary depending on the complexity and length of the input text. Additionally, the model's performance can be influenced by the computational resources available, as generating high-quality speech in real-time may require significant processing power. Continuous updates and improvements are being made to address these limitations and enhance the model's capabilities.

Troubleshooting - No GPU found, use CPU instead

Please make sure that the machine you are using has an NVIDIA GPU installed, that the driver is correctly installed, and that the nvidia-smi command produces normal output.

Then, install the GPU version of torch. First, execute
pip uninstall -y torch

If your CUDA version is 11.x, execute
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

If it is 12.x, execute
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
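Once the reinstall finishes, the version tag of the installed wheel tells you whether you got a GPU build: CUDA wheels carry a +cuXXX suffix. A quick check (is_cuda_build is a hypothetical helper; at runtime you would also check torch.cuda.is_available()):

```python
def is_cuda_build(version: str) -> bool:
    # GPU wheels from the PyTorch index report versions like "2.3.1+cu121";
    # CPU-only wheels report "2.3.1" or "2.3.1+cpu".
    return "+cu" in version

print(is_cuda_build("2.3.1+cu121"))  # True: GPU wheel
print(is_cuda_build("2.3.1+cpu"))    # False: CPU-only wheel

# At runtime, with torch installed:
# import torch
# print(torch.__version__, torch.cuda.is_available())
```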

Troubleshooting - RuntimeError: Couldn't find appropriate backend to handle uri output1.wav and format wav.

If you use torchaudio, you need to install FFmpeg. On Windows, download FFmpeg and add it to the Path variable; on Linux, execute
apt update
apt install ffmpeg -y
# Sample code:
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000, format='wav')

It is recommended to use the soundfile package instead:
pip install soundfile
# Sample code:
soundfile.write("output1.wav", wavs[0][0], 24000)
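If neither backend cooperates, a plain WAV file can also be written with only the standard library (a sketch; write_wav is a hypothetical helper that converts ChatTTS's float samples in [-1, 1] to 16-bit PCM, demonstrated here on a synthetic tone rather than real model output):

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=24000):
    # Convert float samples in [-1, 1] to 16-bit PCM and write a mono WAV.
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(pcm)

# Demo: a 0.1-second 440 Hz tone in place of real ChatTTS output
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 24000) for i in range(2400)]
write_wav("tone.wav", tone)
```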
Additional - Some Good GPU Plans for ChatTTS

Express GPU Dedicated Server - P1000

$40.00/mo
45% OFF Recurring (Was $74.00)
  • 32GB RAM
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro P1000
  • Microarchitecture: Pascal
  • CUDA Cores: 640
  • GPU Memory: 4GB GDDR5
  • FP32 Performance: 1.894 TFLOPS

Basic GPU Dedicated Server - T1000

$79.00/mo
34% OFF Recurring (Was $119.00)
  • 64GB RAM
  • Eight-Core Xeon E5-2690
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro T1000
  • Microarchitecture: Turing
  • CUDA Cores: 896
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 2.5 TFLOPS

Professional GPU VPS - A4000

$129.00/mo
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Basic GPU Dedicated Server - RTX 4060

$149.00/mo
  • 64GB RAM
  • Eight-Core E5-2690
  • 120GB SSD + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia GeForce RTX 4060
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 3072
  • Tensor Cores: 96
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 15.11 TFLOPS
  • Ideal for video editing, rendering, android emulators, gaming and light AI tasks.

Advanced GPU Dedicated Server - RTX 3060 Ti

$179.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 3060 Ti
  • Microarchitecture: Ampere
  • CUDA Cores: 4864
  • Tensor Cores: 152
  • GPU Memory: 8GB GDDR6
  • FP32 Performance: 16.2 TFLOPS

Advanced GPU Dedicated Server - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good choice for hosting AI image generator, BIM, 3D rendering, CAD, deep learning, etc.

Advanced GPU Dedicated Server - A5000

$244.00/mo
30% OFF Recurring (Was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU Dedicated Server - RTX 4090

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: GeForce RTX 4090
  • Microarchitecture: Ada Lovelace
  • CUDA Cores: 16,384
  • Tensor Cores: 512
  • GPU Memory: 24 GB GDDR6X
  • FP32 Performance: 82.6 TFLOPS
  • Perfect for 3D rendering/modeling , CAD/ professional design, video editing, gaming, HPC, AI/deep learning.