ChatTTS is a voice generation model designed for conversational scenarios, specifically for the dialogue tasks of large language model (LLM) assistants, as well as applications such as conversational audio and video introductions. It supports both Chinese and English. Trained on approximately 100,000 hours of Chinese and English data, ChatTTS produces high-quality, natural-sounding speech.
Multi-language Support
One of the key features of ChatTTS is its support for multiple languages, currently English and Chinese. This allows it to serve a wide range of users and helps overcome language barriers.
Large Data Training
ChatTTS has been trained on a large dataset of approximately 100,000 hours of Chinese and English speech. This extensive training results in high-quality, natural-sounding voice synthesis.
Dialog Task Compatibility
ChatTTS is well suited to the dialog tasks typically assigned to large language models (LLMs). It can voice the responses in a conversation and provide a more natural and fluid interaction when integrated into applications and services.
Open Source Plans
The project team plans to open source a trained base model. This will enable academic researchers and developers in the community to further study and develop the technology.
Control and Security
The team is committed to improving the controllability of the model, adding watermarks, and integrating it with LLMs. These efforts are intended to improve the safety and reliability of the model.
Ease of Use
ChatTTS is easy to use. It takes only text as input and generates the corresponding audio files. This simplicity makes it convenient for anyone who needs voice synthesis.
Operating system: Windows 10+ or Ubuntu 20.04+
Tools: Git, Python 3.9+
Audio libraries: FFmpeg or SoundFile
GPU: NVIDIA GPU with 4GB+ VRAM, CUDA 11.x or 12.x
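Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of ChatTTS: the function name `check_environment` is hypothetical, and it assumes that `git` and `ffmpeg` are discoverable on your PATH and that CUDA support is reported through `torch.cuda.is_available()`.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Return a dict mapping each prerequisite name to True or False."""
    report = {
        # Python 3.9+ is the minimum listed in the requirements
        "python": sys.version_info >= min_python,
        # Command-line tools are looked up on PATH
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Python packages are detected without importing them
        "soundfile": importlib.util.find_spec("soundfile") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken torch install is treated as "no CUDA"
            report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'OK' if ok else 'missing'}")
```

This check does not verify the amount of VRAM or the CUDA version; treat a passing result as a first sanity check only.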
Below are the steps to install and use ChatTTS. Note that the exact process may vary depending on the specific ChatTTS application or library you are using.
Download the ChatTTS code from its GitHub repository, https://github.com/2noise/ChatTTS:
git clone https://github.com/2noise/ChatTTS
Before you begin, make sure you have the necessary packages installed. You will need torch, soundfile, and ChatTTS. If you haven't installed them yet, you can do so using pip:
pip install torch soundfile ChatTTS
Import the necessary libraries for your script. You'll need ChatTTS and soundfile.
import soundfile
import ChatTTS
Create an instance of the ChatTTS class and load the pre-trained models.
chat = ChatTTS.Chat()
chat.load()
Define the text you want to convert to speech. Replace the sample string with your own text.
texts = ["Hello, welcome to ChatTTS!",]
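Because `texts` is a list, a longer passage can be broken into several entries before synthesis. The following is a minimal sketch, assuming you want one list entry per sentence; the helper `split_into_sentences` is hypothetical and not part of ChatTTS. It splits on common English and Chinese sentence-ending punctuation and will also split on periods inside numbers or abbreviations.

```python
import re

# One or more non-terminator characters followed by any run of terminators
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_into_sentences(paragraph):
    """Return the sentences of `paragraph` as a list of stripped strings."""
    pieces = (m.strip() for m in _SENTENCE.findall(paragraph))
    return [p for p in pieces if p]


texts = split_into_sentences("Hello, welcome to ChatTTS! How are you today?")
print(texts)
```

The resulting list can be used in place of the single-item `texts` list defined above.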
Use the infer method to generate speech from the text. Set use_decoder=True to enable the decoder.
wavs = chat.infer(texts, use_decoder=True)
Use the soundfile library to save the generated audio. Set the sample rate to 24,000 Hz.
soundfile.write("output1.wav", wavs[0][0], 24000)
Here's the complete script for reference:
import soundfile
import ChatTTS

# Initialize ChatTTS and load the pre-trained models
chat = ChatTTS.Chat()
chat.load()

# Define the text to be converted to speech
texts = ["Hello, welcome to ChatTTS!",]

# Generate speech
wavs = chat.infer(texts, use_decoder=True)

# Save the generated audio at a 24,000 Hz sample rate
soundfile.write("output1.wav", wavs[0][0], 24000)
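If soundfile is not available, the generated samples can also be written with Python's standard-library `wave` module. The following is a minimal sketch rather than the method used by ChatTTS: the helper `save_wav_pcm16` is hypothetical, and it assumes the audio is a one-dimensional sequence of mono float samples in the range -1.0 to 1.0. Depending on the ChatTTS version, that sequence may be `wavs[0]` or `wavs[0][0]`, so check the shape of the output first.

```python
import struct
import wave


def save_wav_pcm16(path, samples, sample_rate=24000):
    """Write mono float samples as a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        # Clamp to [-1.0, 1.0] so the scaled value fits in a signed 16-bit int
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)  # 24,000 Hz, as used in this guide
        wf.writeframes(bytes(frames))
    return len(frames) // 2
```

For example, `save_wav_pcm16("output1.wav", wavs[0][0])` would stand in for the `soundfile.write` call in the script above, under the same shape assumption.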