
GPT-SoVITS is an excellent open-source multilingual text-to-speech (TTS) project that supports languages such as Chinese, English, Japanese, and Korean. Its main features include:

Zero-shot text-to-speech (TTS): Only a 5-second voice sample is required to quickly generate speech.

Few-shot TTS: Only 1 minute of training data is required to fine-tune the model, thereby improving timbre similarity and naturalness.

Cross-language support: Supports synthesis in languages different from that of the training dataset. Currently supports English, Japanese, Korean, Cantonese, and Chinese.

GPT-SoVITS has now been upgraded to v2, which brings the following changes:

  1. Added support for Korean and Cantonese
  2. Optimized text front-end processing
  3. Expanded the training data for the underlying model to 5,000 hours
  4. Improved synthesis quality when the reference audio is of low quality (such as audio from the internet with high-frequency loss and a muffled sound)

GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e

The video translation software now integrates GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integrated package and use it in the video translation software.

Download Integrated Package

It is recommended to download the official GPT-SoVITS integrated package to ensure compatibility. The APIs of third-party packages are not compatible with the official one, which may cause the video translation software to report errors.

Download address: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4

Start API Service

Open the GPT-SoVITS folder, type cmd in the folder window's address bar, and press Enter. In the terminal window that opens, run .\runtime\python api_v2.py to start the API service.

The default port is 9880. In the video translation software, you need to fill in http://127.0.0.1:9880.

The API service must be running before GPT-SoVITS can be used in the translation software.
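
To confirm that something is actually listening on the default port before opening the translation software, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; the function name api_port_open is made up for illustration, and an open port only shows that some process accepts connections there, not that it is GPT-SoVITS.

```python
import socket


def api_port_open(host="127.0.0.1", port=9880, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if api_port_open():
        print("Port 9880 is accepting connections.")
    else:
        print("Nothing is listening on 127.0.0.1:9880. Start api_v2.py first.")
```

If the check fails while the terminal window still shows the service running, the port may have been changed or the connection may be blocked.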

Configure in Video Translation and Dubbing Software

1. Fill in API Address

Start the software, go to Menu -> TTS Settings -> GPT-SoVITS, and enter http://127.0.0.1:9880 in the API text box.

Note: The default port is 9880. If you change the port, change the API address to match. Also, for a local deployment, enter the address as 127.0.0.1, not 0.0.0.0.
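
The note above can be expressed as a small address-cleaning helper. This is an illustrative sketch only: normalize_api_address is a hypothetical name, not part of the translation software, and falling back to port 9880 when no port is given is a convenience assumption based on the default mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_api_address(address, default_port=9880):
    """Return an http://host:port address suitable for a local client."""
    # Accept input without a scheme, e.g. "127.0.0.1:9880".
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    host = parts.hostname or "127.0.0.1"
    # 0.0.0.0 is a listen address, not something a client should connect to.
    if host == "0.0.0.0":
        host = "127.0.0.1"
    # Assumption: use the documented default port when none is given.
    port = parts.port or default_port
    # Any path, query, or fragment is dropped; only scheme, host, port remain.
    return urlunsplit((parts.scheme, f"{host}:{port}", "", "", ""))


print(normalize_api_address("http://0.0.0.0:9880"))  # http://127.0.0.1:9880
print(normalize_api_address("127.0.0.1:9881"))       # http://127.0.0.1:9881
```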

2. Fill in Reference Audio

The reference audio is the audio whose timbre GPT-SoVITS imitates when synthesizing speech. Suppose you have an audio file 1.wav (5 seconds long, with the spoken content "The weather is good today, pouring rain is pouring down"). Copy this file into the GPT-SoVITS folder, next to the api_v2.py file, and fill in the Reference Audio text box of the software in the form audio path#text spoken in the audio#language code.

Language codes: zh for Chinese, en for English, ja for Japanese, ko for Korean.

If you keep all reference audio files in a wavs folder inside the GPT-SoVITS directory, the reference audio entry becomes wavs/1.wav#The weather is good today, pouring rain is pouring down#zh.
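
To make the path#text#language layout concrete, here is a minimal sketch that splits such an entry into its three parts and checks the language code. The names ReferenceAudio and parse_reference_entry are hypothetical, and the accepted codes are limited to the four listed in this article; how the translation software parses the field internally is not described here.

```python
from typing import NamedTuple

# Only the language codes listed in this article.
LANGUAGE_CODES = {"zh": "Chinese", "en": "English", "ja": "Japanese", "ko": "Korean"}


class ReferenceAudio(NamedTuple):
    path: str   # audio path, relative to the GPT-SoVITS folder
    text: str   # what is spoken in the audio
    lang: str   # language code of the spoken text


def parse_reference_entry(entry):
    """Split 'path#text#lang' into its parts, validating the language code."""
    try:
        # Split at the first and last '#', so a '#' inside the text survives.
        path, rest = entry.split("#", 1)
        text, lang = rest.rsplit("#", 1)
    except ValueError:
        raise ValueError(f"expected path#text#language, got: {entry!r}") from None
    path, text, lang = path.strip(), text.strip(), lang.strip()
    if not path or not text:
        raise ValueError("audio path and text must not be empty")
    if lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    return ReferenceAudio(path, text, lang)


ref = parse_reference_entry(
    "wavs/1.wav#The weather is good today, pouring rain is pouring down#zh"
)
print(ref.path, ref.lang)  # wavs/1.wav zh
```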

3. Check api_v2?

If you started the service with api_v2.py, make sure the api_v2? option is checked.

4. Test Connection

Click Test. If no error is reported, the configuration is working.
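
This article does not describe what the Test button does internally. For readers who want to check the service by hand, the following is only a sketch of what such a request could look like. It assumes that api_v2.py exposes a POST /tts endpoint accepting a JSON body with the fields text, text_lang, ref_audio_path, prompt_text, and prompt_lang; the endpoint and field names are assumptions here and should be verified against the comments at the top of your own api_v2.py. The helper names are hypothetical.

```python
import json
import urllib.request


def build_tts_request(api_address, text, text_lang,
                      ref_audio_path, prompt_text, prompt_lang):
    """Build (but do not send) a POST request for the assumed /tts endpoint."""
    payload = {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language code of that text
        "ref_audio_path": ref_audio_path,  # reference audio path
        "prompt_text": prompt_text,        # what is spoken in the reference audio
        "prompt_lang": prompt_lang,        # language code of the reference audio
    }
    return urllib.request.Request(
        api_address.rstrip("/") + "/tts",  # assumed endpoint path
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_test_audio(request, out_path="test_output.wav", timeout=120):
    """Send the request and write the response body (assumed audio) to a file."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(out_path, "wb") as f:
            f.write(response.read())
    return out_path


request = build_tts_request(
    "http://127.0.0.1:9880",
    text="Hello, this is a test.",
    text_lang="en",
    ref_audio_path="wavs/1.wav",
    prompt_text="The weather is good today, pouring rain is pouring down",
    prompt_lang="zh",
)
print(request.get_method(), request.full_url)  # POST http://127.0.0.1:9880/tts
```

Calling save_test_audio(request) with the API service running would send the request. If the server answers with an HTTP error status such as 404, urlopen raises urllib.error.HTTPError, which can help tell an incompatible API apart from a service that is not running at all.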

Common Problems

  1. A 404 error is reported during testing

    This is caused by using a third-party integrated package. The API of the third-party package is not compatible with the official one. Please download and use the official package.

  2. The error "Remote computer actively refused" or "Please check whether the api service is started" is reported

    The API service may not be started, or it may be blocked by the firewall. Please ensure that the API is started, or close