pyVideoTrans Open Source Video Translation

CosyVoice Open Source Address: https://github.com/FunAudioLLM/CosyVoice
CosyVoice-api Open Source Address: https://github.com/jianchang512/cosyvoice-api
Supports Chinese, English, Japanese, Korean, Cantonese. The corresponding language codes are zh|en|jp|ko|yue

Using in Video Translation Software

First, upgrade the software to 2.08+
Ensure that the CosyVoice project has been deployed, the api.py from CosyVoice-api has been placed, and api.py has been successfully started (API service must be started for use in the translation software).
Open the video translation software, go to Settings (top left) -- CosyVoice: fill in the API address, the default is http://127.0.0.1:9233
Fill in the reference audio and the text corresponding to the audio

Reference audio format:

Each line is separated into two parts by the # symbol. The first part is the path to the WAV audio, and the second part is the corresponding text content. Multiple lines can be filled in.

The optimal duration for WAV audio is 5-15 seconds. If the audio is placed in the root directory of the CosyVoice project (i.e., the same directory as webui.py), simply fill in the name here.
If it is placed in the wavs directory under the root directory, you need to fill in wavs/audio_name.wav

Example of reference audio format:

1.wav#Hello dear friends
wavs/2.wav#Hello friends

After filling in, select CosyVoice as the dubbing channel and the corresponding role in the main interface. Among them, the clone role is to copy the voice from the original video.

For other systems, please deploy CosyVoice first. The specific deployment method is as follows:

Source Code Deployment of the Official CosyVoice Project

Deployment uses conda, and this method is highly recommended, otherwise it may not be successfully installed, and there will be many problems. Some dependencies cannot be successfully installed on Windows using pip, such as pynini.

1. Download and install miniconda

miniconda is a conda management software. It is very convenient to install on Windows, just like ordinary software, just click next to complete it.

Download address: https://docs.anaconda.com/miniconda/

After the download is complete, double-click the exe file

The only thing to pay attention to is that on the following interface, you need to select the top 2 check boxes, otherwise the subsequent operations will be a bit troublesome. The second box selected means "Add the conda command to the system environment variable". If you do not select it, you will not be able to directly use the conda short command.

Then click "install" and wait for it to complete and close.

2. Download CosyVoice source code

First create an empty directory, such as creating a folder D:/py on the D drive. The following will be explained using this as an example.

Open the CosyVoice open source address: https://github.com/FunAudioLLM/CosyVoice

After downloading and decompressing, copy all the files in the CosyVoice-main directory to D:/py.

3. Create a virtual environment and activate it

Enter the D:/py folder, enter cmd in the address bar and press Enter. This will open a cmd black window.

Enter the command conda create -n cosyvoice python=3.10 in the window and press Enter, which will create a virtual environment named "cosyvoice" with a python version of "3.10".

Continue to enter the command conda activate cosyvoice and press Enter, which will activate the virtual environment. Only after activation can you continue to perform operations such as installation and startup, otherwise it will inevitably be wrong.

The sign of activation is that the command line starts with a "(cosyvoice)" character.

4. Install the pynini module

This module can only be installed using the conda command under Windows, which is why it is recommended to use conda on Windows at the beginning.

Continue to enter the command conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3 in the cmd window that was opened and activated above and press Enter.

Note: A prompt asking for confirmation will appear during the installation. At this time, enter y and press Enter, as shown in the figure below.

5. Install other dependencies, using Alibaba Cloud mirror

Open the requirements.txt file and delete the last line WeTextProcessing==1.0.3, otherwise the installation will definitely fail, because this module depends on pynini, and pynini cannot be installed under Windows pip.
Then add 3 lines to requirements.txt: Matcha-TTS, flask, and waitress.

Continue to enter the command pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com and press Enter. After a long time, the installation will be successful without any accidents.

6. Download the api.py file and place it in the project

Go to this address https://github.com/jianchang512/cosyvoice-api/blob/main/api.py to download the api.py file, and place it together with webui.py after downloading.

Start API service

API interface address: http://127.0.0.1:9233

Enter the command and press Enter to execute python api.py

API interface list

Synthesize text according to built-in roles

Interface address: /tts
Simply synthesize text into speech, without voice cloning
Required parameters:

text: The text that needs to be synthesized into speech

role: Choose one of 'Chinese Female', 'Chinese Male', 'Japanese Male', 'Cantonese Female', 'English Female', 'English Male', 'Korean Female'

Successful return: WAV audio data
Sample code

data={
    "text":"Hello dear friends",
    "reference_audio":"10.wav"
}

response=requests.post(f'http://127.0.0.1:9933/tts',data=data,timeout=3600)

Same language clone voice synthesis

Address: /clone_eq

The pronunciation language of the reference audio is consistent with the language of the text to be synthesized. For example, the reference audio is in Chinese, and the Chinese text needs to be synthesized into speech according to the audio.

Required parameters:

text: The text that needs to be synthesized into speech

reference_audio: The reference audio that needs to clone the voice The path of the reference audio relative to api.py. For example, to reference 1.wav, this file is in the same folder as api.py, then fill in 1.wav

reference_text: The text content corresponding to the reference audio

Successful return: WAV data
Sample code

data={
    "text":"Hello dear friends.",
    "reference_audio":"10.wav",
    "reference_text":"I hope you are doing better than me."
}

response=requests.post(f'http://127.0.0.1:9933/tts',data=data,timeout=3600)

Different language voice cloning:

Address: /cone

The pronunciation language of the reference audio is different from the language of the text to be synthesized. For example, you need to synthesize a piece of English text into speech based on a reference audio with Chinese pronunciation.

Required parameters:

text: The text that needs to be synthesized into speech

Successful return: WAV data
Sample code

data={
    "text":"I received a birthday present from a close friend from afar, and my heart was filled with sweet joy from the unexpected surprise and deep blessings! .",
    "reference_audio":"10.wav"
}

response=requests.post(f'http://127.0.0.1:9933/tts',data=data,timeout=3600)

Compatible with openai tts

Interface address: /v1/audio/speech
Request method: POST
Request type: Content-Type: application/json
Request parameters input: The text to be synthesized model: Fixed tts-1, compatible with openai parameters, but not actually used speed: Speech rate, default is 1.0 reponse_format: Return format, fixed WAV audio data voice: Only used for text synthesis, take one of the following 'Chinese Female', 'Chinese Male', 'Japanese Male', 'Cantonese Female', 'English Female', 'English Male', 'Korean Female'

When used for cloning, fill in the path of the referenced reference audio relative to api.py. For example, to reference 1.wav, this file is in the same folder as api.py, then fill in 1.wav

Sample code

from openai import OpenAI

client = OpenAI(api_key='12314', base_url='http://127.0.0.1:9933/v1')
with  client.audio.speech.with_streaming_response.create(
                    model='tts-1',
                    voice='中文女',
                    input='Hello, dear friends',
                    speed=1.0
                ) as response:
    with open('./test.wav', 'wb') as f:
       for chunk in response.iter_bytes():
            f.write(chunk)

Using in Video Translation Software ​

Source Code Deployment of the Official CosyVoice Project ​

Start API service ​

API interface list ​

Synthesize text according to built-in roles ​

Same language clone voice synthesis ​

Different language voice cloning: ​

Compatible with openai tts ​

Using in Video Translation Software

Source Code Deployment of the Official CosyVoice Project

Start API service

API interface list

Synthesize text according to built-in roles

Same language clone voice synthesis

Different language voice cloning:

Compatible with openai tts