How to Use the Original Video's Voice for Dubbing
Dubbing usually relies on one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (Explanation Xiaoshuai), for the entire video. When a video has multiple speakers, however, a single voice is often not ideal. It works better when each speaker gets a distinct voice, preferably one that matches that speaker's voice in the original video. For example, if Bajie (Pigsy) is speaking in the original video, the English dub should still sound like Bajie. This is what the original voice cloning feature is for.
Currently, the software supports three dubbing channels for original voice cloning: clone-voice, CosyVoice, and F5-TTS.
Principle: To dub a specific segment (e.g., 00:00:03 --> 00:00:08), the software first cuts that segment out of the original audio, then gathers the original text spoken in it and the translated target text. It sends this data to the dubbing channel, which generates speech for the target text in the voice of the original audio.
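The cutting step can be sketched in Python as follows. This is only an illustration of the idea, not the software's actual implementation: the function names (parse_timestamp, segment_bounds, build_cut_command) and the file names are hypothetical, and the sketch assumes ffmpeg is installed and on the PATH. It turns a subtitle timing line into start and end seconds and builds an ffmpeg command that extracts that span as a mono reference clip.

```python
import re

# Matches HH:MM:SS with optional milliseconds (SRT uses a comma, some tools a dot).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?")


def parse_timestamp(ts: str) -> float:
    """Convert a timestamp such as '00:00:03' or '00:00:03,500' to seconds."""
    match = TIME_RE.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {ts!r}")
    hours, minutes, seconds, millis = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    if millis:
        # Pad '5' to '500' so that '.5' means half a second.
        total += int(millis.ljust(3, "0")) / 1000
    return float(total)


def segment_bounds(timing_line: str) -> tuple[float, float]:
    """Split '00:00:03 --> 00:00:08' into (start, end) in seconds."""
    start_text, end_text = timing_line.split("-->")
    start, end = parse_timestamp(start_text), parse_timestamp(end_text)
    if end <= start:
        raise ValueError("segment end must be after its start")
    return start, end


def build_cut_command(source: str, target: str, timing_line: str) -> list[str]:
    """Build an ffmpeg argument list that cuts the segment's audio out of source."""
    start, end = segment_bounds(timing_line)
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-ss", f"{start:.3f}",        # where the segment starts
        "-t", f"{end - start:.3f}",   # how long the segment lasts
        "-vn",                        # drop any video stream
        "-ac", "1",                   # mono reference audio
        target,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_cut_command("original.wav", "ref_0001.wav", "00:00:03 --> 00:00:08"))
```

The resulting list can be passed to subprocess.run. The extracted clip, together with the original text and the translated text, is what would then be handed to the dubbing channel.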
Using the clone-voice Dubbing Channel
First, install the clone-voice project from https://github.com/jianchang512/clone-voice. Open the project's homepage and read the instructions carefully. You can deploy clone-voice from source. On Windows, you can instead download the integrated package from Releases on the right side of the page (https://github.com/jianchang512/clone-voice/releases), extract it, and double-click app.exe to start it.
Once the program reports that it has started successfully, open Menu--TTS Settings--Original Voice Cloning clone-voice in the video translation software and enter the default API address, http://127.0.0.1:9988, in the HTTP address field. Test the connection; if there are no problems, you can start using it.
Using the CosyVoice Dubbing Channel
Similarly, you need to install the CosyVoice project. See https://pyvideotrans.com/cosyvoice.html for the installation tutorial.
You can also use a third-party integrated package, but those packages do not support voice cloning; they only let you specify a fixed audio.
After installing according to the tutorial, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project, in the same directory as the webui.py file.
Then start api.py and, in the video translation software, enter the API address in the API address field under Menu--TTS Settings--CosyVoice. The default address is http://127.0.0.1:9233.
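If api.py fails to start, one thing worth ruling out is that the file ended up in the wrong folder. The following is a minimal sketch of such a check; the function name missing_files is hypothetical, and the only assumption is the layout described above, with api.py sitting next to webui.py in the CosyVoice project directory.

```python
import sys
from pathlib import Path


def missing_files(cosyvoice_dir: str) -> list[str]:
    """Return which of webui.py and api.py are absent from the given directory."""
    root = Path(cosyvoice_dir)
    return [name for name in ("webui.py", "api.py") if not (root / name).is_file()]


if __name__ == "__main__":
    # Pass the CosyVoice project directory as the first argument (defaults to ".").
    project_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(project_dir)
    if missing:
        print("Missing in", project_dir, "->", ", ".join(missing))
    else:
        print("api.py and webui.py are in the same directory.")
```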
Using the F5-TTS Dubbing Channel
You need to install the F5-TTS project. See https://pyvideotrans.com/f5tts.html for a detailed installation tutorial.
You can install from source or, on Windows, use the integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address, http://127.0.0.1:5010, in the API address field under Menu--TTS Settings--F5-TTS in the video translation software.
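Before entering an address in the settings, it can be useful to confirm that the corresponding service is actually running. The sketch below assumes the three default addresses mentioned in this document and only checks whether something accepts TCP connections on each port. It does not verify that the service behind the port is the expected API, and the helper names host_port and is_listening are hypothetical.

```python
import socket
from urllib.parse import urlsplit

# Default addresses as given in this document; change them if you changed the ports.
DEFAULT_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def host_port(address: str) -> tuple[str, int]:
    """Extract (host, port) from an address such as http://127.0.0.1:9988."""
    parts = urlsplit(address)
    if parts.hostname is None:
        raise ValueError(f"not a valid address: {address!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address can be opened."""
    try:
        with socket.create_connection(host_port(address), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, address in DEFAULT_ADDRESSES.items():
        state = "reachable" if is_listening(address) else "not reachable"
        print(f"{name:12s} {address} -> {state}")
```

A "not reachable" result usually means the service has not been started yet or is listening on a different port than the one entered.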
In the main interface, select clone as the role to dub with the cloned voice.
Note that clone-voice supports more than ten languages, whereas F5-TTS and CosyVoice support voice cloning only for Chinese and English.
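The language restriction can be captured in a small lookup when choosing a channel for a given target language. This is a sketch under assumptions: the function name can_clone and the language codes "zh" and "en" are illustrative, not identifiers used by the software. Because the exact clone-voice language list is not enumerated here, the function returns None for that channel rather than guessing.

```python
from typing import Optional

# Channels that, per the note above, clone voices only for Chinese and English.
ZH_EN_ONLY = {"cosyvoice", "f5-tts"}


def can_clone(channel: str, language: str) -> Optional[bool]:
    """Return True/False for known limits, or None when the list must be checked."""
    key = channel.strip().lower()
    base = language.strip().lower().split("-")[0]  # 'zh-cn' -> 'zh'
    if key in ZH_EN_ONLY:
        return base in {"zh", "en"}
    if key == "clone-voice":
        # More than ten languages are supported; consult the project README
        # for the actual list instead of assuming it here.
        return None
    raise ValueError(f"unknown dubbing channel: {channel!r}")
```

For example, can_clone("F5-TTS", "ja") returns False, which signals that a Japanese dub needs clone-voice (if it lists Japanese) or a fixed voice instead.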