Creating the Ideal Translated Video: A Step-by-Step Guide
The ideal translated video should possess the following characteristics: accurate and appropriately-lengthed subtitles, a dubbing tone consistent with the original voice, and perfect synchronization between the subtitles, sound, and visuals.
This guide provides a detailed overview of the four steps in video translation, with optimal configuration recommendations for each step.
Step 1: Speech Recognition
Objective: Convert the speech in the video into a subtitle file in the corresponding language.
Corresponding Control Element: "Speech Recognition" row
Optimal Configuration:
- Select `faster-whisper (local)`
- Choose model `large-v2`, `large-v3`, or `large-v3-turbo`
- Select `Overall Recognition` for the voice cutting mode
- Select `Voice Denoising` (time-consuming)
- Select `Preserve Original Background Sound` (time-consuming)
- If the video is in Chinese, also select `Chinese Repunctuation`
Note: Processing will be extremely slow without an NVIDIA graphics card, or if CUDA acceleration is not enabled due to an unconfigured CUDA environment. Insufficient VRAM may cause a crash.
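Whichever engine you choose, the output of this step is a standard SRT subtitle file. As a rough illustration of that format, the sketch below (plain Python; the `(start, end, text)` tuples are hypothetical stand-ins for the recognizer's actual output) shows how recognized segments map to SRT blocks:

```python
def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Convert (start, end, text) tuples into numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical recognizer output: (start_sec, end_sec, text)
segments = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a test.")]
print(to_srt(segments))
```

The timestamps carry through every later step, which is why accurate recognition here determines how well dubbing and alignment can work.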
Step 2: Subtitle Translation
Objective: Translate the subtitle file generated in Step 1 into the target language.
Corresponding Control Element: "Translation Channel" row
Optimal Configuration:
- Preferred choice: If you have a VPN and know how to configure it, use the `gemini-1.5-flash` model in Menu -> Translation Settings -> Gemini Pro (Gemini AI channel).
- Secondary choice: If you don't have a VPN or don't know how to configure a proxy, select `OpenAI ChatGPT` in "Translation Channel" and use the `gpt-4o` series models in Menu -> Translation Settings -> OpenAI ChatGPT (requires a third-party relay).
- Alternative solution: If you can't find a suitable third-party relay, you can use domestic AI services such as Moonshot AI or DeepSeek.
In Menu -> Tools/Options -> Advanced Options, select the two items shown in the following figure:
Gemini AI usage guide: https://pyvideotrans.com/gemini.html
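Whichever channel you pick, translation only touches the subtitle text; the index and timestamps must pass through unchanged so the timing from Step 1 survives. A minimal sketch of that idea (the `translate` function here is a placeholder for whatever API channel you configured, not a real client):

```python
def translate(lines):
    """Placeholder for the configured translation channel
    (Gemini, ChatGPT, Moonshot AI, DeepSeek, ...)."""
    return [f"[EN] {line}" for line in lines]  # pretend translation

def translate_srt_entries(entries):
    """Translate only the text field; keep index and timing unchanged."""
    texts = [text for _, _, text in entries]
    translated = translate(texts)
    return [(idx, timing, new_text)
            for (idx, timing, _), new_text in zip(entries, translated)]

entries = [(1, "00:00:00,000 --> 00:00:02,500", "你好，世界。")]
print(translate_srt_entries(entries))
```

Sending lines in batches like this, rather than one request per line, is also how you keep API usage (and relay costs) manageable on long videos.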
Step 3: Dubbing
Objective: Generate dubbing based on the translated subtitle file.
Corresponding Control Element: "Dubbing Channel" row
Optimal Configuration:
- Chinese or English: `F5-TTS (local)`, select `clone` for the dubbing role
- Japanese or Korean: `CosyVoice (local)`, select `clone` for the dubbing role
- Other languages: `clone-voice (local)`, select `clone` for the dubbing role

All three channels retain the original video's emotional tone to the greatest extent, with `F5-TTS` giving the best results.

You need to additionally install the corresponding `F5-TTS` / `CosyVoice` / `clone-voice` integration package; see the documentation: https://pyvideotrans.com/f5tts.html
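All three clone channels follow the same basic pattern: each subtitle entry is synthesized into its own audio clip, which is later placed at the entry's start time. A rough sketch of that loop (the `synthesize` function is hypothetical and stands in for F5-TTS / CosyVoice / clone-voice; it returns a fake duration instead of real audio samples):

```python
def synthesize(text, voice="clone"):
    """Hypothetical stand-in for the local TTS engine; returns fake
    audio as (duration_seconds, label) instead of real samples."""
    return (0.06 * len(text), f"{voice}:{text}")

def dub_entries(entries):
    """Generate one clip per subtitle entry, keyed by its start time."""
    clips = {}
    for start, _end, text in entries:
        clips[start] = synthesize(text)
    return clips

entries = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a test.")]
clips = dub_entries(entries)
print(clips)
```

Because each clip's duration rarely matches its subtitle slot exactly, the per-entry durations produced here are what Step 4's alignment has to reconcile.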
Step 4: Synchronize Subtitles, Dubbing, and Visuals
Objective: Synchronize the subtitles, dubbing, and visuals.
Corresponding Control Element: `Synchronization Alignment` row
Optimal Configuration:
- When translating from Chinese to English, you can set the `Dubbing Speech Speed` value (e.g., `10` or `15`) to speed up the dubbing, as English sentences are often longer.
- Select the `Extend Video`, `Dubbing Acceleration`, and `Video Slowdown` options to force alignment between subtitles, sound, and visuals.
- In Menu -> Tools/Options -> Advanced Options -> Subtitle/Sound/Picture Alignment area, configure the following settings:
The `Audio Maximum Acceleration Multiple` and `Video Slowdown Multiple` can be adjusted according to the actual situation (the default value is 3).
It is recommended to fine-tune each option and its value according to the speech speed and other characteristics of the actual video.
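The alignment step boils down to comparing each dubbed clip's duration with its subtitle slot: the audio is accelerated first (capped by `Audio Maximum Acceleration Multiple`), and only if the clip still does not fit is the video segment slowed down (capped by `Video Slowdown Multiple`). A simplified sketch of that arithmetic, assuming the default cap of 3 for both multiples:

```python
def alignment_factors(audio_dur, slot_dur,
                      max_audio_speed=3.0, max_video_slow=3.0):
    """Return (audio_speedup, video_slowdown) to fit a clip to its slot.

    First accelerate the audio up to max_audio_speed; if the clip is
    still longer than the slot, stretch the video up to max_video_slow.
    """
    if audio_dur <= slot_dur:
        return 1.0, 1.0  # clip already fits, no adjustment needed
    speed = min(audio_dur / slot_dur, max_audio_speed)
    remaining = audio_dur / speed  # clip duration after acceleration
    slow = min(remaining / slot_dur, max_video_slow)
    return speed, slow

print(alignment_factors(5.0, 2.0))   # fits after audio acceleration alone
print(alignment_factors(10.0, 2.0))  # needs both acceleration and slowdown
```

This is why raising the multiples forces tighter sync at the cost of audibly faster speech or visibly slowed video; the defaults of 3 are a reasonable compromise for most material.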
Output Video Quality Control
The default output is lossy compression. For lossless output, set `Video Transcoding Loss Control` to 0 in Menu -> Tools -> Advanced Options -> Video Output Control area.

Note: If the original video is not in MP4 format or hard subtitles are embedded, re-encoding will cause some loss, though it is usually negligible. Raising output quality will significantly reduce processing speed and increase the output file size.
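Setting the loss control to 0 corresponds to lossless encoding. If you re-encode outside the tool, the equivalent ffmpeg invocation would be something like the following (the x264 `-crf` and `-preset` flags are standard; the file names are placeholders):

```shell
# CRF 0 = lossless H.264; higher values (default 23) trade quality for
# smaller files and faster encoding. Audio is copied without re-encoding.
ffmpeg -i input.mp4 -c:v libx264 -crf 0 -preset veryslow -c:a copy output_lossless.mp4
```

As the note above says, lossless encoding is much slower and produces much larger files, so reserve it for source material you intend to edit again.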