Creating the Ideal Translated Video: A Step-by-Step Guide
The ideal translated video should possess the following characteristics: accurate and appropriately-lengthed subtitles, a dubbing tone consistent with the original voice, and perfect synchronization between the subtitles, sound, and visuals.
This guide provides a detailed overview of the four steps in video translation, with optimal configuration recommendations for each step.
Step 1: Speech Recognition
Objective: Convert the speech in the video into a subtitle file in the corresponding language.
Corresponding Control Element: "Speech Recognition" row
Optimal Configuration:
- Select `faster-whisper (local)`
- Choose model `large-v2`, `large-v3`, or `large-v3-turbo`
- Select `Overall Recognition` for the voice cutting mode
- Select `Voice Denoising` (time-consuming)
- Select `Preserve Original Background Sound` (time-consuming)
- If the video is in Chinese, also select `Chinese Repunctuation`
Note: Processing will be extremely slow without an NVIDIA graphics card, or if CUDA acceleration is not enabled due to an unconfigured CUDA environment. Insufficient VRAM may cause a crash.
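Whichever engine you choose, the output of this step is a standard SRT subtitle file. As a rough illustration of that format, the sketch below (plain Python; the `(start, end, text)` tuples are hypothetical stand-ins for the recognizer's actual output) shows how recognized segments map to SRT blocks:

```python
def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Convert (start, end, text) tuples into numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical recognizer output: (start_sec, end_sec, text)
segments = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a test.")]
print(to_srt(segments))
```

The timestamps carry through every later step, which is why accurate recognition here determines how well dubbing and alignment can work.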
Step 2: Subtitle Translation
Objective: Translate the subtitle file generated in Step 1 into the target language.
Corresponding Control Element: "Translation Channel" row
Optimal Configuration:
- Preferred choice: If you have a VPN and know how to configure it, use the `gemini-1.5-flash` model in Menu -> Translation Settings -> Gemini Pro (Gemini AI channel).
- Secondary choice: If you don't have a VPN or don't know how to configure a proxy, select `OpenAI ChatGPT` in "Translation Channel" and use the `gpt-4o` series models in Menu -> Translation Settings -> OpenAI ChatGPT (requires a third-party relay).
- Alternative solution: If you can't find a suitable third-party relay, you can use domestic AI services such as Moonshot AI or DeepSeek.
In Menu -> Tools/Options -> Advanced Options, select the two items shown in the following figure:
Gemini AI usage guide: https://pyvideotrans.com/gemini.html
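Whichever channel you pick, translation only touches the subtitle text; the index and timestamps must pass through unchanged so the timing from Step 1 survives. A minimal sketch of that idea (the `translate` function here is a placeholder for whatever API channel you configured, not a real client):

```python
def translate(lines):
    """Placeholder for the configured translation channel
    (Gemini, ChatGPT, Moonshot AI, DeepSeek, ...)."""
    return [f"[EN] {line}" for line in lines]  # pretend translation

def translate_srt_entries(entries):
    """Translate only the text field; keep index and timing unchanged."""
    texts = [text for _, _, text in entries]
    translated = translate(texts)
    return [(idx, timing, new_text)
            for (idx, timing, _), new_text in zip(entries, translated)]

entries = [(1, "00:00:00,000 --> 00:00:02,500", "你好，世界。")]
print(translate_srt_entries(entries))
```

Sending lines in batches like this, rather than one request per line, is also how you keep API usage (and relay costs) manageable on long videos.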
Step 3: Dubbing
Objective: Generate dubbing based on the translated subtitle file.
Corresponding Control Element: "Dubbing Channel" row
Optimal Configuration:
- Chinese or English: `F5-TTS (local)`, select `clone` for the dubbing role
- Japanese or Korean: `CosyVoice (local)`, select `clone` for the dubbing role
- Other languages: `clone-voice (local)`, select `clone` for the dubbing role

All three channels retain the original video's emotional tone to the greatest extent, with `F5-TTS` giving the best results.

You need to additionally install the corresponding `F5-TTS` / `CosyVoice` / `clone-voice` integration package; see the documentation: https://pyvideotrans.com/f5tts.html
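All three clone channels follow the same basic pattern: each subtitle entry is synthesized into its own audio clip, which is later placed at the entry's start time. A rough sketch of that loop (the `synthesize` function is hypothetical and stands in for F5-TTS / CosyVoice / clone-voice; it returns a fake duration instead of real audio samples):

```python
def synthesize(text, voice="clone"):
    """Hypothetical stand-in for the local TTS engine; returns fake
    audio as (duration_seconds, label) instead of real samples."""
    return (0.06 * len(text), f"{voice}:{text}")

def dub_entries(entries):
    """Generate one clip per subtitle entry, keyed by its start time."""
    clips = {}
    for start, _end, text in entries:
        clips[start] = synthesize(text)
    return clips

entries = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a test.")]
clips = dub_entries(entries)
print(clips)
```

Because each clip's duration rarely matches its subtitle slot exactly, the per-entry durations produced here are what Step 4's alignment has to reconcile.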
Step 4: Synchronize Subtitles, Dubbing, and Visuals
Objective: Synchronize the subtitles, dubbing, and visuals.
Corresponding Control Element: `Synchronization Alignment` row
Optimal Configuration:
- When translating from Chinese to English, you can set the `Dubbing Speech Speed` value (e.g., `10` or `15`) to speed up the dubbing, as English sentences are often longer.
- Select the `Extend Video`, `Dubbing Acceleration`, and `Video Slowdown` options to force alignment between subtitles, sound, and visuals.
- In Menu -> Tools/Options -> Advanced Options -> Subtitle/Sound/Picture Alignment area, configure the following settings:
The `Audio Maximum Acceleration Multiple` and `Video Slowdown Multiple` can be adjusted according to the actual situation (the default value is 3).
It is recommended to fine-tune each option and its value according to the speech speed and other characteristics of the actual video.
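The alignment step boils down to comparing each dubbed clip's duration with its subtitle slot: the audio is accelerated first (capped by `Audio Maximum Acceleration Multiple`), and only if the clip still does not fit is the video segment slowed down (capped by `Video Slowdown Multiple`). A simplified sketch of that arithmetic, assuming the default cap of 3 for both multiples:

```python
def alignment_factors(audio_dur, slot_dur,
                      max_audio_speed=3.0, max_video_slow=3.0):
    """Return (audio_speedup, video_slowdown) to fit a clip to its slot.

    First accelerate the audio up to max_audio_speed; if the clip is
    still longer than the slot, stretch the video up to max_video_slow.
    """
    if audio_dur <= slot_dur:
        return 1.0, 1.0  # clip already fits, no adjustment needed
    speed = min(audio_dur / slot_dur, max_audio_speed)
    remaining = audio_dur / speed  # clip duration after acceleration
    slow = min(remaining / slot_dur, max_video_slow)
    return speed, slow

print(alignment_factors(5.0, 2.0))   # fits after audio acceleration alone
print(alignment_factors(10.0, 2.0))  # needs both acceleration and slowdown
```

This is why raising the multiples forces tighter sync at the cost of audibly faster speech or visibly slowed video; the defaults of 3 are a reasonable compromise for most material.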
Output Video Quality Control
The default output is lossy compression. For lossless output, set `Video Transcoding Loss Control` to 0 in Menu -> Tools -> Advanced Options -> Video Output Control area.

Note: If the original video is not in MP4 format or hard subtitles are embedded, re-encoding will cause some loss, though it is usually negligible. Raising output quality will significantly reduce processing speed and increase the output file size.
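Setting the loss control to 0 corresponds to lossless encoding. If you re-encode outside the tool, the equivalent ffmpeg invocation would be something like the following (the x264 `-crf` and `-preset` flags are standard; the file names are placeholders):

```shell
# CRF 0 = lossless H.264; higher values (default 23) trade quality for
# smaller files and faster encoding. Audio is copied without re-encoding.
ffmpeg -i input.mp4 -c:v libx264 -crf 0 -preset veryslow -c:a copy output_lossless.mp4
```

As the note above says, lossless encoding is much slower and produces much larger files, so reserve it for source material you intend to edit again.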