Why Do Sound, Subtitles, and Visuals Fall Out of Sync?
Translation changes the length of a sentence, and the time needed to speak it usually changes as well. For example, an English translation of a Chinese sentence will rarely match the original in length, and speaking it aloud will usually take a different amount of time.
Chinese: 有多远滚多远 (Yǒu duō yuǎn gǔn duō yuǎn) - "Get out of here and never come back"
English: Get out of here as far as you can!
Chinese: 滚远点 (Gǔn yuǎn diǎn) - "Get lost"
Japanese: ここから出て行け。 (Koko kara deteike) - "Get out of here."
If a Chinese line takes 2 seconds to speak in the original video, the English dub of that line may take 4 seconds, which inevitably causes desynchronization.
How to Force Synchronization, Ignoring Quality
As mentioned above, suppose a segment lasts 2 seconds before translation and 4 seconds after. If all you need is synchronization, and speech speed or video speed does not matter, you can speed up the audio by 2x. This shortens the 4-second dub to 2 seconds and brings it into sync. Alternatively, you can slow down the video so that the original 2-second segment stretches to 4 seconds, which also achieves alignment.
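The arithmetic behind both options can be sketched in a few lines of Python. This is only an illustration of the 2-second and 4-second example above; the function name `stretch_ratio` is hypothetical and is not part of the software.

```python
def stretch_ratio(original_s, dubbed_s):
    """Ratio of the dubbed duration to the original duration.

    Speeding the audio up by this ratio, or slowing the video down by it,
    makes the two durations equal.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return dubbed_s / original_s


ratio = stretch_ratio(2.0, 4.0)
print(ratio)        # 2.0
print(4.0 / ratio)  # dubbed audio length after a 2x speed-up: 2.0
print(2.0 * ratio)  # video segment length after a 2x slowdown: 4.0
```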
Audio Acceleration for Alignment - Specific Steps:
In the software interface, select "Automatic Audio Acceleration" and deselect "Automatic Video Slowdown."
Open the menu: Tools -> Options. Set "Audio Maximum Acceleration Multiple" to 100.
This achieves synchronization, but the downside is obvious: the speech speed fluctuates drastically.
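For readers who want to reproduce the same effect outside the software, the sketch below builds an audio filter string for ffmpeg's `atempo` filter. It is an assumption that ffmpeg is a suitable stand-in here; this document does not say how the software accelerates audio internally. Large factors are split into chained stages of at most 2.0 each, a common precaution for quality and for compatibility with older ffmpeg builds. The function name `atempo_chain` is hypothetical.

```python
def atempo_chain(speed, per_stage_max=2.0):
    """Build an ffmpeg audio filter string that speeds audio up by `speed`.

    Assumption: each atempo stage is kept at or below `per_stage_max`,
    and larger factors are reached by chaining several stages.
    """
    if speed < 1.0:
        raise ValueError("this sketch only handles speed-ups (speed >= 1.0)")
    stages = []
    remaining = speed
    while remaining > per_stage_max:
        stages.append(per_stage_max)
        remaining /= per_stage_max
    stages.append(remaining)
    return ",".join(f"atempo={s:.4f}" for s in stages)


print(atempo_chain(2.0))  # atempo=2.0000
print(atempo_chain(3.0))  # atempo=2.0000,atempo=1.5000
```

The resulting string could then be passed to ffmpeg's `-filter:a` option for the dubbed audio file.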
Video Slowdown for Alignment - Specific Steps:
In the software interface, deselect "Automatic Audio Acceleration" and select "Automatic Video Slowdown."
Open the menu: Tools -> Options. Set "Video Maximum Slowdown Multiple" to 20.
This also achieves alignment. Speech speed stays consistent, but the video is slowed down wherever the dub runs long, so the picture fluctuates between fast and slow.
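The video-side equivalent can be illustrated with ffmpeg's `setpts` filter, which stretches presentation timestamps. As with the audio example, using ffmpeg here is an assumption for illustration, not a description of what the software does internally, and the function name `setpts_filter` is hypothetical.

```python
def setpts_filter(slowdown):
    """Build an ffmpeg video filter string that slows video by `slowdown`.

    Multiplying every presentation timestamp by the factor makes a
    2-second segment last 4 seconds when slowdown is 2.0.
    """
    if slowdown < 1.0:
        raise ValueError("this sketch only handles slowdowns (factor >= 1.0)")
    return f"setpts={slowdown:.4f}*PTS"


print(setpts_filter(2.0))  # setpts=2.0000*PTS
```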
If your goal is simply synchronization, and you're not concerned about the quality, you can use these methods.
A Better, More Acceptable Synchronization Method
The above methods are clearly impractical. Audio that is too fast, or video that is too slow, is unacceptable and provides a poor experience. For better results, enable both "Automatic Audio Acceleration" and "Automatic Video Slowdown."
Specific Steps:
When selecting faster mode or openai mode, try to use a medium or larger model and select "Overall Recognition."
In the software interface, select both "Automatic Audio Acceleration" and "Automatic Video Slowdown." Set a smaller overall acceleration value, such as 10%.
Open the menu: Tools -> Options. Set "Audio Maximum Acceleration Multiple" to 1.8 (meaning the speech can accelerate to a maximum of 1.8 times its normal speed). You can manually adjust this to a value greater than 1, such as 2 or 1.5.
Open the menu: Tools -> Options. Set "Video Maximum Slowdown Multiple" to 2 (meaning the video can slow down to a minimum of 0.5 times its normal speed). You can change this to a value greater than 1, such as 3 or 5.
Even after the steps above, synchronization may not be achieved, because the maximum values are limited. If a segment reaches both maximums and is still not synchronized, the software gives up on it and the subtitles are delayed. In this case, continue adjusting the subtitle-related options in the menu: Tools -> Options.
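The effect of the two maximums can be sketched as follows. This is a minimal model under stated assumptions, not the software's actual algorithm: it assumes the audio is accelerated first, up to its cap, and the video is then slowed, up to its cap, to absorb what is left. The overall acceleration percentage mentioned above is not modeled, and the name `plan_sync` and its parameters are hypothetical.

```python
def plan_sync(original_s, dubbed_s, max_audio_speedup=1.8, max_video_slowdown=2.0):
    """Return (audio_factor, video_factor, overrun_s) for one segment.

    Assumption: audio is sped up first, then video is slowed down.
    overrun_s is the time by which the dub still outlasts the picture.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    # Speed the audio up only as much as needed, never past the cap.
    audio_factor = min(max(dubbed_s / original_s, 1.0), max_audio_speedup)
    audio_s = dubbed_s / audio_factor
    # Slow the video to cover the remaining gap, never past the cap.
    video_factor = min(max(audio_s / original_s, 1.0), max_video_slowdown)
    video_s = original_s * video_factor
    overrun_s = max(audio_s - video_s, 0.0)
    return audio_factor, video_factor, overrun_s


# 2-second original, 4-second dub: both adjustments stay within their caps.
print(plan_sync(2.0, 4.0))
# 2-second original, 10-second dub: both caps are hit and about 1.56 s remains.
print(plan_sync(2.0, 10.0))
```

In the second call the leftover time corresponds to the case described above, where both maximums are reached and the subtitles end up delayed.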
Is There a Perfect Synchronization Method?
Aside from manual adjustments involving human intervention, such as simplifying translations or adding transition scenes, no perfect, fully automated method has been found.
A program would have to meet three goals at once, for videos of any length and translations into any language: audio acceleration within an acceptable range, video slowdown within an acceptable range, and mouth movements synchronized with the start of speech. Achieving all three automatically seems like an impossible task.