pyVideoTrans Open Source Video Translation

pyVideoTrans is open-source video translation software that translates a video from one language into another, producing both dubbed audio and subtitles in the target language.

Main Uses

Video Translation: Recognizes the speech in the original video, generates subtitles, translates them into the target language, dubs the translated subtitles to produce audio, and finally merges the dubbed audio and target-language subtitles with the original video to produce a new, translated video.
Speech Recognition and Transcription: Supports batch transcription of audio or video files into SRT subtitles.
SRT Subtitle Translation: Translates SRT subtitles into other languages while preserving the original format and timestamps.
Dubbing for Subtitles or Text: Generates dubbing for SRT subtitles or plain text, with support for multiple dubbing channels.

In addition, there are auxiliary functions such as merging audio, video, and subtitles; batch merging of video and audio; batch merging of video and subtitles; and separation of vocals from background sound.

Software Working Principle

This software works by recognizing the speech in the video; it does not depend on any subtitles already present in the video. As long as the video contains human speech, it can be processed, whether or not the video has subtitles.

It should be noted that:

If there are only subtitles in the video and no speaking voice, video translation or speech recognition cannot be performed.
This software cannot directly extract or recognize existing hard subtitles in the video.

Download and Install Software

The download-and-extract method applies only to Windows. On Mac and Linux, please install from source code.

1. Download the compressed package

Open the software official website: https://pyvideotrans.com
Click the download button to enter the download page https://pyvideotrans.com/downpackage
Select the Baidu Netdisk download address to download the complete installation package and the latest patch package.

For first use, you must download the complete installation package. If you also download the patch package, extract it into the directory where you extracted the complete installation package, overwriting the existing files.

2. Decompress the compressed package

The downloaded complete package and patch package are both in 7z compressed package format. You can use 7-Zip or other decompression software to decompress. (Recommended to use 360 compression software, download address: https://yasuo.360.cn)

Decompression Precautions

Avoid permission issues: Do not extract the software to folders that require administrator permissions, such as the desktop or Program Files on the C drive.
Avoid path errors: Do not include Chinese characters, spaces, or special symbols in the extraction path.

Highly recommended: Create a new folder named in English or numbers on a non-system drive such as D drive or E drive, and extract the software to this folder. For example: D:/videotrans.
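The precautions above can be checked before extracting. The following is a minimal sketch and is not part of pyVideoTrans: the function name is_safe_extract_path, the allowed character set, and the list of risky folder names are all assumptions chosen for illustration.

```python
import re

# Assumption: a "safe" folder name contains only ASCII letters, digits,
# underscore, hyphen, and dot. This rules out Chinese characters, spaces,
# and most special symbols.
_SAFE_PART = re.compile(r"^[A-Za-z0-9_.-]+$")

# Assumption: folder names that commonly need elevated permissions.
_RISKY_NAMES = {"desktop", "program files", "program files (x86)"}


def is_safe_extract_path(path: str) -> bool:
    """Return True if every folder in the path looks safe to extract into."""
    # Normalise separators so D:/videotrans and D:\videotrans both work.
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    if not parts:
        return False
    # Skip a leading drive letter such as "D:".
    if re.fullmatch(r"[A-Za-z]:", parts[0]):
        parts = parts[1:]
    for part in parts:
        if part.lower() in _RISKY_NAMES:
            return False
        if not _SAFE_PART.match(part):
            return False
    return True
```

With this sketch, D:/videotrans passes, while a path containing a space or a Program Files folder is rejected.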

Extraction Path Example

After extracting, find the sp.exe file and double-click it to start the software.

3. Start the Software

Double-click sp.exe to start the software. Because the interface is built with PySide6 and the software bundles many function modules, startup may take some time. Please be patient.

Starting

After the startup is successful, the software main interface will be displayed:

Software Interface Introduction

Top left title bar: Displays the software version number.
Bottom left: Click to open the software documentation site.
Menu bar:
- Translation Configuration: Used to set the information required by a translation channel, such as the API address and secret key (SK) of an AI translation channel.
- TTS Settings: Used to set dubbing channel information, such as OpenAI TTS settings, F5-TTS interface settings, etc.
- Speech Recognition Settings: Used to configure the various speech recognition methods, such as the API address, key, etc.
- Tools/Advanced Options: Used to set custom advanced options for the software and to access other auxiliary tools.
Left side buttons:
- Custom Video Translation: Used to perform video translation operations.
- Identify Subtitles and Translate: Used to transcribe SRT subtitles from audio or video and translate the subtitles into other languages.
- Audio and Video to Subtitles: Used to batch transcribe audio or video into SRT subtitles (there must be human speech in the audio and video)
- Batch Translate SRT Subtitles: Used to translate SRT subtitle files into other languages while maintaining the format and timeline unchanged
- Batch Dubbing for Subtitles: Use text or SRT subtitles to generate dubbing, supporting multiple dubbing channels
- Audio and Video Subtitle Merging: Used to merge video files, audio files, and SRT subtitle files into the same video, suitable for scenarios where there are separate dubbing files and SRT subtitle files, and you want to embed them into the video

Video Translation Operation Steps

The software defaults to opening the Custom Video Translation module, and the operation area is on the right.

Custom Video Translation

1. Select the Original Video to be Translated

Select Video

Select the video to be processed: Click the button to select one or more video files from the computer (hold down the Ctrl key to select multiple).
Folder: Check this box to select a folder, and the software will batch translate all video files in the folder.
Clean up generated: If you operate on the same video again, the cached data generated last time will be used by default. If you need to regenerate all files, check this box.
Save to..: Click the button to select the save location of the translated file. By default, it is saved in the _video_out folder in the directory where the original video is located.
Save video only: Intermediate files such as subtitle files and audio files will be generated during the translation process. If you only need the final translated video, check this box.

2. Select Translation Channel

Translation Channel

This software will first convert the video voice into subtitles, and then translate the subtitles into the target language. The translation channel is used to complete the subtitle translation work.

Translation Channel: Select the subtitle translation channel.
- Microsoft Translate: Free, no VPN required, general translation quality. (Default option)
- Google: Better translation quality, VPN required.
- OpenAI ChatGPT: Best translation quality; requires a VPN and a paid account. It is recommended to use chatgpt-4o or newer models. Other AI vendors that are compatible with the OpenAI API, such as DeepSeek, can also be used.
- Baidu Translate/Tencent Translate: Domestic translation channels, no VPN required, medium translation quality.
Pronunciation Language: Select the language spoken in the original video.
Target Language: Select the language to translate into.
Network Proxy: If you use a translation channel that requires a VPN (such as Google, OpenAI), fill in the proxy IP and port here.

3. Select Dubbing Channel

The selected dubbing channel is used to generate audio files from the translated subtitle files.

Dubbing Channel

Dubbing Channel: Select the dubbing engine.
- EdgeTTS: Based on the Microsoft Edge browser's voice reading function, free, no proxy required. (Default option)
- Local Channel: Requires additional installation and configuration, can be used offline locally.
- Third-party Paid API: Usually has a free trial quota.
Dubbing Role: Select the dubbing role (for example: male voice, female voice). You need to select the target language before you can select the dubbing role.
Audition Dubbing: Listen to the effect of the selected dubbing role.
Dubbing Speed/Volume/Pitch: Adjust the speed, volume, and pitch of the dubbing. The set values for speed and volume represent the percentage increase or decrease relative to the default value. For example, a speed of 15 means 15% faster than normal speed (1.15 times speed); a volume of 90 means 90% higher than normal volume (1.9 times volume).
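The percentage arithmetic described for speed and volume can be sketched as follows. The helper names are hypothetical and only illustrate the conversion; they are not functions of pyVideoTrans. The signed-percent string is shown as an assumption about how such an offset might be passed to a TTS engine.

```python
def percent_to_multiplier(percent: float) -> float:
    """Convert a signed percentage offset into a multiplier of the default.

    15 means 15% above the default (x1.15); -20 means 20% below (x0.8).
    """
    return 1.0 + percent / 100.0


def as_signed_percent(percent: int) -> str:
    """Format the offset as a signed string such as '+15%' or '-20%'.

    Assumption: illustrative formatting only, not a documented interface.
    """
    return f"{percent:+d}%"
```

With these helpers, a speed of 15 gives a multiplier of 1.15 and a volume of 90 gives 1.9, matching the example in the text.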

4. Select Speech Recognition Engine

This is the most important step: the speech in the video is recognized as text and SRT subtitles are generated from it.

Speech Recognition

Speech Recognition: Select the speech recognition engine, used to convert video speech into subtitles. The default is faster-whisper, which is free and can be run locally.
Select Model: If you use faster-whisper or openai-whisper, you can select different models. The larger the model, the higher the accuracy, but the slower the running speed and the more resources consumed. The software only includes the tiny and medium models by default, and other models need to be downloaded separately. It is recommended to use the large-v2 or large-v3-turbo model, which has the best effect (requires Nvidia graphics card and CUDA/cuDNN support).
Voice Cutting Mode: Select the voice cutting mode. It is recommended to use the default Overall Recognition mode for better results. The Equal Segmentation mode will segment the voice into fragments of equal duration, only available when using faster-whisper/openai-whisper.
Chinese Re-segmentation: Check this option to use Alibaba Cloud's punctuation model to re-segment Chinese sentences, improving subtitle quality.
Voice Noise Reduction: Check this option to use Alibaba Cloud's voice noise reduction model to denoise the voice, improving recognition accuracy.
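The Equal Segmentation mode described above cuts the audio into fragments of equal duration. The sketch below shows only that idea. The function name and the default fragment length of 10 seconds are assumptions, since the text does not state the value the software uses.

```python
def equal_segments(total_ms: int, segment_ms: int = 10_000) -> list[tuple[int, int]]:
    """Split [0, total_ms) into consecutive (start, end) ranges in milliseconds.

    Every range has length segment_ms except possibly the last, which is
    shortened to end exactly at total_ms.
    """
    if total_ms < 0 or segment_ms <= 0:
        raise ValueError("total_ms must be >= 0 and segment_ms must be > 0")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]
```

A fixed cut like this can split a sentence in the middle, which may be one reason the Overall Recognition mode is recommended for better results.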

5. Set Synchronization Alignment

Synchronization Alignment

Since the speech speed and length of different languages are different, the duration of the translated dubbing may be inconsistent with the original video. This part is used to adjust the synchronization between subtitles, dubbing, and the picture.

Video Extension: If the dubbing duration exceeds the original video duration, check this option to add still frames to the end of the video to match the video duration with the dubbing duration.
Dubbing Acceleration: If the dubbing duration exceeds the original video duration, check this option to accelerate the dubbing to match its duration with the video duration. (The maximum acceleration multiple is 3 times, which can be modified in the menu Tools -> Advanced Options)
Video Slow Motion: If the dubbing duration exceeds the original video duration, check this option to reduce the video playback speed to match its duration with the dubbing duration. (The maximum slow motion multiple is 20 times, which can be modified in the menu Tools -> Advanced Options)
Subtitle Embedding: Select the subtitle embedding method.
- Do not embed subtitles: Do not embed subtitles in the video.
- Embed Hard Subtitles: Permanently embed subtitles into the video, which can be displayed in any player.
- Embed Soft Subtitles: Save the subtitles as a separate file with the video, which requires player support to display.
- Embed Hard Subtitles (Double): Embed two hard subtitles of the original language and the target language.
- Embed Soft Subtitles (Double): Embed two soft subtitles of the original language and the target language.
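The three duration options above (Video Extension, Dubbing Acceleration, Video Slow Motion) come down to simple arithmetic on durations. The sketch below illustrates that arithmetic only. The function names are hypothetical and the software's actual alignment algorithm is not described in this document. The caps of 3 and 20 are the defaults quoted above.

```python
def dubbing_speedup(dub_ms: int, slot_ms: int, max_rate: float = 3.0) -> float:
    """Playback rate needed for the dubbing to fit its slot, capped at max_rate."""
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    if dub_ms <= slot_ms:
        return 1.0  # dubbing already fits, no acceleration needed
    return min(dub_ms / slot_ms, max_rate)


def video_slowdown(dub_ms: int, slot_ms: int, max_factor: float = 20.0) -> float:
    """Factor by which to stretch the video so it lasts as long as the dubbing."""
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    if dub_ms <= slot_ms:
        return 1.0  # video is already long enough
    return min(dub_ms / slot_ms, max_factor)


def extension_ms(dub_ms: int, video_ms: int) -> int:
    """Length of still frames to append so the video covers the dubbing."""
    return max(0, dub_ms - video_ms)
```

For example, a 6-second dubbing clip in a 4-second slot needs a rate of 1.5, while a 20-second clip in the same slot would need 5 and is therefore capped at 3.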

CJK single-line characters: Set the maximum number of characters per line of subtitles for CJK languages (default 20) when embedding hard subtitles.
Other languages: Set the maximum number of characters per line of subtitles for other languages (default 60) when embedding hard subtitles.
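The per-line character limits can be illustrated with a small wrapping helper. This is a sketch under assumptions, not the software's implementation: CJK text is cut at a fixed character count because it has no spaces between words, while other languages are wrapped at word boundaries with Python's standard textwrap module. The function name and the is_cjk flag are hypothetical.

```python
import textwrap


def wrap_subtitle(text: str, is_cjk: bool,
                  cjk_limit: int = 20, other_limit: int = 60) -> list[str]:
    """Break one subtitle into lines no longer than the configured limit."""
    if is_cjk:
        # Slice every cjk_limit characters.
        return [text[i:i + cjk_limit] for i in range(0, len(text), cjk_limit)]
    # Wrap at spaces; words longer than the limit are broken by textwrap.
    return textwrap.wrap(text, width=other_limit)
```

The defaults of 20 and 60 are the values quoted above. A production implementation would probably also avoid breaking right before punctuation.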

6. Process Background Sound

Background Sound

Keep original background sound: Check this option to keep the original background music in the translated video. Note: This option will significantly increase processing time and system resource consumption, and improve the accuracy of subtitle generation.
Add additional background audio: Click the button to select an audio file as the new background music.
Loop background sound: If the duration of the new background music is shorter than the video duration, check this option to loop the background music.
Background Volume: Adjust the volume of the background music. Values less than 1 reduce the volume, and values greater than 1 increase the volume.
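The looping and volume settings above can be sketched with two small helpers. The names are hypothetical and the code only illustrates the arithmetic: how many times a short background track must repeat to cover the video, and how a volume multiplier scales a normalized audio sample. How the software performs these steps internally is not described here.

```python
import math


def loops_needed(video_ms: int, bgm_ms: int) -> int:
    """Number of times the background track must play to cover the video."""
    if bgm_ms <= 0:
        raise ValueError("bgm_ms must be positive")
    return max(1, math.ceil(video_ms / bgm_ms))


def scale_sample(sample: float, volume: float) -> float:
    """Scale a sample in [-1.0, 1.0] by the volume multiplier.

    Values below 1 make it quieter, values above 1 make it louder.
    The result is clipped to the valid range to avoid overflow.
    """
    return max(-1.0, min(1.0, sample * volume))
```

For a 60-second video and a 25-second track, loops_needed returns 3, with the last repetition trimmed to the video length.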

7. Start Execution

Start Execution

CUDA acceleration: If you have an Nvidia graphics card and have installed CUDA/cuDNN, selecting this option can greatly increase the translation speed.

Click the Start Execution button, and the software will start translating the video.

Executing

If only one video is translated, the software will pause after generating and translating subtitles so that you can review and edit them (for example, to fix typos).
If multiple videos are selected, the translation process will not pause, and the subtitles of all videos will be displayed in the subtitle area on the right, which may appear to be chaotic, but this will not affect the final translation result.

8. View Translation Results

After the translation is complete, click the progress bar to open the folder where the results are located. The translated video file is in MP4 format, and other files are intermediate material files (such as SRT subtitle files, audio files).

Audio and Video to Subtitles Function

This function batch-recognizes the speech in audio or video files and exports it as SRT subtitle files.

Batch Translate SRT Subtitles Function

This function batch-translates SRT subtitles into another language and keeps the output in valid SRT format.
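Keeping the output valid means that only the text lines of each cue change, while the cue number and the timestamp line pass through untouched. The following minimal sketch shows that idea. The function name is hypothetical, and translate stands for any callable that maps a string to its translation. It is a placeholder, not an API of pyVideoTrans or of a translation service.

```python
import re
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate the text of every SRT cue, preserving numbers and timestamps."""
    # Cues are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    out = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            out.append(block)  # leave malformed cues untouched
            continue
        index, timing = lines[0], lines[1]
        translated = [translate(line) for line in lines[2:]]
        out.append("\n".join([index, timing, *translated]))
    return "\n\n".join(out) + "\n"
```

Any string function can stand in for the translator when trying this out, for example str.upper.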

Subtitle Output Format

Single-language Subtitles: The translation result only has subtitles in the target language

Target Language Above (Double): The translation result includes subtitles in both the original language and the target language, with the target language above and the original language below

Target Language Below (Double): The translation result includes subtitles in both the original language and the target language, with the target language below and the original language above
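The three output formats differ only in which text lines each subtitle cue contains and in what order. A minimal sketch follows. The function name and the mode strings are hypothetical labels for the three options above, not identifiers used by the software.

```python
def cue_text(original: str, translated: str, mode: str) -> str:
    """Build the text of one subtitle cue for the chosen output format."""
    if mode == "single":
        return translated                      # target language only
    if mode == "target_above":
        return f"{translated}\n{original}"     # target first, original below
    if mode == "target_below":
        return f"{original}\n{translated}"     # original first, target below
    raise ValueError(f"unknown mode: {mode}")
```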

Batch Turn Subtitles into Speech Function

This function synthesizes dubbing audio files from SRT subtitles and supports batch operation.

Other Functions

See Tools in the menu bar for other functions, which can be used as needed.

Speech recognition: Supports the faster-whisper and openai-whisper local offline models, the OpenAI SpeechToText API, GoogleSpeech, the Alibaba Chinese speech recognition model, and the Baidu Wenxin Yiyan model, as well as custom speech recognition APIs.
Subtitle translation: Supports Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, ChatGPT, AzureAI, Gemini, DeepL, DeepLX, ByteDance Volcano, Offline Translation OTT, other AI large models compatible with OpenAI, and local large models.
Voice synthesis: Supports Microsoft Edge TTS, Google TTS, Azure AI TTS, OpenAI TTS, ElevenLabs TTS, custom TTS server APIs, GPT-SoVITS, clone-voice, ChatTTS-ui, Fish TTS, CosyVoice, F5-TTS, and KokoroTTS.
Supported languages: Chinese (Simplified and Traditional), English, Korean, Japanese, Russian, French, German, Italian, Spanish, Portuguese, Vietnamese, Thai, Arabic, Turkish, Hungarian, Hindi, Ukrainian, Kazakh, Indonesian, Malay, Czech, Polish, Dutch, Swedish, and Filipino. Other languages can be detected automatically.

Open Source Instructions

This software is open source. Source code: https://github.com/jianchang512/pyvideotrans

Open source license (GPL-v3): https://www.gnu.org/licenses/gpl-3.0.txt

Software Official Website: https://pyvideotrans.com

This software is free to download, free to use, no login required, no registration required, and the developer has not sold it on any platform or authorized anyone to sell it on any platform.

The software has a variety of built-in free and open source solutions, including online and local, which can be used for free.

The software also supports some commercial third-party API solutions, such as ChatGPT, Tencent Translate, and ByteDance Volcano. To use them, you must prepare your own account and key, and activate or purchase the service on the corresponding third-party platform. Any such costs are unrelated to this software, which only provides the technical integration with third-party APIs.
