Skip to content

Improving the Quality of AI Translated Subtitles

When using AI to translate SRT subtitles, there are generally two methods.

Method 1: Translate the entire subtitle format, including the "line number" and "timestamp line" that do not need to be translated.

As shown in the following example, the entire format is sent:

 1
 00:00:01,950 --> 00:00:04,950
 Organic molecules have been discovered in the Five Old Star System.

 2
 00:00:04,950 --> 00:00:07,902
 We are still multiple levels away from a third kind of contact.

 3
 00:00:07,902 --> 00:00:11,958
 The microwave really started filming the task and has been here for a year.

Advantages: Considers the context, and the translation quality is better.

Disadvantages: In addition to wasting tokens, it may also cause the subtitle format to be disordered during translation, and the returned translation result is no longer a valid SRT subtitle format. For example, English symbols , : may be incorrectly changed to Chinese symbols, or the line number and time row may be combined into one line, etc.

Method 2: Only send the subtitle text content, and then replace the corresponding text in the original subtitles with the translation result.

In the following format, only subtitle text is sent:

 Organic molecules have been discovered in the Five Old Star System.
 We are still multiple levels away from a third kind of contact.
 The microwave really started filming the task and has been here for a year.

Advantages: Can ensure that the translation result is definitely a valid SRT subtitle format.

Disadvantages: Also very obvious, the subtitle text is translated line by line, and the context cannot be taken into account, greatly reducing the translation quality.

In order to solve this problem, the software supports translating multiple lines at a time, with a default of 15 lines of subtitles, which can respond to the context to a certain extent.

But then a new problem arises: the grammar rules and sentence structure order of different languages are different, and it is likely that the original text is 15 lines, and after translation, it becomes 14 lines, 13 lines, etc., especially when the previous line and the next line are the same sentence in terms of grammatical structure.

The 15 lines of original subtitles are no longer 15 lines after translation, which will definitely lead to subtitle confusion. In order to solve this problem, when the number of translated lines is inconsistent with the number of original subtitles lines, the translation is re-translated line by line to ensure that the number of lines before and after the subtitles is completely consistent, abandoning the corresponding context.

The second method is used by default in the software. After all, being usable is more important than being good.


From version v2.52, a new first translation method support has been added. It is not enabled by default. If you want to enable it, you need to manually enable it. After it is enabled, when using ChatGPT/Gemini/AzureGPT/302.AI/Byte Volcano/LocalLLM for translation, the complete SRT subtitles with format will be sent for translation, which can better respond to the context and improve the translation quality.

But you must pay attention, the problems mentioned in the first method may occur, causing the result to be an illegal SRT subtitle, which may cause parsing errors or lose all content after the error. It is recommended to use this method only on sufficiently intelligent models, such as GPT-4o-mini or larger models. If it is a locally deployed model, it is not recommended to use this method. Limited by hardware resources, the locally deployed model is generally very small and not intelligent enough, and it is more likely that the translation result format will be disordered.

Enable the first translation method:

Menu -- Tools/Options -- Advanced Options -- Subtitle Translation Area -- Send Complete Subtitles During AI Intelligent Translation

Add a Glossary

You can add your own glossary to each prompt, similar to the following

**During the translation process, be sure to use** the glossary I provide to translate the terms, and maintain the consistency of the terms. The specific glossary is as follows:

   * Transformer -> Transformer
   * Token -> Token
   * LLM/Large Language Model -> 大语言模型
   * Generative AI -> 生成式 AI
   * One Health -> One Health
   * Radiomics -> 影像组学
   * OHHLEP -> OHHLEP
   * STEM -> STEM
   * SHAPE -> SHAPE
   * Single-cell transcriptomics -> 单细胞转录组学
   * Spatial transcriptomics -> 空间转录组学