pyVideoTrans Open Source Video Translation

Custom Speech Recognition API

Starting with v3.56, this custom speech recognition channel also supports Gladia's speech recognition service. See this tutorial for usage details.

If the built-in speech recognition methods do not meet your needs, you can plug in your own speech recognition API. Enter its details under Menu - Speech Recognition Settings - Custom Speech Recognition API.

Enter your API address, which must start with http. The audio is sent to that address as WAV data (16 kHz sampling rate, 1 channel) uploaded under the key name audio. If your API requires key verification, enter the key in the key box; it is appended to the API address as a query parameter, in the form api_url?sk=password.

requests.post(api_url, files={"audio": open(audio_file, 'rb')})
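To try an endpoint by hand before pointing pyVideoTrans at it, the request above can be reproduced with a small client. This is a minimal sketch, not the application's own code: the names `build_request_url` and `send_audio`, the timeout value, and the handling of an address that already carries a query string are all assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request_url(api_url: str, sk: str = "") -> str:
    """Append sk=<key> to api_url, keeping any query string already present."""
    if not sk:
        return api_url
    parts = urlsplit(api_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sk", sk))
    return urlunsplit(parts._replace(query=urlencode(query)))


def send_audio(api_url: str, audio_file: str, sk: str = "", timeout: float = 600.0) -> dict:
    """POST a 16 kHz mono WAV file under the key "audio" and return the parsed JSON."""
    # Third-party dependency; imported here so the URL helper works without it.
    import requests

    # The context manager closes the file handle once the upload finishes.
    with open(audio_file, "rb") as fh:
        resp = requests.post(
            build_request_url(api_url, sk),
            files={"audio": fh},
            timeout=timeout,  # assumed value; recognition of long audio can be slow
        )
    resp.raise_for_status()
    return resp.json()
```

Calling `send_audio("http://127.0.0.1:9000/asr", "sample.wav", sk="secret")` would post to `http://127.0.0.1:9000/asr?sk=secret`; the address and file name here are placeholders.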

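Your API must return data in JSON format. On failure, set code to 1 and msg to the reason recognition failed. On success, set code to 0 and return the recognized subtitles in data, each entry holding the text and its time range.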

Return on failure:
    res={
        "code":1,
        "msg":"Reason for error"
    }
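When testing your own endpoint, the code field is what separates the two outcomes. The helper below is a sketch of how a caller might interpret a parsed response; `RecognitionError` and `extract_segments` are hypothetical names, and treating every non-zero code as a failure is an assumption rather than documented behavior.

```python
class RecognitionError(RuntimeError):
    """Raised when the API response reports a failure."""


def extract_segments(res: dict) -> list:
    """Return the subtitle entries from a parsed response, or raise with its msg."""
    # Assumption: anything other than code == 0 is treated as a failure.
    if res.get("code") != 0:
        raise RecognitionError(res.get("msg", "unknown error"))
    return res.get("data", [])
```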

Return on success:

    res={
        "code":0,
        "data":[
            {
                "text":"Subtitle text",
                "time":"00:00:01,000 --> 00:00:06,500"
            },
            {
                "text":"Subtitle text",
                "time":"00:00:06,900 --> 00:00:12,200"
            },
            ...multiple
        ]
    }
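On the server side, most recognition engines report segment boundaries in seconds, so the time strings have to be formatted before the response is returned. The following is a sketch under that assumption; `srt_timestamp` and `build_success` are hypothetical helper names, and the (start, end, text) tuple layout is chosen only for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_success(segments) -> dict:
    """Build the success payload from (start_seconds, end_seconds, text) tuples."""
    return {
        "code": 0,
        "data": [
            {
                "text": text,
                "time": f"{srt_timestamp(start)} --> {srt_timestamp(end)}",
            }
            for start, end, text in segments
        ],
    }
```

Serializing the result with `json.dumps(build_success(segments), ensure_ascii=False)` yields a body in the shape shown above.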
