Microsoft Edge has a powerful read-aloud feature that supports dozens of languages, each with several voices to choose from, and the results are excellent.
Building on this, a developer created a Python package called edge-tts. It lets programs use Microsoft's TTS service to dub text or subtitles. For example, the video translation software pyVideoTrans integrates edge-tts, and users can select it directly as a dubbing channel.
Unfortunately, Microsoft TTS is heavily abused by users in China, and some people even resell it as a commercial dubbing service. This has led Microsoft to restrict access from China. If you use it too frequently, you may get a 403 error, and the only way to continue is to switch IPs or connect through a stable overseas VPN.
So, is it possible to build a simple relay service on an overseas server for your own use? This would not only improve stability, but also allow the interface to be made compatible with OpenAI TTS, so that it can be used directly from the OpenAI SDK.
The answer is yes. I recently took the time to create a Docker image that can be easily pulled and started on a server.
Once started, the service interface is compatible with OpenAI. You only need to change the API address to http://deployment_server_ip:7899/v1 to replace OpenAI TTS. It can also be used directly in video translation software.
The steps below explain how to deploy and use it:
Step 1: Purchase and open a US server
Step 2: Allow port 7899 in the firewall
Step 3: Connect to the terminal and log in to the server
Step 4: Install Docker
Step 5: Pull the edge-tts-api image and start the API service
If you already have a server with Docker installed, you can skip to Step 5 and pull the image.
Step 1: Purchase and open a US server
A server in the US region is recommended, because there are fewer or no restrictions there. Choose a Linux distribution as the operating system. The rest of this guide uses Debian 12, hosted on Yecaoyun, the provider I use myself. The reason for choosing it is simple: it is cheap and reasonably stable, which is enough for relaying dubbing requests.
If you already have a Linux server in Europe or the US, you can skip this section and go straight to the next one. If not, please continue reading.
Then, select any of the top four configurations, which should be sufficient.
I personally use the 29 yuan/month configuration.
Click the "Buy Now" button to enter the configuration page. Here, select Debian 12 as the server operating system, set the server password, and keep the other settings at their defaults.
After the payment is completed, wait a few minutes for the server to be created and started. Next, you need to configure the firewall to open port 7899. The dubbing service can only be reached once this port is open.
Step 2: Allow port 7899 in the firewall
If you plan to use a domain name and configure an Nginx reverse proxy, you do not need to open the port. If you are not familiar with these, the simplest approach is to open the port directly.
The firewall settings interface varies between providers and panels. The following uses my Yecaoyun panel as an example, and other panels work similarly. If you already know how to open the port, you can skip this section and go straight to the next one.
First, in "My Products and Services", click the product you just opened to enter the product information and management page.
On this page, you can find the server's IP address and password information.
Find "Firewall" under "Additional Tools" and click to open it.
Then release port 7899, as shown in the figure below:
Step 3: Connect to the terminal and log in to the server
If you already know how to connect to a terminal, or use another SSH client such as Xshell, you can skip this step and go straight to the next section.
On the product information page, find Xterm.js Console and click it. Then operate as shown in the figure below:
When the above figure appears, press Enter a few times.
When Login: is displayed, type root after it and press Enter.
Then Password: will appear. Paste the password you copied (if you forgot it, you can find it on the product information page).
Note: Do not paste with Ctrl+V or right-click here, as this may insert extra spaces or line breaks and make the password incorrect.
Instead, press Shift+Insert to paste the password, which avoids login failures with an otherwise correct password, and then press Enter.
The login is successful as shown in the figure below.
Step 4: Install Docker
If Docker is already installed on your server, or you know how to install it, you can skip this step.
Run the following 5 commands in sequence, and run each command only after the previous one has finished successfully. These commands apply only to Debian 12 servers.
After the [root@xxxxxx~]# prompt, right-click to paste each command, then press Enter to execute it.
Command 1: sudo apt update && sudo apt install -y apt-transport-https ca-certificates curl gnupg
Command 2: curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Command 3: echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Command 4: sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin
Command 5: Start the Docker service. sudo systemctl start docker && sudo systemctl enable docker && sudo usermod -aG docker $USER
As before, right-click to paste this command, then press Enter.
Step 5: Pull the edge-tts-api image and start the API service
Enter the following command to automatically pull the image and start the service. After the startup is successful, you can use it in video translation software or other tools that support OpenAI TTS.
docker run -p 7899:7899 jianchang512/edge-tts-api:latest
Press Ctrl+C repeatedly to stop the service.
Note that this command will run in the foreground. If you close the terminal window, the service will stop.
You can use the following command instead, which will start the service in the background. You can safely close the terminal after execution.
docker run -d -p 7899:7899 jianchang512/edge-tts-api:latest
If no error is reported, the service has started. You can open http://your_ip:7899/v1/audio/speech in your browser to verify. If a result similar to the figure below appears, the service is running.
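If the browser check fails, it can help to first confirm that port 7899 is reachable at all, which separates firewall problems from service problems. The following is a minimal sketch using only the Python standard library. The function name port_is_open is a hypothetical helper introduced for illustration and is not part of the image or of edge-tts.

```python
import socket


def port_is_open(host: str, port: int = 7899, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from your own computer, for example with port_is_open("your_ip"), replacing your_ip with the server's address. A False result suggests that the firewall rule from Step 2 is missing or that the container is not running. A True result only shows that something is listening on the port, not that the TTS service itself is healthy.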
Use in video translation software
You need to upgrade the software to v3.40 before you can use this. Download the upgrade from https://pyvideotrans.com/downpackage
Open the menu, go to TTS Settings->OpenAI TTS, and change the interface address to http://your_ip:7899/v1
The SK (API key) field can contain any value, as long as it is not empty. In the role list, enter the roles you want to use, separated by commas.
Available Roles
The following is a list of available roles. Please note that the text language and role must match.
Chinese pronunciation roles:
zh-HK-HiuGaaiNeural
zh-HK-HiuMaanNeural
zh-HK-WanLungNeural
zh-CN-XiaoxiaoNeural
zh-CN-XiaoyiNeural
zh-CN-YunjianNeural
zh-CN-YunxiNeural
zh-CN-YunxiaNeural
zh-CN-YunyangNeural
zh-CN-liaoning-XiaobeiNeural
zh-TW-HsiaoChenNeural
zh-TW-YunJheNeural
zh-TW-HsiaoYuNeural
zh-CN-shaanxi-XiaoniNeural
English roles:
en-AU-NatashaNeural
en-AU-WilliamNeural
en-CA-ClaraNeural
en-CA-LiamNeural
en-HK-SamNeural
en-HK-YanNeural
en-IN-NeerjaExpressiveNeural
en-IN-NeerjaNeural
en-IN-PrabhatNeural
en-IE-ConnorNeural
en-IE-EmilyNeural
en-KE-AsiliaNeural
en-KE-ChilembaNeural
en-NZ-MitchellNeural
en-NZ-MollyNeural
en-NG-AbeoNeural
en-NG-EzinneNeural
en-PH-JamesNeural
en-PH-RosaNeural
en-SG-LunaNeural
en-SG-WayneNeural
en-ZA-LeahNeural
en-ZA-LukeNeural
en-TZ-ElimuNeural
en-TZ-ImaniNeural
en-GB-LibbyNeural
en-GB-MaisieNeural
en-GB-RyanNeural
en-GB-SoniaNeural
en-GB-ThomasNeural
en-US-AvaMultilingualNeural
en-US-AndrewMultilingualNeural
en-US-EmmaMultilingualNeural
en-US-BrianMultilingualNeural
en-US-AvaNeural
en-US-AndrewNeural
en-US-EmmaNeural
en-US-BrianNeural
en-US-AnaNeural
en-US-AriaNeural
en-US-ChristopherNeural
en-US-EricNeural
en-US-GuyNeural
en-US-JennyNeural
en-US-MichelleNeural
en-US-RogerNeural
en-US-SteffanNeural
Japanese roles:
ja-JP-KeitaNeural
ja-JP-NanamiNeural
Korean roles:
ko-KR-HyunsuNeural
ko-KR-InJoonNeural
ko-KR-SunHiNeural
French roles:
fr-BE-CharlineNeural
fr-BE-GerardNeural
fr-CA-ThierryNeural
fr-CA-AntoineNeural
fr-CA-JeanNeural
fr-CA-SylvieNeural
fr-FR-VivienneMultilingualNeural
fr-FR-RemyMultilingualNeural
fr-FR-DeniseNeural
fr-FR-EloiseNeural
fr-FR-HenriNeural
fr-CH-ArianeNeural
fr-CH-FabriceNeural
German roles:
de-AT-IngridNeural
de-AT-JonasNeural
de-DE-SeraphinaMultilingualNeural
de-DE-FlorianMultilingualNeural
de-DE-AmalaNeural
de-DE-ConradNeural
de-DE-KatjaNeural
de-DE-KillianNeural
de-CH-JanNeural
de-CH-LeniNeural
Spanish roles:
es-AR-ElenaNeural
es-AR-TomasNeural
es-BO-MarceloNeural
es-BO-SofiaNeural
es-CL-CatalinaNeural
es-CL-LorenzoNeural
es-ES-XimenaNeural
es-CO-GonzaloNeural
es-CO-SalomeNeural
es-CR-JuanNeural
es-CR-MariaNeural
es-CU-BelkysNeural
es-CU-ManuelNeural
es-DO-EmilioNeural
es-DO-RamonaNeural
es-EC-AndreaNeural
es-EC-LuisNeural
es-SV-LorenaNeural
es-SV-RodrigoNeural
es-GQ-JavierNeural
es-GQ-TeresaNeural
es-GT-AndresNeural
es-GT-MartaNeural
es-HN-CarlosNeural
es-HN-KarlaNeural
es-MX-DaliaNeural
es-MX-JorgeNeural
es-NI-FedericoNeural
es-NI-YolandaNeural
es-PA-MargaritaNeural
es-PA-RobertoNeural
es-PY-MarioNeural
es-PY-TaniaNeural
es-PE-AlexNeural
es-PE-CamilaNeural
es-PR-KarinaNeural
es-PR-VictorNeural
es-ES-AlvaroNeural
es-ES-ElviraNeural
es-US-AlonsoNeural
es-US-PalomaNeural
es-UY-MateoNeural
es-UY-ValentinaNeural
es-VE-PaolaNeural
es-VE-SebastianNeural
Arabic roles:
ar-DZ-AminaNeural
ar-DZ-IsmaelNeural
ar-BH-AliNeural
ar-BH-LailaNeural
ar-EG-SalmaNeural
ar-EG-ShakirNeural
ar-IQ-BasselNeural
ar-IQ-RanaNeural
ar-JO-SanaNeural
ar-JO-TaimNeural
ar-KW-FahedNeural
ar-KW-NouraNeural
ar-LB-LaylaNeural
ar-LB-RamiNeural
ar-LY-ImanNeural
ar-LY-OmarNeural
ar-MA-JamalNeural
ar-MA-MounaNeural
ar-OM-AbdullahNeural
ar-OM-AyshaNeural
ar-QA-AmalNeural
ar-QA-MoazNeural
ar-SA-HamedNeural
ar-SA-ZariyahNeural
ar-SY-AmanyNeural
ar-SY-LaithNeural
ar-TN-HediNeural
ar-TN-ReemNeural
ar-AE-FatimaNeural
ar-AE-HamdanNeural
ar-YE-MaryamNeural
ar-YE-SalehNeural
Bengali roles:
bn-BD-NabanitaNeural
bn-BD-PradeepNeural
bn-IN-BashkarNeural
bn-IN-TanishaaNeural
Czech roles:
cs-CZ-AntoninNeural
cs-CZ-VlastaNeural
Dutch roles:
nl-BE-ArnaudNeural
nl-BE-DenaNeural
nl-NL-ColetteNeural
nl-NL-FennaNeural
nl-NL-MaartenNeural
Hebrew roles:
he-IL-AvriNeural
he-IL-HilaNeural
Hindi roles:
hi-IN-MadhurNeural
hi-IN-SwaraNeural
Hungarian roles:
hu-HU-NoemiNeural
hu-HU-TamasNeural
Indonesian roles:
id-ID-ArdiNeural
id-ID-GadisNeural
Italian roles:
it-IT-GiuseppeNeural
it-IT-DiegoNeural
it-IT-ElsaNeural
it-IT-IsabellaNeural
Kazakh roles:
kk-KZ-AigulNeural
kk-KZ-DauletNeural
Malay roles:
ms-MY-OsmanNeural
ms-MY-YasminNeural
Polish roles:
pl-PL-MarekNeural
pl-PL-ZofiaNeural
Portuguese roles:
pt-BR-ThalitaNeural
pt-BR-AntonioNeural
pt-BR-FranciscaNeural
pt-PT-DuarteNeural
pt-PT-RaquelNeural
Russian roles:
ru-RU-DmitryNeural
ru-RU-SvetlanaNeural
Swahili roles:
sw-KE-RafikiNeural
sw-KE-ZuriNeural
sw-TZ-DaudiNeural
sw-TZ-RehemaNeural
Thai roles:
th-TH-NiwatNeural
th-TH-PremwadeeNeural
Turkish roles:
tr-TR-AhmetNeural
tr-TR-EmelNeural
Ukrainian roles:
uk-UA-OstapNeural
uk-UA-PolinaNeural
Vietnamese roles:
vi-VN-HoaiMyNeural
vi-VN-NamMinhNeural
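Because the text language and the role must match, and the role list is entered as comma-separated names, a small helper can pick roles by language code and build that string. This is only a sketch: roles_for_language and role_list_field are hypothetical names, and it assumes that the part of a role name before the first hyphen is its language code, as in the lists above.

```python
def roles_for_language(roles: list[str], language: str) -> list[str]:
    """Keep only the roles whose name starts with the given language code."""
    prefix = language.lower() + "-"
    return [r for r in roles if r.lower().startswith(prefix)]


def role_list_field(roles: list[str]) -> str:
    """Join roles with commas, dropping blanks and duplicates, keeping order."""
    seen: dict[str, None] = {}
    for role in roles:
        role = role.strip()
        if role:
            seen.setdefault(role, None)
    return ",".join(seen)


# A few names copied from the lists above.
sample = [
    "zh-CN-XiaoxiaoNeural",
    "zh-CN-YunxiNeural",
    "en-US-AriaNeural",
    "ja-JP-NanamiNeural",
]
chinese = roles_for_language(sample, "zh")
print(role_list_field(chinese))  # zh-CN-XiaoxiaoNeural,zh-CN-YunxiNeural
```

The printed string can be pasted into the role list field as is.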
Use in the OpenAI SDK
Install the openai library first: pip install openai
from openai import OpenAI

# base_url points at the self-hosted service instead of OpenAI
client = OpenAI(api_key='12314', base_url='http://your_ip:7899/v1')
with client.audio.speech.with_streaming_response.create(
    model='tts-1',
    voice='zh-CN-YunxiNeural',
    input='Hello, dear friends',
    speed=1.0
) as response:
    # Stream the audio to disk chunk by chunk
    with open('./test.mp3', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)
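Call directly using requests
import requests

# POST a JSON body to the OpenAI-compatible speech endpoint
res = requests.post('http://your_ip:7899/v1/audio/speech',
                    json={"voice": "zh-CN-YunxiNeural",
                          "input": "Hello, dear friends",
                          "speed": 1.0})
res.raise_for_status()  # stop here on an HTTP error instead of saving it as audio
with open('./test.mp3', 'wb') as f:
    f.write(res.content)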
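If you would rather not install any third-party package, the same call can be sketched with the standard library alone. The helper names build_speech_request and synthesize are hypothetical. The sketch assumes the service accepts the same JSON fields that the OpenAI SDK sends (model, voice, input, speed) at /v1/audio/speech, which is what OpenAI compatibility implies, but this has not been checked against the image's source.

```python
import json
import urllib.request

BASE_URL = "http://your_ip:7899/v1"  # placeholder: replace your_ip


def build_speech_request(text, voice, speed=1.0, base_url=BASE_URL):
    """Build a POST request for the OpenAI-style /audio/speech endpoint."""
    # Assumption: the service reads the same fields as OpenAI's speech API.
    payload = {"model": "tts-1", "voice": voice, "input": text, "speed": speed}
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, speed=1.0, base_url=BASE_URL, timeout=60):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, speed, base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as synthesize("Hello, dear friends", "en-US-AriaNeural", "./test.mp3") writes the audio to disk and returns the number of bytes received. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.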