In speech-to-text tasks, background noise or musical accompaniment can also reduce recognition accuracy. For more accurate results, it helps to remove the background accompaniment from the audio beforehand.
Two recommended tools for separating vocals from background sound
The first is vocal-separate, a local, offline tool for separating vocals from background sound, built on spleeter. A pre-packaged version is available for Windows: just unzip it and double-click to run. On Mac/Linux you need to deploy it from source. It has a Chinese interface, is very easy to use, can process video files directly, and is fast.
The second is Ultimate Vocal Remover, the desktop GUI version of UVR5. On Windows it must be installed on the C: drive; installing it elsewhere tends to cause problems. It has an English interface with many options and is more complex to operate, but it is also more powerful and produces better results.
vocal-separate Installation and Usage
1. Download the pre-packaged Windows version from the releases page below; on other systems, clone the source code and deploy it yourself: https://github.com/jianchang512/vocal-separate/releases
2. After downloading, unzip the archive and double-click start.exe, then wait for the browser page to open automatically. If an error message similar to the one in the figure below appears, don't worry: it only indicates that GPU acceleration is unavailable, and it does not affect usage.
Once startup succeeds, the following browser page opens.
3. As shown in the figure above, drag and drop (or click to upload) the audio or video file from which you want to extract the vocals. Uploaded videos are automatically converted to audio before processing.
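vocal-separate handles the video-to-audio step for you, and its exact internals are not described here. If you want to do the conversion yourself, for example to feed a video's soundtrack to Ultimate Vocal Remover, which works on audio files, the following is a minimal sketch assuming ffmpeg is installed and on your PATH. The function names are hypothetical, and the 44.1 kHz stereo WAV output is an assumed choice (it matches the sample rate spleeter's models use).

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_audio_cmd(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg command that drops the video stream and writes stereo WAV audio.

    Hypothetical helper; the stereo/44.1 kHz defaults are assumptions, not tool requirements.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(video_path),    # input video
        "-vn",                    # discard the video stream
        "-ac", "2",               # two audio channels (stereo)
        "-ar", str(sample_rate),  # resample to the given rate in Hz
        str(wav_path),            # .wav extension makes ffmpeg write PCM WAV
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg to extract the audio track; defaults to the video name with a .wav suffix."""
    video_path = Path(video_path)
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_extract_audio_cmd(video_path, wav_path), check=True)
    return wav_path
```

For example, `extract_audio("interview.mp4")` would write `interview.wav` next to the video. Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.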
Select the "2stems" model to split the uploaded file into two files: the vocals and all other sounds.
You can also choose the 4stems or 5stems models. Besides the vocals, these further split the remaining sounds into separate files such as drums and bass. In most cases, 2stems is all you need.
You can play back the results on the web page, then click Download or open the displayed output directory to find the separated files. The vocals are saved as vocals.wav and the remaining sounds as accompaniment.wav.
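Because vocal-separate is built on spleeter, the model names and output files above follow spleeter's conventions. The sketch below is not vocal-separate's own code; it assumes spleeter is installed (`pip install spleeter`) and uses its `Separator.separate_to_file` API with the default `{filename}/{instrument}.{codec}` output layout. The helper names `expected_outputs` and `separate` are hypothetical.

```python
from pathlib import Path

# Stem names produced by spleeter's pretrained models.
STEMS = {
    "2stems": ["vocals", "accompaniment"],
    "4stems": ["vocals", "drums", "bass", "other"],
    "5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def expected_outputs(audio_path, output_dir, model="2stems", codec="wav"):
    """Return the paths spleeter writes under its default filename_format,
    i.e. <output_dir>/<input file name without extension>/<stem>.<codec>."""
    name = Path(audio_path).stem
    return [Path(output_dir) / name / f"{stem}.{codec}" for stem in STEMS[model]]


def separate(audio_path, output_dir, model="2stems"):
    """Separate one audio file with spleeter and return the output file paths."""
    # Imported lazily: spleeter pulls in TensorFlow and downloads models on first use.
    from spleeter.separator import Separator

    separator = Separator(f"spleeter:{model}")
    separator.separate_to_file(str(audio_path), str(output_dir))
    return expected_outputs(audio_path, output_dir, model)
```

For example, `separate("song.mp3", "output")` should leave `output/song/vocals.wav` and `output/song/accompaniment.wav`. On Windows, call it from under an `if __name__ == "__main__":` guard, since spleeter's `Separator` uses multiprocessing by default.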
It's that simple.
Ultimate Vocal Remover Installation and Usage
1. Download the release from https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6
The Windows installer can also be downloaded directly from the link below. After downloading, double-click the .exe file and click Next through each step to complete the installation, installing it on the C: drive to avoid problems: https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe
2. After the installation is complete, double-click the desktop icon to start.
3. As shown in the figure below, select the audio file to process, set the output directory, and choose the processing method, model, bit rate, and other options. Apart from "Select Input" and "Select Output", all settings are optional and can be left at their defaults.
"Select Input": Click to choose the audio file to process.
"Select Output": Click to choose where the processed files are saved.
"CHOOSE PROCESS METHOD": Selects the processing method. The default is MDX-Net, which should give the best results, so keep the default.
"CHOOSE MDX-NET MODEL": Selects the model used by the method above. If you choose a method other than MDX-Net, you need to download its model separately.
"Start Processing": Once everything is set, click this button to start the separation, then wait for the completion prompt.