Docker: `beveradb/audio-separator`

💬 To test if `audio-separator` has been successfully configured to use FFmpeg, run `audio-separator --env_info`. The log will show `FFmpeg installed`.

If you installed `audio-separator` using `conda` or `docker`, FFmpeg should already be available in your environment.

Otherwise, you may need to install FFmpeg separately. It should be straightforward to install on most platforms, e.g.:
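- Debian/Ubuntu: `apt-get update; apt-get install -y ffmpeg`
- macOS (Homebrew): `brew install ffmpeg`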

If you see the error `Failed to load library` or `cannot open shared object file`, you can install the CUDA 11 libraries _alongside_ CUDA 12 like so:
`apt update; apt install nvidia-cuda-toolkit`

If you encounter the following messages when running on Google Colab or in another environment:
```
[E:onnxruntime:Default, provider_bridge_ort.cc:1862 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1539 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn_adv.so.9: cannot open shared object file: No such file or directory
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:993 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.
```
You can resolve this by running the following command:
```sh
python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
```
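
After the reinstall, you can at least confirm which execution providers the new onnxruntime build exposes (note that missing CUDA/cuDNN libraries only surface later, when a session is actually created):

```python
import onnxruntime as ort

# Lists the providers compiled into this onnxruntime build,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(ort.get_available_providers())
```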

> Note: if anyone knows how to make this cleaner so we can support different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!
## Usage 🚀

```python
from audio_separator.separator import Separator

# Initialize the Separator class (with optional configuration properties, below)
separator = Separator()

# Load a machine learning model (if unspecified, defaults to 'model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt')
separator.load_model()

# Perform the separation on specific audio files without reloading the model
output_files = separator.separate('audio1.wav')

print(f"Separation complete! Output file(s): {' '.join(output_files)}")
```

#### Using different models to extract different stems
Here's an example of how you can process a single input file with multiple different models to get the desired results.
This example [came from a user](https://github.com/nomadkaraoke/python-audio-separator/issues/111#issuecomment-2353780618) who wanted the following outputs:
- `Vocals.wav`
- `Instrumental.wav`
- `Vocals (Reverb).wav`
- `Vocals (No Reverb).wav`
- `Lead Vocals.wav`
- `Backing Vocals.wav`
To achieve this, they used the following code, leveraging three different models in sequence and renaming the output files:
```python
import os
from audio_separator.separator import Separator

input_file = "/content/input.mp3"
output_dir = "/content/output"

separator = Separator(output_dir=output_dir)

# Vocals and Instrumental
vocals = os.path.join(output_dir, 'Vocals.wav')
instrumental = os.path.join(output_dir, 'Instrumental.wav')

# Vocals with Reverb and Vocals without Reverb
vocals_reverb = os.path.join(output_dir, 'Vocals (Reverb).wav')
vocals_no_reverb = os.path.join(output_dir, 'Vocals (No Reverb).wav')

# Lead Vocals and Backing Vocals
lead_vocals = os.path.join(output_dir, 'Lead Vocals.wav')
backing_vocals = os.path.join(output_dir, 'Backing Vocals.wav')

# Splitting a track into Vocals and Instrumental
separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')
voc_inst = separator.separate(input_file)
os.rename(os.path.join(output_dir, voc_inst[0]), instrumental)  # Rename file to "Instrumental.wav"
os.rename(os.path.join(output_dir, voc_inst[1]), vocals)  # Rename file to "Vocals.wav"

# Applying DeEcho-DeReverb to Vocals
separator.load_model(model_filename='UVR-DeEcho-DeReverb.pth')
voc_no_reverb = separator.separate(vocals)
os.rename(os.path.join(output_dir, voc_no_reverb[0]), vocals_no_reverb)  # Rename file to "Vocals (No Reverb).wav"
os.rename(os.path.join(output_dir, voc_no_reverb[1]), vocals_reverb)  # Rename file to "Vocals (Reverb).wav"

# Separating Backing Vocals from Lead Vocals
separator.load_model(model_filename='mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt')
backing_voc = separator.separate(vocals_no_reverb)
os.rename(os.path.join(output_dir, backing_voc[0]), backing_vocals)  # Rename file to "Backing Vocals.wav"
os.rename(os.path.join(output_dir, backing_voc[1]), lead_vocals)  # Rename file to "Lead Vocals.wav"
```

Thanks to @Bebra777228 for contributing this example!
#### Batch processing and processing with multiple models
You can process multiple files without reloading the model to save time and memory.
```python
from audio_separator.separator import Separator

# Initialize the Separator once
separator = Separator()

# Load a model and process several files with it
# (model filenames here are illustrative; any supported model works)
separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')

output_file_paths_1 = separator.separate('audio1.wav')
output_file_paths_2 = separator.separate('audio2.wav')
output_file_paths_3 = separator.separate('audio3.wav')

# Load a different model and process the same files again
separator.load_model(model_filename='UVR-DeEcho-DeReverb.pth')

output_file_paths_4 = separator.separate('audio1.wav')
output_file_paths_5 = separator.separate('audio2.wav')
output_file_paths_6 = separator.separate('audio3.wav')
```
#### Renaming Stems
You can rename the output files by specifying the desired names. For example:
```python
output_files = separator.separate('audio1.wav', 'stem1', 'stem2')
```
In this case, the output file names will be: `stem1.wav` and `stem2.wav`.
You can also rename specific stems:
- To rename the primary stem:
  ```python
  output_files = separator.separate('audio1.wav', primary_output_name='stem1')
  ```
  > The output files will be named: `stem1.wav` and `audio1_(Instrumental)_model_mel_band_roformer_ep_3005_sdr_11.wav`
- To rename the secondary stem:
  ```python
  output_files = separator.separate('audio1.wav', secondary_output_name='stem2')
  ```
  > The output files will be named: `audio1_(Vocals)_model_mel_band_roformer_ep_3005_sdr_11.wav` and `stem2.wav`
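
The two keyword arguments can also be combined to rename both stems in one call; assuming the positional arguments in the first example map to these keywords, this is equivalent:

```python
output_files = separator.separate(
    'audio1.wav',
    primary_output_name='stem1',
    secondary_output_name='stem2',
)
```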
## Parameters for the Separator class
- log_level: (Optional) Logging level, e.g., INFO, DEBUG, WARNING. Default: logging.INFO
- output_single_stem: (Optional) Output only a single stem, such as 'Instrumental' or 'Vocals'. Default: None
- invert_using_spec: (Optional) Flag to invert using spectrogram. Default: False
- sample_rate: (Optional) Set the sample rate of the output audio. Default: 44100
- use_soundfile: (Optional) Use soundfile for output writing; can solve OOM issues, especially on longer audio. Default: False
- use_autocast: (Optional) Flag to use PyTorch autocast for faster inference. Do not use for CPU inference. Default: False
- mdx_params: (Optional) MDX Architecture Specific Attributes & Defaults. Default: {"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1, "enable_denoise": False}
- vr_params: (Optional) VR Architecture Specific Attributes & Defaults. Default: {"batch_size": 1, "window_size": 512, "aggression": 5, "enable_tta": False, "enable_post_process": False, "post_process_threshold": 0.2, "high_end_process": False}
- demucs_params: (Optional) Demucs Architecture Specific Attributes & Defaults. Default: {"segment_size": "Default", "shifts": 2, "overlap": 0.25, "segments_enabled": True}
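
A minimal sketch of how these parameters fit together when constructing a `Separator` (values shown are the documented defaults unless a comment says otherwise; the output directory is illustrative):

```python
import logging
from audio_separator.separator import Separator

separator = Separator(
    log_level=logging.DEBUG,      # more verbose than the default logging.INFO
    output_dir="separated",       # illustrative output directory
    output_single_stem="Vocals",  # write only the Vocals stem (default: None, i.e. write all stems)
    sample_rate=44100,
    invert_using_spec=False,
    use_autocast=False,           # leave False for CPU inference
    mdx_params={"hop_length": 1024, "segment_size": 256, "overlap": 0.25,
                "batch_size": 1, "enable_denoise": False},
)
```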
