diff --git a/README.md b/README.md
index 18757fe..253ee9c 100644
--- a/README.md
+++ b/README.md
@@ -74,7 +74,7 @@ Docker: `beveradb/audio-separator`
💬 To test if `audio-separator` has been successfully configured to use FFmpeg, run `audio-separator --env_info`. The log will show `FFmpeg installed`.

-If you installed `audio-separator` using `conda` or `docker`, FFmpeg should already be avaialble in your environment.
+If you installed `audio-separator` using `conda` or `docker`, FFmpeg should already be available in your environment.

You may need to separately install FFmpeg. It should be easy to install on most platforms, e.g.:
@@ -109,6 +109,17 @@ If you see the error `Failed to load library` or `cannot open shared object file
You can install the CUDA 11 libraries _alongside_ CUDA 12 like so:
`apt update; apt install nvidia-cuda-toolkit`
+
+If you encounter the following messages when running on Google Colab or in another environment:
+```
+[E:onnxruntime:Default, provider_bridge_ort.cc:1862 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1539 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn_adv.so.9: cannot open shared object file: No such file or directory
+
+[W:onnxruntime:Default, onnxruntime_pybind_state.cc:993 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.
+```
+You can resolve this by running the following command:
+```sh
+python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
+```
+
-> Note: if anyone knows how to make this cleaner so we can support both different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!
+> Note: if anyone knows how to make this cleaner so we can support different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!

## Usage 🚀
@@ -217,63 +228,6 @@ output_files = separator.separate('audio1.wav')
print(f"Separation complete! Output file(s): {' '.join(output_files)}")
```
-#### Using different models to extract different stems
-
-Here's an example of how you can process a single input file with multiple different models to get desired results.
- -This example [came from a user]([url](https://github.com/nomadkaraoke/python-audio-separator/issues/111#issuecomment-2353780618)) who wanted the following outputs: - -- `Vocals.wav` -- `Instrumental.wav` -- `Vocals (Reverb).wav` -- `Vocals (No Reverb).wav` -- `Lead Vocals.wav` -- `Backing Vocals.wav` - -To achieve this, they used the following code, leveraging three different models in sequence and renaming the output files: - -```python -import os -from audio_separator.separator import Separator - -input = "/content/input.mp3" -output = "/content/output" - -separator = Separator(output_dir=output) - -# Vocals and Instrumental -vocals = os.path.join(output, 'Vocals.wav') -instrumental = os.path.join(output, 'Instrumental.wav') - -# Vocals with Reverb and Vocals without Reverb -vocals_reverb = os.path.join(output, 'Vocals (Reverb).wav') -vocals_no_reverb = os.path.join(output, 'Vocals (No Reverb).wav') - -# Lead Vocals and Backing Vocals -lead_vocals = os.path.join(output, 'Lead Vocals.wav') -backing_vocals = os.path.join(output, 'Backing Vocals.wav') - -# Splitting a track into Vocal and Instrumental -separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt') -voc_inst = separator.separate(input) -os.rename(os.path.join(output, voc_inst[0]), instrumental) # Rename file to “Instrumental.wav” -os.rename(os.path.join(output, voc_inst[1]), vocals) # Rename file to “Vocals.wav” - -# Applying DeEcho-DeReverb to Vocals -separator.load_model(model_filename='UVR-DeEcho-DeReverb.pth') -voc_no_reverb = separator.separate(vocals) -os.rename(os.path.join(output, voc_no_reverb[0]), vocals_no_reverb) # Rename file to “Vocals (No Reverb).wav” -os.rename(os.path.join(output, voc_no_reverb[1]), vocals_reverb) # Rename file to “Vocals (Reverb).wav” - -# Separating Back Vocals from Main Vocals -separator.load_model(model_filename='mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt') -backing_voc = separator.separate(vocals_no_reverb) -os.rename(os.path.join(output, backing_voc[0]), backing_vocals) # Rename file to “Backing Vocals.wav” -os.rename(os.path.join(output, backing_voc[1]), lead_vocals) # Rename file to “Lead Vocals.wav” -``` - -Thanks to @Bebra777228 for contributing this example! - #### Batch processing and processing with multiple models You can process multiple files without reloading the model to save time and memory. @@ -303,6 +257,27 @@ output_file_paths_5 = separator.separate('audio2.wav') output_file_paths_6 = separator.separate('audio3.wav') ``` +#### Renaming Stems + +You can rename the output files by specifying the desired names. For example: +```python +output_files = separator.separate('audio1.wav', 'stem1', 'stem2') +``` +In this case, the output file names will be: `stem1.wav` and `stem2.wav`. + +You can also rename specific stems: + +- To rename the primary stem: + ```python + output_files = separator.separate('audio1.wav', primary_output_name='stem1') + ``` + > The output files will be named: `stem1.wav` and `audio1_(Instrumental)_model_mel_band_roformer_ep_3005_sdr_11.wav` +- To rename the secondary stem: + ```python + output_files = separator.separate('audio1.wav', secondary_output_name='stem2') + ``` + > The output files will be named: `audio1_(Vocals)_model_mel_band_roformer_ep_3005_sdr_11.wav` and `stem2.wav` + ## Parameters for the Separator class - log_level: (Optional) Logging level, e.g., INFO, DEBUG, WARNING. 
Default: logging.INFO
@@ -315,8 +290,31 @@ output_file_paths_6 = separator.separate('audio3.wav')
-- output_single_stem: (Optional) Output only a single stem, such as 'Instrumental' and 'Vocals'. Default: None
+- output_single_stem: (Optional) Output only a single stem, such as 'Instrumental' or 'Vocals'. Default: None
- invert_using_spec: (Optional) Flag to invert using spectrogram. Default: False
- sample_rate: (Optional) Set the sample rate of the output audio. Default: 44100
-- use_soundfile: (Optional) Use soundfile for output writing, can solve OOM issues, especially on longer audio.
-- use_autocast: (Optional) Flag to use PyTorch autocast for faster inference. Do not use for CPU inference. Default: False
+- use_soundfile: (Optional) Use soundfile for output writing, which can solve OOM issues, especially on longer audio. Default: False
+- use_autocast: (Optional) Flag to use PyTorch autocast for faster inference. Do not use for CPU inference. Default: False
- mdx_params: (Optional) MDX Architecture Specific Attributes & Defaults. Default: {"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1, "enable_denoise": False}
- vr_params: (Optional) VR Architecture Specific Attributes & Defaults. Default: {"batch_size": 1, "window_size": 512, "aggression": 5, "enable_tta": False, "enable_post_process": False, "post_process_threshold": 0.2, "high_end_process": False}
-- demucs_params: (Optional) Demucs Architecture Specific Attributes & Defaults. {"segment_size": "Default", "shifts": 2, "overlap": 0.25, "segments_enabled": True}
+- demucs_params: (Optional) Demucs Architecture Specific Attributes & Defaults. Default: {"segment_size": "Default", "shifts": 2, "overlap": 0.25, "segments_enabled": True}
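+
+For example, here is a minimal sketch that combines several of the parameters documented above. The values are illustrative only (the model filename is reused from the earlier examples in this README), not recommendations:
+
+```python
+import logging
+
+from audio_separator.separator import Separator
+
+# Configure logging, output location, and architecture-specific options up
+# front; any parameter left out falls back to the defaults listed above.
+separator = Separator(
+    log_level=logging.DEBUG,
+    output_dir="/content/output",
+    output_single_stem="Vocals",  # only write the vocal stem
+    sample_rate=44100,
+    mdx_params={"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1, "enable_denoise": False},
+)
+
+# Load a model once, then separate as many files as needed
+separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')
+output_files = separator.separate('audio1.wav')
+print(f"Separation complete! Output file(s): {' '.join(output_files)}")
+```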