What is your best recommendation for a karaoke model? #82

l2aelba · 2024-06-25T18:24:36Z

I am currently looking for community recommendations on the best karaoke models.

Could you please share your favorite model for creating karaoke files,
including details on why you like it and your preferred settings?

Example (my own setup):

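from audio_separator.separator import Separator

# Keep only the instrumental stem and write it as a 44.1 kHz MP3.
separator = Separator(
    output_single_stem='Instrumental',
    output_format='MP3',
    sample_rate=44100,
    # Default MDX parameters, except that denoising is enabled.
    mdx_params={"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1, "enable_denoise": True}
)
separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')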

Roformer model because of this issue
MP3 because of the file size
mdx_params left at the defaults, with only enable_denoise turned on

What about you?
Thanks for sharing!

Answered by beveradb

beveradb · 2024-06-25T18:37:34Z

Maintainer

So, I actually make karaoke videos every day, and I built this helper tool to automate part of my process:
https://github.com/karaokenerds/karaoke-prep

and, because I'm pretty much the only person using it, I tend to update the CLI to default to my own preferred settings whenever those change 😅
so, if you see here: https://github.com/karaokenerds/karaoke-prep/blob/main/karaoke_prep/utils/prep_cli.py#L62

for every track I start making, when I run karaoke-prep it basically runs audio-separator 4 times, once for each of these models, all using default settings:

model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt
UVR_MDXNET_KARA_2.onnx
2_HP-UVR.pth
MDX23C-8KFFT-InstVoc_HQ_2.ckpt
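
The four-pass run described above could be sketched with the audio-separator Python API roughly like this. It is only a minimal sketch, not the actual karaoke-prep code: `separate_with_models` and its `make_separator` parameter are hypothetical names made up for illustration, and it assumes `load_model(model_filename=...)` and `separate(path)` behave as in the question's example, with `separate` returning the list of output files. FLAC output is an assumption taken from the maintainer's follow-up comment; everything else is left at the defaults.

```python
from typing import Callable, Dict, List, Optional, Sequence

# The four models listed above, one per architecture.
MODELS = (
    "model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt",
    "UVR_MDXNET_KARA_2.onnx",
    "2_HP-UVR.pth",
    "MDX23C-8KFFT-InstVoc_HQ_2.ckpt",
)


def separate_with_models(
    audio_path: str,
    models: Sequence[str] = MODELS,
    make_separator: Optional[Callable[[], object]] = None,
) -> Dict[str, List[str]]:
    """Run one separation pass per model and return {model: output files}.

    Hypothetical helper: `make_separator` exists only so the loop can be
    exercised without the real library installed.
    """
    if make_separator is None:
        # Imported lazily so the sketch does not require audio-separator
        # unless it is actually used for a real run.
        from audio_separator.separator import Separator

        make_separator = lambda: Separator(output_format="FLAC")

    results: Dict[str, List[str]] = {}
    for model in models:
        separator = make_separator()  # fresh instance per pass
        separator.load_model(model_filename=model)
        results[model] = separator.separate(audio_path)
    return results
```

Calling `separate_with_models("song.flac")` (a placeholder path) would then give one set of output files per model to compare by ear.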

I choose those because:

Those are 4 different architectures, and different architectures work better with different tracks, audio qualities, recording technologies, etc.
The KARA_2 MDX model often does a great job of leaving background vocals in, which is nice for some tracks.
Sometimes I want the cleanest pure instrumental for one part of the song (which the InstVoc_HQ_2 MDXC model tends to provide) but want to keep the background vocals in for another part, so I sometimes splice sections of one model's output with sections of another's, using Audacity.
Sometimes the mel_band_roformer model provides amazingly clean results with a wider retained frequency range, but sometimes it sounds a little artificial, so I don't default to it as much as I thought I would.

There is no single model which is the best for all input audio! It really does depend quite a lot on the track you're working on.
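
The splicing mentioned above is done by hand in Audacity, but the idea can be illustrated in code. The following is a minimal sketch under stated assumptions, not the maintainer's workflow: `splice_stems` and the `(start, end, stem_name)` plan format are hypothetical, and the stems are assumed to be already decoded into equal-length sample sequences at the same sample rate.

```python
from typing import Dict, List, Sequence, Tuple

# (start_seconds, end_seconds, stem_name); hypothetical plan format.
Segment = Tuple[float, float, str]


def splice_stems(
    stems: Dict[str, Sequence[float]],
    plan: Sequence[Segment],
    sample_rate: int,
) -> List[float]:
    """Concatenate the time ranges in `plan`, each taken from the named stem."""
    spliced: List[float] = []
    for start, end, name in plan:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        spliced.extend(stems[name][first:last])
    return spliced


# Toy data: 2 seconds of audio at 4 samples per second.
stems = {
    "clean_instrumental": [0.0] * 8,  # e.g. output of the InstVoc_HQ_2 model
    "with_backing_vocals": [1.0] * 8,  # e.g. output of the KARA_2 model
}
plan = [(0.0, 1.0, "clean_instrumental"), (1.0, 2.0, "with_backing_vocals")]
mixed = splice_stems(stems, plan, sample_rate=4)
```

This uses hard cuts, which can click at the boundaries in real audio; an editor such as Audacity makes it easy to place the cut on a quiet spot or add a short crossfade instead.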

1 reply

l2aelba Jun 25, 2024
Author

Thanks for sharing, I need to try UVR_MDXNET_KARA_2.onnx now 👍
Could you add any details about the pros and cons of this model?

From what I've tried so far, the best things about this model are how it handles the background vocals (chorus) and that it's very fast.

beveradb · 2024-06-25T18:41:36Z

Maintainer

Oh, and I always use FLAC for the input audio whenever I can find a lossless source for the track I'm making, and I always use FLAC as the output format because I want to avoid the quality loss from re-encoding multiple times with lossy formats.

For my use case I'm generating videos in H264/AAC MP4 format for the highest compatibility with video players, YouTube, etc., so whatever audio I output will be re-encoded to AAC at least once anyway.

Avoiding multiple lossy encodings is vastly preferable for audio quality compared to encoding to e.g. MP3 and then re-encoding that lossy file to AAC afterwards.

There's a bit more explanation here: https://interviewfor.red/en/transcodes.html and here: https://interviewfor.red/en/spectrals.html
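
To make the re-encoding argument concrete, here is a small sketch that counts how many lossy encodes a file goes through in a given pipeline. The `lossy_generations` helper and the format set are assumptions made up for illustration, not part of audio-separator or karaoke-prep.

```python
from typing import Sequence

# Common lossy codecs; FLAC, WAV and ALAC are lossless and therefore not listed.
LOSSY_FORMATS = {"MP3", "AAC", "VORBIS", "OPUS"}


def lossy_generations(chain: Sequence[str]) -> int:
    """Count the lossy encodes in a chain of formats, source file first."""
    return sum(1 for fmt in chain if fmt.upper() in LOSSY_FORMATS)


# Lossless source -> FLAC stems -> AAC in the final MP4: one lossy encode.
flac_pipeline = lossy_generations(["FLAC", "FLAC", "AAC"])

# Lossless source -> MP3 stems -> AAC in the final MP4: two lossy encodes.
mp3_pipeline = lossy_generations(["FLAC", "MP3", "AAC"])
```

Each lossy encode discards information that later steps cannot recover, so the pipeline with the lower count is the one to prefer when the final AAC encode is unavoidable.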
