
Attempt to fully automate the creation of karaoke music videos, using open source tools and AI (e.g. Whisper & MDX-Net)


nomadkaraoke/karaoke-generator


KaraokeHunt: Karaoke video generator

Fully automated creation of acceptable karaoke music videos from any music on YouTube, using open source tools and AI (e.g. Whisper and MDX-Net)


Context

This is one experimental tool as part of the journey towards implementing the full vision for KaraokeHunt (https://karaokehunt.com).

Some of the other components include:

Idea steps

  • Fetch the requested YouTube video using yt-dlp and extract the audio to wav using ffmpeg
  • Run that audio through an ML-based vocal isolation model tuned for karaoke (e.g. UVR-MDX-NET Karaoke 2) to get high-quality instrumental audio without lead vocals but with backing vocals retained
  • Run the lead vocal track through whisper-timestamped to generate a time-synced lyrics file
  • Correct the detected lyrics by fetching lyrics from a human-input source (e.g. Musixmatch/Spotify using syrics, or Genius using lyrics-from-genius) and attempting to match up segments with the Whisper-heard lyrics whilst maintaining timestamps
    • Potentially also consider splitting words by syllable (e.g. using python-syllables) and attempting to guess the sub-word timestamps
  • Generate a new video file using the instrumental audio and a background image, with the synced lyrics “burned” into the video at the correct timestamps
    • Lots of scope to make this really nice, e.g. adjusting kerning dynamically to fit longer lines on one screen, but also lots of gotchas, e.g. very long lines needing to be split at a reasonable place
  • Publish this video to YouTube
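
The lyric-correction step above could be approached along these lines. This is only a minimal sketch, not the project's actual implementation: it assumes the Whisper output has already been flattened into a list of words with start and end times, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in alignment technique. The names `TimedWord`, `normalize` and `correct_lyrics` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TimedWord:
    """Hypothetical container for one word heard by the transcription step."""
    text: str
    start: float  # seconds
    end: float    # seconds


def normalize(word: str) -> str:
    """Lowercase and drop punctuation so that "Hello," matches "hello"."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def correct_lyrics(heard: list[TimedWord], reference: list[str]) -> list[TimedWord]:
    """Replace heard words with reference words while keeping heard timestamps."""
    matcher = SequenceMatcher(
        a=[normalize(w.text) for w in heard],
        b=[normalize(w) for w in reference],
        autojunk=False,
    )
    corrected: list[TimedWord] = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            # Words agree: keep the timing, take the reference spelling.
            for h, r in zip(heard[a0:a1], reference[b0:b1]):
                corrected.append(TimedWord(r, h.start, h.end))
        elif tag == "replace":
            # Misheard span: spread the reference words evenly over the
            # time window covered by the heard words they replace.
            start, end = heard[a0].start, heard[a1 - 1].end
            words = reference[b0:b1]
            step = (end - start) / len(words)
            for i, r in enumerate(words):
                corrected.append(TimedWord(r, start + i * step, start + (i + 1) * step))
        # "delete": heard words missing from the reference are dropped.
        # "insert": reference words with no heard counterpart have no
        # timing evidence, so this sketch skips them.
    return corrected


heard = [
    TimedWord("hello", 0.0, 0.5),
    TimedWord("darkness", 0.5, 1.0),
    TimedWord("my", 1.0, 1.2),
    TimedWord("old", 1.2, 1.5),
    TimedWord("fiend", 1.5, 2.0),  # misheard word
]
reference = ["Hello", "darkness,", "my", "old", "friend"]
for word in correct_lyrics(heard, reference):
    print(f"{word.start:5.2f} {word.end:5.2f} {word.text}")
```

Spreading replaced words evenly over the heard time window is a crude guess. A syllable-aware split, as suggested in the sub-bullet above, would likely give better sub-word timing.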
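
For the "super long lines" gotcha mentioned in the video-generation step, one possible heuristic is to break a line at the space nearest its middle, preferring a space that follows punctuation. The sketch below is an illustrative assumption rather than the project's method. The function name `split_line` and the 36-character default are hypothetical.

```python
def split_line(line: str, max_chars: int = 36) -> list[str]:
    """Split a lyric line into pieces of at most max_chars where possible."""
    line = " ".join(line.split())  # collapse runs of whitespace
    if len(line) <= max_chars:
        return [line]
    spaces = [i for i, ch in enumerate(line) if ch == " "]
    if not spaces:
        return [line]  # a single very long word cannot be split here
    middle = len(line) / 2
    # Prefer a break after punctuation, but only if it is reasonably central.
    after_punct = [i for i in spaces if line[i - 1] in ",;:.!?"]
    central = [i for i in after_punct if abs(i - middle) <= len(line) / 4]
    candidates = central or spaces
    cut = min(candidates, key=lambda i: abs(i - middle))
    # Recurse in case either half is still too long.
    return split_line(line[:cut], max_chars) + split_line(line[cut + 1:], max_chars)


for piece in split_line("I walked along the river, and I watched the water flow", max_chars=30):
    print(piece)
```

A character count is only a proxy for rendered width. A real renderer would measure the text in the chosen font before deciding where to break.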
