Creating music videos from songs and their lyrics using VQGAN + CLIP + WAV2CLIP + Laplacian Segmentation. You can see the Google Colab version here
This project improves on an existing music-to-video generation framework, MUSIC2VIDEO: AUTOMATIC GENERATION OF MUSIC VIDEO WITH FUSION OF AUDIO AND TEXT. The original paper's GitHub repository can be found here. The framework uses text (lyrics) and audio (the song) to generate images, which are then compiled into a video.
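As a rough illustration of how the audio side feeds the image generator: WAV2CLIP embeds a waveform into CLIP's joint embedding space, so the audio can steer VQGAN in the same way a CLIP-encoded text prompt does. A minimal sketch using the wav2clip package (the file name and the 16 kHz sample rate are assumptions; check the library docs for the expected input format):

```python
import librosa
import wav2clip

# Load the song as a mono waveform ("song.mp3" is a placeholder path;
# 16 kHz is an assumed sample rate for the WAV2CLIP encoder)
audio, sr = librosa.load("song.mp3", sr=16000)

# WAV2CLIP maps the waveform into CLIP's embedding space, so it can be
# compared against CLIP image embeddings while optimizing the VQGAN latents
model = wav2clip.get_model()
audio_embedding = wav2clip.embed_audio(audio, model)
```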
The improvements tackle the framework in two different aspects. The original videos were very static, so I added aesthetic modifiers and video effects such as zooming and panning. While experimenting with these effects, however, they looked random and did not add meaning to the video. I therefore segmented the audio to identify the repetitive parts of a song, and used this structure to create patterns with these effects in sync with the rhythm and beats of the song (a sketch of the segmentation follows).
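The segmentation follows the spectral-clustering approach of the Laplacian segmentation paper cited under Credits; librosa's documentation contains a fuller version of this recipe. A condensed sketch, where the file path, the number of clusters k, and the label-to-effect mapping are all placeholder assumptions:

```python
import numpy as np
import scipy.linalg
import scipy.sparse.csgraph
import librosa
import sklearn.cluster

y, sr = librosa.load("song.mp3")  # placeholder path
k = 5                             # assumed number of segment types

# Beat-synchronous constant-Q features, so segments align with the beat grid
C = librosa.amplitude_to_db(np.abs(librosa.cqt(y=y, sr=sr)), ref=np.max)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr, trim=False)
Csync = librosa.util.sync(C, beats, aggregate=np.median)

# Recurrence (self-similarity) matrix: which beats sound like which others
R = librosa.segment.recurrence_matrix(Csync, width=3, mode="affinity", sym=True)

# Graph Laplacian of the recurrence graph; its low-order eigenvectors
# encode the song's repetition structure (verse/chorus/bridge, etc.)
L = scipy.sparse.csgraph.laplacian(R, normed=True)
evals, evecs = scipy.linalg.eigh(L)

# Cluster the beats in eigenvector space to get one segment label per beat
X = evecs[:, :k] / (np.linalg.norm(evecs[:, :k], axis=1, keepdims=True) + 1e-8)
labels = sklearn.cluster.KMeans(n_clusters=k, n_init=10).fit_predict(X)

# Hypothetical mapping from segment label to an effect preset, so repeated
# sections of the song reuse the same zoom/pan pattern
effect_for_label = {lab: ("zoom-in" if lab % 2 == 0 else "pan-left")
                    for lab in set(labels)}
```

Because repeated sections receive the same label, they also receive the same effect pattern, which is what keeps the zooming and panning in sync with the song's structure rather than looking random.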
A survey was conducted to evaluate whether these changes were an improvement. The survey can be found here, and the results of the survey can be found here - graphical and here - csv. A total of 54 people of different ages, from various countries and professions, participated in this survey.
- VQGAN - Transformers
- CLIP
- WAV2CLIP
- Laplacian Segmentation
- Librosa
- PyTorch
- Stegano
- Pandas
- sklearn
- Matplotlib
- View this notebook cell to see all dependencies (a quick import check is sketched below)
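As a quick way to confirm your environment has everything before running the full notebook, a hypothetical import check (the notebook's own dependency cell remains the authoritative list):

```python
# If any of these imports fail, (re)run the notebook's dependency cell
import torch       # PyTorch
import librosa     # audio loading, beat tracking, segmentation
import stegano     # metadata embedding
import pandas
import sklearn     # scikit-learn
import matplotlib
```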
- This is a Google Colab notebook that can be imported into your Colab or Jupyter environment
- Make sure to download the lyrics (CSV) and the song (MP3) and place them in an appropriate directory (a quick sanity check is sketched below)
- The notebook provides a detailed step-by-step guide to running the code
- Run it as-is once, then feel free to experiment
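If you want to verify that your files load correctly before running the full notebook, a minimal check (the paths are placeholders, and the exact CSV format is defined in the notebook):

```python
import pandas as pd
import librosa

# Placeholder paths; point these at wherever you placed the files
lyrics = pd.read_csv("lyrics.csv")    # one row per lyric line, per the notebook's format
audio, sr = librosa.load("song.mp3")  # decodes the MP3 to a mono float waveform

print(lyrics.head())
print(f"{len(audio) / sr:.1f} seconds of audio at {sr} Hz")
```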
If you've found a new bug, go ahead and create a new GitHub issue. Be sure to include as much information as possible so I can reproduce the bug.
- The VQGAN+CLIP (z+quantize method) notebook this project is based on is by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings).
- The original BigGAN + CLIP method was made by https://twitter.com/advadnoun.
- Spanish translation, added explanations, and modifications by Eleiber#8347; the friendly interface was made thanks to Abulafia#3734.
- English translation, plus the zoom, pan, rotation, and keyframe features, by Chigozie Nri (https://github.com/chigozienri, https://twitter.com/chigozienri).
- UI improvements by Justin John (https://github.com/justinjohn0306).
- The linked helper spreadsheet for creating parameter strings is by Kendrick Feller (https://twitter.com/EphemeralInc).
- The Music2Video framework was first released by researchers at KAIST: MUSIC2VIDEO: AUTOMATIC GENERATION OF MUSIC VIDEO WITH FUSION OF AUDIO AND TEXT (https://arxiv.org/pdf/2201.03809.pdf).
- The music segmentation is based on the paper ANALYZING SONG STRUCTURE WITH SPECTRAL CLUSTERING (https://brianmcfee.net/papers/ismir2014_spectral.pdf).
- 0.2
- Added documentation
- Various bug fixes and optimizations
- 0.1
- Initial Release
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Prof Simon Colton
Professor of Computational Creativity, Games and Artificial Intelligence
Dr Mike Cook
Research Fellow, Computational Creativity