
Sounding Video Generator (SVG) is the first unified framework for text-guided video-audio generation.


jwliu-cc/SVG


Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

  • This is the official repository of Sounding Video Generator (SVG; TMM version, arXiv version), which is, to the best of our knowledge, the first unified framework for Text-to-Sounding-Video (T2SV) generation.
  • The latest version of the AudioSet-Cap dataset is VALOR-1M, which contains more videos and annotations. The AudioSet-Cap test set can be found at /assets/AudioSet-Cap_test.json.
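
As a starting point for working with the test set, the sketch below loads a JSON file and reports its top-level structure. The layout of AudioSet-Cap_test.json is not described here, so the code assumes no schema; the helper name `summarize_json` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure.

    Hypothetical helper: it makes no assumption about the schema of the
    AudioSet-Cap test file and only inspects the top-level container.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        # Number of entries (list) or keys (dict) at the top level.
        summary["size"] = len(data)
    if isinstance(data, dict):
        # A few keys, to help discover the actual layout.
        summary["sample_keys"] = list(data)[:5]
    return data, summary
```

Calling `summarize_json("assets/AudioSet-Cap_test.json")` from the repository root should return the parsed data and a small summary dictionary. Inspect that summary before writing code that relies on specific field names.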

Sounding Video Samples

Click a picture to play the corresponding sounding video. More sampled videos and audio clips can be found in assets.

Input texts for the generated results:

  • The grass was green, with blue sky and white clouds, and the wind.
  • A man in a blue shirt was playing the guitar.
  • A woman with long hair sang in the room.
  • A man in a suit and glasses speaks indoors.
  • In the game, a yellow car roars along the road.
  • In the music, white text plays in front of a black background.
