- This is the official repository of Sounding Video Generator (SVG; TMM version, arXiv version), which is, to the best of our knowledge, the first unified framework for Text-to-Sounding-Video (T2SV) generation.
- The latest version of the AudioSet-Cap dataset is VALOR-1M, which contains more videos and annotations. The AudioSet-Cap test set can be found at `/assets/AudioSet-Cap_test.json`.
Click the picture to play the sounding videos. More sampled videos and audio clips can be found in `assets`.