Authors:
Yibin Yan*, BUPT
Yiqin Wang*, Tsinghua University
Yansong Tang, Tsinghua-Berkeley Shenzhen Institute
(* = equal contribution, names listed alphabetically)
This work was done during Yibin and Yiqin's internship with Prof. Tang.
The demo is currently paused. If you would like to try it out, you can reach out to Yiqin at [email protected].
- ChatVID combines the knowledge of Large Language Models with the sensing ability of Vision Models and Audio Models.
- ChatVID demonstrates a powerful capability to talk about anything in a video.
- Please give us a star! For any questions or suggestions, feel free to drop Yiqin an email at [email protected] or open an issue.
- Leverage the power of Large Language Models, Vision Models, and Audio Models to enable conversations about videos (a minimal sketch of the idea follows this list).
- Utilize Vicuna as the Large Language Model for understanding user queries and generating responses.
- Incorporate state-of-the-art Vision Models such as BLIP-2, GRiT, and Vid2Seq for visual understanding and analysis.
- Employ Whisper as the Audio Model to process audio content within videos.
- Enable users to have conversations and discussions about any aspect of a video.
- Enhance the overall video-watching experience by providing an interactive and engaging platform.
- ChatVID with Vicuna-7B (8-bit) can run on an NVIDIA GPU with 24 GB of GPU memory and 8 GB of CPU RAM.
- ChatVID needs an extra 10 GB of CPU RAM when using Vid2Seq.
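To make the pipeline concrete, here is a minimal, hypothetical sketch of the idea: timestamped captions from the vision models and a transcript from Whisper are fused into a single text prompt, which the frozen Vicuna model then answers. This is an illustration only, not ChatVID's actual code; the prompt format and toy inputs are assumptions.

```python
# Illustrative sketch only (not ChatVID's actual code): fuse visual captions
# and the audio transcript into one prompt for the LLM.

def build_video_prompt(frame_captions, transcript, question):
    """Assemble timestamped frame captions and the ASR transcript into an LLM prompt."""
    visual_context = "\n".join(f"[{t:.0f}s] {c}" for t, c in frame_captions)
    return (
        "You are watching a video.\n"
        f"Visual descriptions:\n{visual_context}\n"
        f"Audio transcript:\n{transcript}\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # Toy stand-ins for BLIP-2/GRiT/Vid2Seq captions and Whisper output.
    captions = [(0, "a person opens a laptop"), (5, "code appears on the screen")]
    transcript = "Today I will show you how to set up the project."
    print(build_video_prompt(captions, transcript, "What is the video about?"))
```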
```
pip install -r pre-requirements.txt
pip install -r requirements.txt
pip install -r extra-requirements.txt  # optional, only for Vid2Seq
```
You will also need to install ffmpeg for Whisper. Note that if Whisper encounters permission errors, you may need to set the environment variable `DATA_GYM_CACHE_DIR` to a writable cache directory, e.g. `DATA_GYM_CACHE_DIR='/YourRootDir/ChatVID/.cache'`.
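For reference, here is a minimal sketch of that workaround, assuming you call Whisper through the openai-whisper Python package; the video filename is just a placeholder.

```python
# Minimal sketch: point the cache at a writable directory before Whisper loads its tokenizer.
import os
os.environ["DATA_GYM_CACHE_DIR"] = "/YourRootDir/ChatVID/.cache"  # must be writable

import whisper

model = whisper.load_model("base")        # any Whisper model size works
result = model.transcribe("example.mp4")  # requires ffmpeg on PATH
print(result["text"])
```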
Download the GRiT checkpoint (`grit_b_densecap_objectdet.pth`) and put it into the `pretrained_models` folder.
ChatVID uses frozen Vicuna 7B and 13B models. Please first follow the instructions to prepare the Vicuna v1.1 weights. Then set `vicuna.model_path` in the Infer Config (`config/infer.yaml`) to the folder that contains the Vicuna weights.
- Prepare the CLIP ViT-L/14 checkpoint for feature extraction in Vid2Seq. Get the CLIP ViT-L/14 checkpoint and set `vid2seq.clip_path` in the Infer Config to the checkpoint path. `vid2seq.output_path` is used to store the generated TFRecords and can be set to any writable directory. `vid2seq.work_dir` is Flax's working directory and can also be set to any writable directory.
- Prepare the Vid2Seq ActivityNet checkpoint. Get the Vid2Seq ActivityNet checkpoint and rename it to `checkpoint_200001`. Then set `vid2seq.checkpoint_path` in the Infer Config to the folder that contains the checkpoint. (A quick sanity check for these paths is sketched right after this list.)
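Before launching the app, it can help to verify that every path in the Infer Config actually exists. The sketch below assumes `config/infer.yaml` is a plain YAML file with the nested keys named above; adjust the key names if your config layout differs.

```python
# Sanity-check sketch for config/infer.yaml (key layout is assumed, not guaranteed).
import os
import yaml  # pip install pyyaml

with open("config/infer.yaml") as f:
    cfg = yaml.safe_load(f)

paths = {
    "vicuna.model_path": cfg.get("vicuna", {}).get("model_path"),
    "vid2seq.clip_path": cfg.get("vid2seq", {}).get("clip_path"),
    "vid2seq.checkpoint_path": cfg.get("vid2seq", {}).get("checkpoint_path"),
    "vid2seq.output_path": cfg.get("vid2seq", {}).get("output_path"),  # just needs to be writable
    "vid2seq.work_dir": cfg.get("vid2seq", {}).get("work_dir"),        # just needs to be writable
}

for key, path in paths.items():
    status = "OK" if path and os.path.exists(path) else "CHECK"
    print(f"{status:5s} {key}: {path}")
```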
```
ChatVID/
|__config/
|  |__...
|__model/
|  |__...
|__scenic/
|  |__...
|__simclr/
|  |__...
|__pretrained_models/
|  |__grit_b_densecap_objectdet.pth
|  |__vicuna-7b/
|  |  |__pytorch_model-00001-of-00002.bin
|  |  |__pytorch_model-00002-of-00002.bin
|  |  |__...
|  |__vid2seq_ckpt/
|  |  |__checkpoint_200001
|  |__clip_ckpt/
|     |__ViT-L-14.pt
|__app.py
|__README.md
|__pre-requirements.txt
|__requirements.txt
|__extra-requirements.txt
|__LICENSE
```
```
# change all the absolute paths in config/infer.yaml
python app.py
```
This work is based on Vicuna, BLIP-2, GRiT, Vid2Seq, and Whisper. Thanks for their great work!