- **Note:** Our dataset is built from the following source datasets:
  - Video-ChatGPT Video Instruction Dataset
    - ActivityNet and WebVid videos
    - 100K instructions
  - Video Localized Narratives Dataset
  - How2QA
  - NextQA
  - WebVid
- 📜 Instructions: Download all of our video instructions from 🤗 SNUMPR/vlm_rlaif_datasets
  | Dataset Usage | Filename | Source of Videos |
  |---|---|---|
  | SFT (short) | SFT_short.json | All |
  | SFT (long) | SFT_long.json | All |
  | Preference dataset (RM) | RM_13b_v1_dataset_39k.json | ANet |
  | PPO init | PPO_init.json | ANet |
  | RLAIF | RL_data.json | ANet |
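Once downloaded, the instruction files above can be resolved programmatically. This is a minimal sketch of our own (not part of the released codebase), assuming the JSON files sit under `TRAIN_DATA_ROOT/instructions` as in the folder structure below; the stage keys are hypothetical names we chose here:

```python
from pathlib import Path

# Maps a training stage (left column of the table above) to its
# instruction file. Stage keys are our own naming, not the repo's.
INSTRUCTION_FILES = {
    "sft_short": "SFT_short.json",
    "sft_long": "SFT_long.json",
    "preference_rm": "RM_13b_v1_dataset_39k.json",
    "ppo_init": "PPO_init.json",
    "rlaif": "RL_data.json",
}

def instruction_path(train_data_root: str, stage: str) -> Path:
    """Resolve an instruction file under TRAIN_DATA_ROOT/instructions."""
    return Path(train_data_root) / "instructions" / INSTRUCTION_FILES[stage]

print(instruction_path("playground/data/train_dataset", "rlaif"))
```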
- 🎥 Videos: Download the source videos following the instructions below, then extract 50 frames per video to train the model.
- Video-ChatGPT Instruction Dataset (ActivityNet videos):
  - Frames (🤗 SNUMPR/vlm_rlaif_train_anet_frames): our preprocessed version, with 50 frames extracted per video
  - Videos: mp4 files from the original paper
- Video Localized Narratives Dataset
  - See the download instructions for the original dataset.
  - Download the source videos from four datasets: OoPs, OVIS, kinetics400 (UVO), and kinetics.
  - Extract 50 frames per video into `OOPs_50frames`, `OVIS_50frames`, `kinetics400_50frames`, and `kinetics_50frames`.
- How2QA
  - See the download instructions to download the videos.
  - Extract 50 frames per video into `how2qa_50frames`.
- NeXTQA
  - Download the video files from the Google Drive link provided by the original authors.
  - Extract 50 frames per video into `nextqa_50frames`.
- WebVid
  - Follow the official WebVid dataset README to download the videos.
  - Extract 50 frames per video into `webvid_50frames`.
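Each block above repeats the same preprocessing step: sample 50 frames per video. The README does not specify the sampling scheme, so uniform spacing is an assumption here; this sketch only picks the frame indices, which you would then feed to OpenCV's `VideoCapture` or an ffmpeg `select` filter:

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 50) -> list[int]:
    """Return `num_frames` evenly spaced frame indices in [0, total_frames).

    Uniform spacing is our assumption; the authors' exact sampling
    scheme is not stated in this README.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if total_frames <= num_frames:
        # Fewer frames than requested: keep every frame.
        return list(range(total_frames))
    step = (total_frames - 1) / (num_frames - 1)  # include first and last frame
    return [round(i * step) for i in range(num_frames)]

print(uniform_frame_indices(300)[:5])
```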
# 📁 Training data folder structure
TRAIN_DATA_ROOT # (playground/data/train_dataset in default)
├── instructions
└── videos
├── anet_vidchatgpt_50frames
├── OOPs_50frames
├── OVIS_50frames
├── kinetics400_50frames
├── kinetics_50frames
├── how2qa_50frames
├── nextqa_50frames
└── webvid_50frames
// Example structure
{
'id': 'sampleid',
'src_data': 'original data source',
'conversations': [
{'role': 'human', 'value': ''},
{'role': 'gpt', 'value': ''}
],
'images': [
'video_dir/image_01.jpg',
'video_dir/image_02.jpg',
...
]
}
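A loader for files in the format above can be sketched as follows. This validator is our own, not part of the released codebase, and it assumes the `role`/`value` field names shown in the example:

```python
import json

REQUIRED_KEYS = {"id", "src_data", "conversations", "images"}

def load_instruction_samples(path: str) -> list:
    """Load an instruction JSON file and sanity-check each sample."""
    with open(path) as f:
        samples = json.load(f)
    for s in samples:
        assert REQUIRED_KEYS <= s.keys(), f"missing keys in sample {s.get('id')}"
        roles = [turn["role"] for turn in s["conversations"]]
        # Conversations alternate, with the human turn first.
        assert roles[::2] == ["human"] * len(roles[::2])
        assert roles[1::2] == ["gpt"] * len(roles[1::2])
    return samples
```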
# 📁 Evaluation folder structure
EVAL_DATA_ROOT # (playground/data/eval_dataset in default)
├── zeroshotqa
│ ├── annotations
│ └── frames
│ ├── anet
│ ├── msvd
│ └── msrvtt
└── videogenerativebench
├── annotations
└── frames
- Download our preprocessed zero-shot QA benchmark from 🤗 SNUMPR/vlm_rlaif_eval_datasets.
- For the original videos and test split, follow the instructions from Video-LLaVA to download the zero-shot QA dataset.
- Download the evaluation dataset & videos for zero-shot question answering from the Video-ChatGPT Qualitative Evaluation.
  - Videos, Descriptions
- Extract 50 frames per video.