# Preparing Training & Evaluation Dataset

## 🗃️ Training Dataset

 

- 📜 **Instructions**: Download all of our video instructions from 🤗 SNUMPR/vlm_rlaif_datasets

  | Dataset Usage | Filename | Source of Videos |
  | --- | --- | --- |
  | SFT (short) | SFT_short.json | All |
  | SFT (long) | SFT_long.json | All |
  | Preference dataset (RM) | RM_13b_v1_dataset_39k.json | ANet |
  | PPO init | PPO_init.json | ANet |
  | RLAIF | RL_data.json | ANet |

 

- 🎥 **Videos**: Download the source videos following the instructions below, then extract 50 frames from each video to train the model.

  1. Video-ChatGPT Instruction Dataset - ActivityNet videos

  2. Video Localized Narratives Dataset

     - See the download instructions for the original dataset
     - Download the source videos from four datasets: OoPs, OVIS, kinetics400 (UVO), kinetics
     - Extract 50 frames from each video into `OOPs_50frames`, `OVIS_50frames`, `kinetics400_50frames`, `kinetics_50frames`
  3. How2QA

  4. NeXTQA

     - Download the video files from the Google Drive link provided by the original authors
     - Extract 50 frames from each video into `nextqa_50frames`
  5. WebVid

     - Follow the official WebVid dataset README to download the videos
     - Extract 50 frames from each video into `webvid_50frames`
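Every step above extracts 50 frames from each video. A minimal sketch of the uniform index selection, assuming frames are sampled evenly across the clip (the helper name is mine; the actual decoding library is up to you):

```python
def sample_frame_indices(total_frames: int, num_samples: int = 50) -> list[int]:
    """Pick num_samples frame indices spread evenly across a video.

    If the video has fewer frames than num_samples, indices repeat so
    callers still receive exactly num_samples frames.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    step = total_frames / num_samples
    return [min(int(i * step), total_frames - 1) for i in range(num_samples)]

indices = sample_frame_indices(1000)
print(len(indices), indices[0], indices[-1])  # 50 0 980
```

For short clips the repeated indices keep the frame count fixed at 50, which matches the fixed-size `*_50frames` folders used below.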
```
# 📁  Training data folder structure
TRAIN_DATA_ROOT # (playground/data/train_dataset by default)
├── instructions
└── videos
    ├── anet_vidchatgpt_50frames
    ├── OOPs_50frames
    ├── OVIS_50frames
    ├── kinetics400_50frames
    ├── kinetics_50frames
    ├── how2qa_50frames
    ├── nextqa_50frames
    └── webvid_50frames
```
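Once everything is downloaded, a quick sanity check against this layout can be sketched as follows (the helper and default root are assumptions; the file and folder names come from the table and tree above):

```python
from pathlib import Path

# Instruction files from the dataset table, frame folders from the tree above.
INSTRUCTION_FILES = [
    "SFT_short.json", "SFT_long.json", "RM_13b_v1_dataset_39k.json",
    "PPO_init.json", "RL_data.json",
]
VIDEO_DIRS = [
    "anet_vidchatgpt_50frames", "OOPs_50frames", "OVIS_50frames",
    "kinetics400_50frames", "kinetics_50frames", "how2qa_50frames",
    "nextqa_50frames", "webvid_50frames",
]

def missing_paths(root: str = "playground/data/train_dataset") -> list[str]:
    """Return the expected files/folders that are not yet present under root."""
    root_path = Path(root)
    expected = [root_path / "instructions" / f for f in INSTRUCTION_FILES]
    expected += [root_path / "videos" / d for d in VIDEO_DIRS]
    return [str(p) for p in expected if not p.exists()]
```

Running `missing_paths()` before launching training surfaces any dataset that was skipped during download.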
```
// Example instruction file structure
{
    'id': 'sampleid',
    'src_data': 'original data source',
    'conversations': [
        {'role': 'human', 'value': ''},
        {'role': 'gpt', 'value': ''}
    ],
    'images': [
        'video_dir/image_01.jpg',
        'video_dir/image_02.jpg',
        ...
    ]
}
```
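A small check of this record layout can help catch malformed custom data (the field and role names come from the example above; the helper itself is hypothetical, not part of the codebase):

```python
def validate_record(rec: dict) -> None:
    """Raise if a training record lacks the fields shown in the example."""
    for key in ("id", "src_data", "conversations", "images"):
        if key not in rec:
            raise KeyError(f"missing field: {key}")
    for turn in rec["conversations"]:
        if turn.get("role") not in ("human", "gpt"):
            raise ValueError(f"unexpected role: {turn.get('role')}")

sample = {
    "id": "sampleid",
    "src_data": "original data source",
    "conversations": [
        {"role": "human", "value": ""},
        {"role": "gpt", "value": ""},
    ],
    "images": ["video_dir/image_01.jpg", "video_dir/image_02.jpg"],
}
validate_record(sample)  # no exception: the sample matches the schema
```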

   

## 🗃️ Evaluation Dataset

```
# 📁  Evaluation folder structure
EVAL_DATA_ROOT # (playground/data/eval_dataset by default)
├── zeroshotqa
│   ├── annotations
│   └── frames
│       ├── anet
│       ├── msvd
│       └── msrvtt
└── videogenerativebench
    ├── annotations
    └── frames
```
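Preparing the evaluation layout above can be sketched with a short helper (the function name is mine; the folder names come from the tree):

```python
from pathlib import Path

def make_eval_tree(root: str) -> None:
    """Create the evaluation folder layout shown above under root."""
    root_path = Path(root)
    (root_path / "zeroshotqa" / "annotations").mkdir(parents=True, exist_ok=True)
    for sub in ("anet", "msvd", "msrvtt"):
        (root_path / "zeroshotqa" / "frames" / sub).mkdir(parents=True, exist_ok=True)
    for sub in ("annotations", "frames"):
        (root_path / "videogenerativebench" / sub).mkdir(parents=True, exist_ok=True)
```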

### Zero-shot QA

- Download our preprocessed zero-shot QA benchmark from 🤗 SNUMPR/vlm_rlaif_eval_datasets.
- For the original videos and test split, follow the instructions from Video-LLaVA to download the Zero-shot QA dataset.

### Video Generative Benchmark