
RU-AI

This is the official repo for paper: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

Requirements

The dataset is publicly available on Zenodo:

https://zenodo.org/records/11406538

The dataset requires at least 500GB of disk space to be fully downloaded.

Model inference requires an NVIDIA GPU with at least 16GB of VRAM. We recommend an NVIDIA RTX 3090 (24GB) or better for this project.

We highly recommend installing this package within a virtual environment such as conda or venv.

Environment requirements:

  • Python >= 3.8
  • PyTorch >= 1.13.1
  • CUDA Version >= 11.6
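
Before setting up, you can sanity-check the basics with a short stdlib-only script. This is a minimal sketch, not part of the repo; the 500GB figure comes from the disk-space note above, and `check_environment` is a hypothetical helper name:

```python
# Pre-setup sanity check: Python version and free disk space.
# The dataset needs ~500GB fully decompressed (per this README).
import shutil
import sys

def check_environment(path=".", required_gb=500):
    """Return (python_ok, free_gb) for a pre-download sanity check."""
    python_ok = sys.version_info >= (3, 8)
    free_gb = shutil.disk_usage(path).free / 1024**3
    return python_ok, free_gb

if __name__ == "__main__":
    ok, free = check_environment()
    print(f"Python >= 3.8: {ok}, free disk: {free:.1f} GB")
    if free < 500:
        print("Warning: less than 500GB free; the full dataset may not fit.")
```

PyTorch and CUDA versions are best verified after installing the dependencies, e.g. with `python -c "import torch; print(torch.__version__, torch.version.cuda)"`.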

Clone the project:

git clone https://github.com/ZhihaoZhang97/RU-AI.git

Create the virtual environment via conda and Python 3.8:

conda create -n ruai python=3.8

Activate the environment:

conda activate ruai

Move into the project path:

cd RU-AI

Install the dependencies:

pip3 install -r requirements.txt

Data Sample

We provide a quick tutorial on how to download and inspect the dataset in the data-example.ipynb notebook.

You can also directly run the following command to download sample data sourced from Flickr8k:

python ./download_flickr.py

You can also download the full dataset by running the command below.

Please note that the whole dataset is over 157GB compressed and can take up to 500GB after decompression.

Downloading will take a while; the actual speed depends on your internet connection.

python ./download_all.py

You can also go to ./data to manually check the data after downloading.

Here is the directory tree after downloading all the data:

├── audio
│   ├── coco
│   │   ├── efficientspeech
│   │   ├── real
│   │   ├── styletts2
│   │   ├── vits
│   │   ├── xtts2
│   │   └── yourtts
│   ├── flickr8k
│   │   ├── efficientspeech
│   │   ├── real
│   │   ├── styletts2
│   │   ├── vits
│   │   ├── xtts2
│   │   └── yourtts
│   └── place
│       ├── efficientspeech
│       ├── real
│       ├── styletts2
│       ├── vits
│       ├── xtts2
│       └── yourtts
├── image
│   ├── coco
│   │   ├── real
│   │   ├── stable-diffusion-images-absolutereality-remove-black
│   │   ├── stable-diffusion-images-epicrealism-remove-black
│   │   ├── stable-diffusion-images-v1-5
│   │   ├── stable-diffusion-images-v6-0-remove-black
│   │   └── stable-diffusion-images-xl-v3-0-remove-black
│   ├── flickr8k
│   │   ├── real
│   │   ├── stable-diffusion-images-absolutereality
│   │   ├── stable-diffusion-images-epicrealism
│   │   ├── stable-diffusion-images-v1-5
│   │   ├── stable-diffusion-images-v6-0
│   │   └── stable-diffusion-images-xl-v3-0
│   └── place
│       ├── real
│       ├── stable-diffusion-images-absolutereality-remove-black
│       ├── stable-diffusion-images-epicrealism-remove-black
│       ├── stable-diffusion-images-v1-5
│       ├── stable-diffusion-images-v6-0-remove-black
│       └── stable-diffusion-images-xl-v3-0-remove-black
└── text
    ├── coco
    ├── flickr8k
    └── place

Model Inference

Before running inference, replace the image_data_paths, audio_data_paths, and text_data placeholders in infer_imagebind_model.py and infer_languagebind_model.py with real data / data paths.
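
The placeholders might be populated like this. The variable names come from the inference scripts; the glob patterns, sample directories, and caption format are assumptions for illustration:

```python
# Hypothetical example of filling the placeholder variables in the
# inference scripts; adjust the directories and patterns to your data.
from pathlib import Path

image_data_paths = sorted(str(p) for p in Path("data/image/flickr8k/real").glob("*.jpg"))
audio_data_paths = sorted(str(p) for p in Path("data/audio/flickr8k/real").glob("*.wav"))
text_data = ["A dog runs across a grassy field."]  # raw caption strings (assumed format)
```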

ImageBind-based model:

python infer_imagebind_model.py

LanguageBind-based model:

python infer_languagebind_model.py

Reference

We appreciate the open-source community for the following datasets and models.

Microsoft COCO: Common Objects in Context

Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

Learning Deep Features for Scene Recognition using Places Database

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

Unsupervised Learning of Spoken Language with Visual Context

Learning Word-Like Units from Joint Audio-Visual Analysis

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

ImageBind: One Embedding Space To Bind Them All

Citation

If you find our dataset or research useful, please cite:

@misc{huang2024ruai,
      title={RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection}, 
      author={Liting Huang and Zhihao Zhang and Yiran Zhang and Xiyue Zhou and Shoujin Wang},
      year={2024},
      eprint={2406.04906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
