Commit: update readme
MoayedHajiAli committed Oct 24, 2024
1 parent 0268089 commit cf191e3
Showing 6 changed files with 38 additions and 6 deletions.
4 changes: 3 additions & 1 deletion AutoCap/README.md
@@ -1,4 +1,6 @@
[![arXiv](ARXIV ICON)](ARXIV LINK)

[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/GenAU) [![Arxiv](https://img.shields.io/badge/arxiv-2406.19388-b31b1b)](https://arxiv.org/abs/2406.19388) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-captioning-on-audiocaps)](https://paperswithcode.com/sota/audio-captioning-on-audiocaps?p=taming-data-and-transformers-for-audio-1)[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-generation-on-audiocaps)](https://paperswithcode.com/sota/audio-generation-on-audiocaps?p=taming-data-and-transformers-for-audio-1)


# AutoCap inference, training and evaluation
- [Inference](#inference)
3 changes: 2 additions & 1 deletion GenAU/README.md
@@ -1,4 +1,5 @@
<!-- [![arXiv](ARXIV ICON)](ARXIV LINK) -->

[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/GenAU) [![Arxiv](https://img.shields.io/badge/arxiv-2406.19388-b31b1b)](https://arxiv.org/abs/2406.19388) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-captioning-on-audiocaps)](https://paperswithcode.com/sota/audio-captioning-on-audiocaps?p=taming-data-and-transformers-for-audio-1)[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-generation-on-audiocaps)](https://paperswithcode.com/sota/audio-generation-on-audiocaps?p=taming-data-and-transformers-for-audio-1)

# GenAU inference, training and evaluation
- [Introduction](#introduction)
14 changes: 14 additions & 0 deletions LICENSE
@@ -0,0 +1,14 @@
Copyright (c) 2024 Snap Inc. All rights reserved.

These sample code, data and model checkpoints are made available by Snap Inc. for non-commercial, research purposes only.

Non-commercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Research purposes mean solely for study, instruction, or non-commercial research, testing or validation.

No commercial license, whether implied or otherwise, is granted in or to this code, unless you have entered into a separate agreement with Snap Inc. for such rights.

These sample code, data and model checkpoints are provided as-is, without warranty of any kind, express or implied, including any warranties of merchantability, title, fitness for a particular purpose, non-infringement, or that the code is free of defects, errors or viruses. In no event will Snap Inc. be liable for any damages or losses of any kind arising from this sample code or your use thereof.

Any redistribution of this sample code, including in binary form, must retain or reproduce the above copyright notice, conditions and disclaimer.

The following sets forth attribution notices for third-party software that may be included in portions of this sample code:

Binary file added assets/dataset.png
22 changes: 18 additions & 4 deletions dataset_preperation/README.md
@@ -1,12 +1,26 @@

# AutoCap Dataset Preparation
# AutoReCap Dataset

# Introduction
We introduce an efficient pipeline for collecting ambient audio. It starts by analyzing automatic transcriptions of online videos to identify non-speech parts. Our captioning model, AutoCap, then generates captions and filters out segments with music or speech-related keywords. By using time-aligned transcriptions, we reduce the filtering rate and streamline the process by avoiding the need to download or process the audio files.
<br/>

<div align="center">
<img src="../assets/dataset.png" width="1200" />
</div>

<br/>
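For illustration, a minimal sketch of the keyword-based filtering step described above (the keyword list, captions, and function name here are assumptions for this example, not the released pipeline code):

```python
# Hedged sketch: keep only segments whose generated caption contains no
# speech- or music-related keywords. Keyword list and captions are illustrative.
SPEECH_MUSIC_KEYWORDS = {"speech", "speaking", "talking", "music", "song", "singing"}

def keep_segment(caption: str) -> bool:
    """Return True if the caption mentions none of the filtered keywords."""
    words = set(caption.lower().split())
    return not (words & SPEECH_MUSIC_KEYWORDS)

captions = {
    "seg_001": "Rain falls on a tin roof with distant thunder",
    "seg_002": "A man is talking while music plays in the background",
}
ambient_segments = [seg for seg, cap in captions.items() if keep_segment(cap)]
print(ambient_segments)  # ['seg_001']
```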


## Environment Initialization
For initializing your environment, please refer to the [general README](../README.md).

## Autocap Dataset Download
- We currently provide the following datasets:
* autocap_audioset_vggsounds: containing **444,837** audio-text pairs.
* autocap_audioset_vggsounds: containing roughly **445K** audio-text pairs, derived from VGGSounds and a subset of AudioSet. This dataset was not filtered to remove music and speech.
* AutoReCap-XL-Raw: containing **57M** audio-text pairs, derived from TODO
* AutoReCap-XL: containing **57M** audio-text pairs, derived from TODO

**More datasets will be coming soon!**

@@ -60,11 +74,11 @@ You need to arrange your audio files in one folder using the following structure
- Organizing your dataset following the instructions in [Dataset Organization](#dataset-organization).

## Download External Dataset
We provide a script for downloading audiocaps, wavcaps, and clotho datasets. Run the following scripts to download and organize each of these datasets:
We provide a script for downloading the wavcaps dataset. Run the following scripts to download and organize it:

```shell
python download_external_datasets --save_root <path-to-save-root> \
--dataset_nanmes "dataset_key_1" "dataset_key_2" ...
--dataset_names "dataset_key_1" "dataset_key_2" ...
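# Example invocation with concrete values (hypothetical paths and dataset key;
# only the flags shown above are assumed to exist):
#   python download_external_datasets --save_root ./data/external --dataset_names "wavcaps"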

# Organize each downloaded dataset
python organize_dataset.py --save_dir <path-to-downloaded-dataset> \
1 change: 1 addition & 0 deletions dataset_preperation/download.py
@@ -136,6 +136,7 @@ def read_video_segments_info(local_input_video_segments,
for idx, json_str in enumerate(tqdm(f, desc="parsing json input")):
if idx > start_idx:
try:
json_str = json_str.strip()
if json_str.endswith('\n'):
json_str = json_str[:-1]
if json_str.endswith(','):
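For context, the added `json_str.strip()` guards against stray whitespace before the trailing-newline and trailing-comma checks. A minimal, self-contained sketch of this tolerant JSON-lines parsing (the file name and error handling are illustrative assumptions, not the repository's exact logic):

```python
import json

segments = []
with open("video_segments.jsonl") as f:   # hypothetical input file
    for line in f:
        line = line.strip()               # drop surrounding whitespace and the newline
        if line.endswith(","):            # tolerate array-style trailing commas
            line = line[:-1]
        if not line:
            continue
        try:
            segments.append(json.loads(line))
        except json.JSONDecodeError:
            continue                      # skip malformed records rather than aborting
```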
