Commit: update readme
MoayedHajiAli committed Oct 24, 2024
1 parent 0268089 commit cf191e3
Showing 6 changed files with 38 additions and 6 deletions.
4 changes: 3 additions & 1 deletion AutoCap/README.md
@@ -1,4 +1,6 @@
[![arXiv](ARXIV ICON)](ARXIV LINK)

[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/GenAU) [![Arxiv](https://img.shields.io/badge/arxiv-2406.19388-b31b1b)](https://arxiv.org/abs/2406.19388) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-captioning-on-audiocaps)](https://paperswithcode.com/sota/audio-captioning-on-audiocaps?p=taming-data-and-transformers-for-audio-1)[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-generation-on-audiocaps)](https://paperswithcode.com/sota/audio-generation-on-audiocaps?p=taming-data-and-transformers-for-audio-1)


# AutoCap inference, training and evaluation
- [Inference](#inference)
3 changes: 2 additions & 1 deletion GenAU/README.md
@@ -1,4 +1,5 @@
<!-- [![arXiv](ARXIV ICON)](ARXIV LINK) -->

[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://snap-research.github.io/GenAU) [![Arxiv](https://img.shields.io/badge/arxiv-2406.19388-b31b1b)](https://arxiv.org/abs/2406.19388) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-captioning-on-audiocaps)](https://paperswithcode.com/sota/audio-captioning-on-audiocaps?p=taming-data-and-transformers-for-audio-1)[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/taming-data-and-transformers-for-audio-1/audio-generation-on-audiocaps)](https://paperswithcode.com/sota/audio-generation-on-audiocaps?p=taming-data-and-transformers-for-audio-1)

# GenAU inference, training and evaluation
- [Introduction](#introduction)
14 changes: 14 additions & 0 deletions LICENSE
@@ -0,0 +1,14 @@
Copyright (c) 2024 Snap Inc. All rights reserved.

These sample code, data and model checkpoints are made available by Snap Inc. for non-commercial, research purposes only.

Non-commercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Research purposes mean solely for study, instruction, or non-commercial research, testing or validation.

No commercial license, whether implied or otherwise, is granted in or to this code, unless you have entered into a separate agreement with Snap Inc. for such rights.

These sample code, data and model checkpoints are provided as-is, without warranty of any kind, express or implied, including any warranties of merchantability, title, fitness for a particular purpose, non-infringement, or that the code is free of defects, errors or viruses. In no event will Snap Inc. be liable for any damages or losses of any kind arising from this sample code or your use thereof.

Any redistribution of this sample code, including in binary form, must retain or reproduce the above copyright notice, conditions and disclaimer.

The following sets forth attribution notices for third-party software that may be included in portions of this sample code:

Binary file added assets/dataset.png
22 changes: 18 additions & 4 deletions dataset_preperation/README.md
@@ -1,12 +1,26 @@

# AutoCap Dataset Preparation
# AutoReCap Dataset

# Introduction
We introduce an efficient pipeline for collecting ambient audio. It starts by analyzing automatic transcriptions of online videos to identify non-speech parts. Our captioning model, AutoCap, then generates captions and filters out segments with music or speech-related keywords. By using time-aligned transcriptions, we reduce the filtering rate and streamline the process by avoiding the need to download or process the audio files.
<br/>

<div align="center">
<img src="../assets/dataset.png" width="1200" />
</div>

<br/>
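For illustration, a minimal sketch of the keyword-based filtering step described above (the keyword list, captions, and function name here are assumptions for this example, not the released pipeline code):

```python
# Hedged sketch: keep only segments whose generated caption contains no
# speech- or music-related keywords. Keyword list and captions are illustrative.
SPEECH_MUSIC_KEYWORDS = {"speech", "speaking", "talking", "music", "song", "singing"}

def keep_segment(caption: str) -> bool:
    """Return True if the caption mentions none of the filtered keywords."""
    words = set(caption.lower().split())
    return not (words & SPEECH_MUSIC_KEYWORDS)

captions = {
    "seg_001": "Rain falls on a tin roof with distant thunder",
    "seg_002": "A man is talking while music plays in the background",
}
ambient_segments = [seg for seg, cap in captions.items() if keep_segment(cap)]
print(ambient_segments)  # ['seg_001']
```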


## Environment Initialization
For initializing your environment, please refer to the [general README](../README.md).

## Autocap Dataset Download
- We currently provide the following datasets:
* autocap_audioset_vggsounds: containing **444,837** audio-text pairs.
* autocap_audioset_vggsounds: containing roughly **445K** audio-text pairs, derived from VGGSounds and a subset of AudioSet. This dataset was not filtered to remove music and speech.
* AutoReCap-XL-Raw: containing **57M** audio-text pairs, derived from TODO
* AutoReCap-XL: containing **57M** audio-text pairs, derived from TODO

**More datasets will be coming soon!**

@@ -60,11 +74,11 @@ You need to arrange your audio files in one folder using the following structure
- Organizing your dataset following the instructions in [Dataset Organization](#dataset-organization).

## Download External Dataset
We provide a script for downloading audiocaps, wavcaps, and clotho datasets. Run the following scripts to download and organize each of these datasets:
We provide a script for downloading the wavcaps dataset. Run the following scripts to download and organize it:

```shell
python download_external_datasets --save_root <path-to-save-root> \
--dataset_nanmes "dataset_key_1" "dataset_key_2" ...
--dataset_names "dataset_key_1" "dataset_key_2" ...
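# Example invocation with concrete values (hypothetical paths and dataset key;
# only the flags shown above are assumed to exist):
#   python download_external_datasets --save_root ./data/external --dataset_names "wavcaps"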

# Organize each downloaded dataset
python organize_dataset.py --save_dir <path-to-downloaded-dataset> \
1 change: 1 addition & 0 deletions dataset_preperation/download.py
@@ -136,6 +136,7 @@ def read_video_segments_info(local_input_video_segments,
for idx, json_str in enumerate(tqdm(f, desc="parsing json input")):
if idx > start_idx:
try:
json_str = json_str.strip()
if json_str.endswith('\n'):
json_str = json_str[:-1]
if json_str.endswith(','):
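For context, the added `json_str.strip()` guards against stray whitespace before the trailing-newline and trailing-comma checks. A minimal, self-contained sketch of this tolerant JSON-lines parsing (the file name and error handling are illustrative assumptions, not the repository's exact logic):

```python
import json

segments = []
with open("video_segments.jsonl") as f:   # hypothetical input file
    for line in f:
        line = line.strip()               # drop surrounding whitespace and the newline
        if line.endswith(","):            # tolerate array-style trailing commas
            line = line[:-1]
        if not line:
            continue
        try:
            segments.append(json.loads(line))
        except json.JSONDecodeError:
            continue                      # skip malformed records rather than aborting
```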
