LAION-Audio-630K Dataset

LAION-Audio-630K is a large-scale audio-text dataset consisting of 633,526 pairs with a total duration of 4,325.39 hours. It contains audio of human activities, natural sounds and audio effects, collected from 8 data sources (see the data source table below) on publicly available websites. We collected these datasets by downloading the audio and the relevant text descriptions. To the best of our knowledge, LAION-Audio-630K is the largest publicly available audio-text dataset, an order of magnitude larger than previous audio-text datasets (as of 2022-11-05).

Content

Among the 8 datasets, we only release 4 (BBC Sound Effects, Epidemic Sound, Audiostock and Freesound). The first 3 are available in CSV format, since their audio is publicly accessible to anyone through URL links provided by the corresponding websites. For Freesound, we released the whole dataset (audio files + text captions) to Hugging Face. The others, i.e. Free To Use Sounds, Sonniss Game Effects, We Sound Effects and Paramount Motion Sound Effects, are not released because they were purchased by LAION.

CSV Format

CSV files are of the following structure:

url, caption1, caption2, ..., {caption_t5}, {metadata1}, {metadata2}, ...

url: The URL of the audio file.
caption_i: The i-th caption of the audio file.
caption_t5: For Epidemic Sound, we applied keyword-to-caption data augmentation using a T5 model. Details can be found in the Epidemic Sound datacard.
{metadata_i}: Additional metadata, e.g. the Freesound ID of the audio.
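A minimal sketch of splitting one of these CSV files into URL, captions and metadata with Python's standard `csv` module follows. It assumes comma-delimited files with a header row in which the link column is named `url` and every caption column name starts with `caption` (e.g. caption1, caption2, caption_t5). Any other column is treated as metadata. These column-name rules are assumptions based on the structure above, not a documented schema.

```python
import csv
import io
from typing import Dict, List, NamedTuple


class AudioTextPair(NamedTuple):
    url: str
    captions: List[str]
    metadata: Dict[str, str]


def parse_csv(csv_text: str) -> List[AudioTextPair]:
    """Split each row into its URL, non-empty captions and metadata columns.

    Assumption (hypothetical schema): the header names the link column "url",
    caption columns start with "caption", and all other columns are metadata.
    Rows are assumed to be well-formed (same number of fields as the header).
    """
    pairs = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Keep caption columns in header order, skipping empty cells.
        captions = [value for name, value in record.items()
                    if name.startswith("caption") and value]
        # Everything that is neither the URL nor a caption is metadata.
        metadata = {name: value for name, value in record.items()
                    if name != "url" and not name.startswith("caption")}
        pairs.append(AudioTextPair(record["url"], captions, metadata))
    return pairs
```

For a file on disk, read it with `open(path, newline="", encoding="utf-8")` and pass `f.read()`, or adapt the loop to iterate `csv.DictReader(f)` directly.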

Datacards

We provide a datacard for each dataset we processed, recording how it was processed. If you want to learn more about caption generation and the details of keyword-to-caption data augmentation, please read the datacards, in particular the one for the Epidemic Sound dataset.

About Freesound

We provide two versions of the Freesound dataset.

Freesound (full): The original Freesound dataset. Details can be found in its datacard.
Freesound (no overlap): Derived from Freesound (full), with samples from ESC50, FSD50K, UrbanSound8K and Clotho removed.
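The README does not say how overlapping samples were identified. ESC50, FSD50K, UrbanSound8K and Clotho all take their clips from Freesound, so one plausible approach is to match on Freesound IDs. The sketch below takes that approach. Both the ID-based matching and the `remove_overlap` helper are hypothetical illustrations, not the procedure actually used.

```python
from typing import Dict, Iterable, List, Set


def remove_overlap(full_ids: Iterable[int],
                   excluded_by_source: Dict[str, Set[int]]) -> List[int]:
    """Keep the Freesound IDs of Freesound (full) found in no excluded dataset.

    Hypothetical: assumes each excluded dataset (e.g. ESC50, Clotho) is given
    as the set of Freesound IDs its clips were taken from.
    """
    # Union of all IDs that must not appear in the "no overlap" version.
    excluded: Set[int] = set().union(*excluded_by_source.values())
    # Preserve the original order of Freesound (full).
    return [fid for fid in full_ids if fid not in excluded]
```

Using a single union set keeps each membership test O(1), regardless of how many benchmark datasets are excluded.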

We have released the processed Freesound dataset in WebDataset format to a Hugging Face repository.
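In the WebDataset format, each shard is a plain tar archive. Consecutive files that share a key (the path up to the first dot of the file name) together form one sample. The standard-library sketch below groups a shard's members that way. The field names it yields are simply the file extensions. Which extensions the LAION shards actually use (for example an audio file plus a JSON caption file) is an assumption here, so check the Hugging Face repository for the real layout. The `webdataset` package handles this grouping for you in practice.

```python
import io
import posixpath
import tarfile
from typing import Dict, Iterator, Optional, Tuple


def iter_webdataset_samples(tar_bytes: bytes) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, {extension: file bytes}) for each sample in a tar shard.

    Follows the WebDataset convention: the key is the member path up to the
    first dot of its file name, and consecutive members with the same key
    belong to the same sample.
    """
    current_key: Optional[str] = None
    fields: Dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            dirname, filename = posixpath.split(member.name)
            stem, _, ext = filename.partition(".")
            key = posixpath.join(dirname, stem)
            # A new key means the previous sample is complete.
            if key != current_key and fields:
                yield current_key, fields
                fields = {}
            current_key = key
            fields[ext] = tar.extractfile(member).read()
    if fields:
        yield current_key, fields
```

Because samples are grouped by consecutive keys, this relies on the shard having been written with each sample's files stored next to each other, which is how WebDataset shards are normally created.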

Data Sources

Name	Duration	Number of Samples	Data Type	Source	Data Card
Freesound (no overlap)	2817.31 hrs	460,801	1-2 captions per audio, audio	website; licenses file; Hugging Face repository	-
Freesound (full)	3033.38 hrs	515,581	1-2 captions per audio, audio	website; licenses file; Hugging Face repository	data card
Epidemic Sound	220.41 hrs	75,645	2 captions per audio, audio	website; csv (including T5-generated de-biased captions)	data card
Audiostock	46.30 hrs	10,000	1 caption per audio, audio	website; csv	data card
BBC Sound Effects	463.48 hrs	15,973	1 caption per audio, audio	website; csv* (no longer available, see the note below)	data card
Free To Use Sounds	175.73 hrs	6,370	Filename as caption, audio	website (purchase required)	-
Sonniss Game Effects	84.6 hrs	5,049	Filename as caption, audio	website (purchase required)	-
We Sound Effects	12.00 hrs	488	Filename as caption, audio	website (purchase required)	-
Paramount Motion Sound Effects	19.49 hrs	4,420	Filename as caption, audio	website (purchase required)	-
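As a quick arithmetic check, the sample counts in the table add up to the 633,526 pairs quoted at the top when Freesound is counted once, in its full version. The snippet below copies the counts from the table. It also totals the 4 released sources listed under Content. The dictionary names are illustrative only.

```python
# Sample counts copied from the Data Sources table; Freesound is counted once,
# using the "full" version (the "no overlap" version is a subset of it).
SAMPLES = {
    "Freesound (full)": 515_581,
    "Epidemic Sound": 75_645,
    "Audiostock": 10_000,
    "BBC Sound Effects": 15_973,
    "Free To Use Sounds": 6_370,
    "Sonniss Game Effects": 5_049,
    "We Sound Effects": 488,
    "Paramount Motion Sound Effects": 4_420,
}
# The 4 sources released as CSV or on Hugging Face (see Content).
RELEASED = {"Freesound (full)", "Epidemic Sound", "Audiostock", "BBC Sound Effects"}

total = sum(SAMPLES.values())
released = sum(n for name, n in SAMPLES.items() if name in RELEASED)
print(total, released)  # prints: 633526 617199
```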

*About BBC Sound Effects

The BBC Sound Effects website has recently changed its structure. As a consequence, only 300 samples are available for download, and unfortunately we are no longer able to generate the CSV file with our old scripts. In the meantime, several scrapers exist on GitHub, such as https://github.com/alisomay/bbc-sound-effects-downloader. You may try them to see if they work.

Keyword-to-Caption Augmentation

We use a keyword-to-caption model, built on the pre-trained language model T5, to turn the labels of AudioSet and Epidemic Sound into corresponding captions. We also de-bias these captions by replacing gendered words (for example, "woman" and "man") with "person", aiming to eliminate potential gender discrimination. We hereby release the augmented captions for Epidemic Sound and AudioSet in CSV format.
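The de-biasing step amounts to a word-level substitution on the generated captions. The sketch below implements it with a regular expression. Only the "woman"/"man" → "person" entries come from the text above. The plural entries and the `debias_caption` helper are hypothetical additions for illustration, not the exact replacement list that was used.

```python
import re
from typing import Dict

# Hypothetical replacement table: "woman"/"man" -> "person" is from the README;
# the plural entries are illustrative assumptions.
GENDER_NEUTRAL: Dict[str, str] = {
    "woman": "person",
    "man": "person",
    "women": "people",
    "men": "people",
}

# \b word boundaries leave words such as "manual" or "human" untouched.
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_NEUTRAL) + r")\b", re.IGNORECASE)


def debias_caption(caption: str) -> str:
    """Replace gendered words with neutral ones, keeping a leading capital."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        neutral = GENDER_NEUTRAL[word.lower()]
        return neutral.capitalize() if word[0].isupper() else neutral

    return _PATTERN.sub(_swap, caption)
```

For example, `debias_caption("A man and a woman talk.")` returns `"A person and a person talk."`. A plain `str.replace` would also rewrite substrings inside words like "human", which is why the word boundaries matter.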

Epidemic Sound	AudioSet
Epidemic_all_debiased.csv	CSV files for the AudioSet balanced_train, unbalanced_train and eval splits

Credits & Licence

!!!TERMS OF USE!!!: By downloading audio through the links provided in the CSV files, you agree to use the audio for research purposes only, unless you obtain permission from the owners of the data source to use it for other purposes.

Acknowledgement

The whole collection process, as well as all usage of LAION-Audio-630K, is conducted by LAION, a German non-profit pure research organization. All contributors and collectors of the dataset are considered open source contributors affiliated with LAION. These community contributors (Discord IDs) include, but are not limited to: @marianna13#7139, @Chr0my#0173, @PiEquals4#1909, @Yuchen Hui#8574, @Antoniooooo#4758, @IYWO#9072, krishna#1648, @dicknascarsixtynine#3885, and @turian#1607. We would like to thank all of them for their efforts on the LAION-Audio-630K dataset.
