Hi! Thanks for the dataset. It would be useful to me if you provided a 'small' subset like FMA's (8,000 tracks of 30 s, 8 balanced genres, GTZAN-like, 7.2 GiB). I know I could build a subset myself with the script cited in the README, but I would have to download about 100x the amount of data I want and then process it. If you think it's worth it and are willing to host it, I can also build the subset myself and upload it somewhere. Thanks!
Note that we have included lower-bitrate mono audio downloads that significantly reduce the download size (full dataset: 508 GB to 156 GB). I assume this is not small enough for a "small" dataset...
We lack a specific proposal for what the small subset should include. Should it cover all tags in MTG-Jamendo or a subset of tags?
Another alternative is to create a version of the full dataset with audio fragments instead of full tracks. Using 2-minute or 30-second fragments for each track reduces the total duration from ~3778 hours to ~1856.7 or ~464 hours, respectively. The low-bitrate mono 30-second fragment version would take ~19 GB, which is very reasonable.
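For reference, the ~19 GB figure can be reproduced from the numbers quoted in this thread. This is only a back-of-the-envelope sketch: it assumes the download size scales linearly with total audio duration (roughly constant bitrate), and the function and variable names are made up for illustration.

```python
# Figures quoted in this thread.
FULL_HOURS = 3778.0          # total duration of the full dataset (approximate)
LOW_BITRATE_FULL_GB = 156.0  # size of the low-bitrate mono full dataset
FRAGMENT_HOURS = {"2 min": 1856.7, "30 s": 464.0}


def estimate_size_gb(fragment_hours, full_hours=FULL_HOURS, full_gb=LOW_BITRATE_FULL_GB):
    """Scale the full download size by the fraction of audio that is kept.

    Assumption: size is proportional to duration (constant bitrate).
    """
    return full_gb * fragment_hours / full_hours


estimates = {name: round(estimate_size_gb(hours), 1) for name, hours in FRAGMENT_HOURS.items()}

for name, size_gb in estimates.items():
    print(f"{name} fragments, low-bitrate mono: ~{size_gb} GB")
```

Under the same assumption, the 2-minute fragment version would be several times larger than the 30-second one, so only the latter is close to a "small" download.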
Related to this, @philtgun has previously created subsets of MTG-Jamendo with one random track per artist (5 random trials) and one random track per album to look at the statistics (autotagging_toy_0..4 and autotagging_toy_album_0). Leaving this here for reference.
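To illustrate the "one random track per artist" idea, here is a minimal sketch. It is not the script used to produce the autotagging_toy_* files; the `(track_id, artist_id)` pairs and the function name are hypothetical, and real metadata would have to be read from the dataset's own files.

```python
import random
from collections import defaultdict


def one_track_per_artist(tracks, seed):
    """Pick one random track for each artist.

    tracks: iterable of (track_id, artist_id) pairs (hypothetical layout).
    seed:   integer seed, so each trial is reproducible.
    Returns a sorted list of the chosen track ids.
    """
    by_artist = defaultdict(list)
    for track_id, artist_id in tracks:
        by_artist[artist_id].append(track_id)

    rng = random.Random(seed)
    # Sort artists and their tracks so the result depends only on the seed.
    return sorted(rng.choice(sorted(ids)) for _, ids in sorted(by_artist.items()))


# Toy metadata, invented for this example.
toy_tracks = [
    ("track_1", "artist_a"),
    ("track_2", "artist_a"),
    ("track_3", "artist_b"),
    ("track_4", "artist_c"),
    ("track_5", "artist_c"),
    ("track_6", "artist_c"),
]

# Five random trials, one per seed.
trials = [one_track_per_artist(toy_tracks, seed) for seed in range(5)]
```

Each trial contains exactly as many tracks as there are artists; the same approach works per album by grouping on an album id instead.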