LAION-Human Dataset #29

Open
unrealMJ opened this issue Nov 11, 2023 · 4 comments

@unrealMJ

Hi,

I have already downloaded the full LAION-5B dataset. How can I use your .parquet and mapping file to get the corresponding images?
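One way to approach this, sketched under stated assumptions: join the rows of the provided .parquet to an index of the images you already have locally. This assumes (hypothetically) that each parquet row has a "URL" column and that your local LAION-5B download can be listed as (URL, local path) pairs; neither schema is confirmed in this thread. The rows could be loaded with something like pandas.read_parquet(path).to_dict("records").

```python
# Sketch: pair rows of the provided .parquet with already-downloaded images.
# Assumptions (hypothetical): rows carry a "URL" field, and the local download
# is available as (url, local_path) pairs.
from typing import Dict, Iterable, List, Optional, Tuple


def build_url_index(local_records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each source URL to the path of its locally downloaded image."""
    return {url: path for url, path in local_records}


def match_rows(rows: Iterable[dict],
               url_index: Dict[str, str]) -> List[Tuple[dict, Optional[str]]]:
    """Pair every parquet row with its local image path, or None if not downloaded."""
    return [(row, url_index.get(row["URL"])) for row in rows]


if __name__ == "__main__":
    local = [("http://a.example/1.jpg", "shard0/000001.jpg"),
             ("http://a.example/2.jpg", "shard0/000002.jpg")]
    rows = [{"URL": "http://a.example/2.jpg"}, {"URL": "http://a.example/9.jpg"}]
    for row, path in match_rows(rows, build_url_index(local)):
        print(row["URL"], "->", path)
```

Rows that map to None would still need to be fetched, for example with the repository's download script.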

@unrealMJ
Author

Also, the .parquet has 2.86M images while mapping.json has 1M images, so mapping.json seems to be a subset of the .parquet. Could you share the details of the .parquet? I think it is a subset of LAION-5B; how did you obtain it?
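A quick way to check the subset relationship, as a sketch only: this assumes (hypothetically) that mapping.json is a JSON object whose values are integer row indices into the .parquet. The real layout of mapping.json is not described in this thread, so the lookup would need adjusting to match it.

```python
# Sketch: verify that every entry in mapping.json points at a row of the .parquet.
# Assumption (hypothetical): mapping.json values are integer row indices.
import json
from typing import Dict, Set


def missing_rows(mapping: Dict[str, int], parquet_len: int) -> Set[int]:
    """Return referenced row indices that fall outside the parquet's row range."""
    return {i for i in mapping.values() if not 0 <= i < parquet_len}


if __name__ == "__main__":
    mapping = json.loads('{"img_a": 0, "img_b": 5, "img_c": 7}')
    print(missing_rows(mapping, parquet_len=6))  # {7}
```

An empty result would be consistent with mapping.json covering a subset of the .parquet rows.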

@juxuan27
Contributor

Hi, @unrealMJ! Thank you for your interest. You can use python utils/download_data.py to download all images. The .parquet provides the images in Laion-Aesthetic; we provide it because our ordering differs from that of the original Laion-Aesthetic dataset, as mentioned in issue #4.

@unrealMJ
Author

Hi, thanks for your reply.
The Laion2b-en-aesthetic dataset on Hugging Face has 52.1M rows, but the .parquet you provided has only 2.86M rows. Could you explain the difference?

@juxuan27
Contributor

The .parquet we provide is a subset of Laion2b-en-aesthetic, obtained by filtering it down to the part with higher aesthetic scores.
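The filtering step described above can be sketched as a simple threshold on the score column. The column name "aesthetic" and the threshold value below are hypothetical: the thread does not say which column or cutoff produced the 2.86M-row subset. With pandas, the equivalent would be df[df["aesthetic"] >= threshold].

```python
# Sketch: keep the higher-aesthetic-score part of Laion2b-en-aesthetic rows.
# Assumptions (hypothetical): the score column is named "aesthetic" and the
# cutoff is a free parameter; the actual values used are not stated here.
from typing import Iterable, List


def filter_by_aesthetic(rows: Iterable[dict], threshold: float,
                        key: str = "aesthetic") -> List[dict]:
    """Keep only rows whose aesthetic score is at or above the threshold."""
    return [row for row in rows if row[key] >= threshold]


if __name__ == "__main__":
    rows = [{"URL": "u1", "aesthetic": 4.8},
            {"URL": "u2", "aesthetic": 6.3},
            {"URL": "u3", "aesthetic": 5.9}]
    print([r["URL"] for r in filter_by_aesthetic(rows, threshold=5.5)])  # ['u2', 'u3']
```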
