Adding LRGB to the HuggingFace hub #10
PascalVOC-SP

Amazing!

Yes thank you, that would be great. I also realized I uploaded […]. My HF username is: SauravMaheshkar

I did and I added you to it! Once the datasets are correctly processed, feel free to transfer them to the org namespace! @rampasek @vijaydwivedi75 Would one of you want to be an admin of that?

I pre-processed and added all the PascalVOC datasets to the organization.

Thanks a lot @SauravMaheshkar @clefourrier! @clefourrier, sure. My username is vijaypradwi.

@vijaydwivedi75 Added you as admin! Feel free to ask any questions you need here :)

I pre-processed and added all the COCO-SP datasets to the organization.

I pre-processed and added the peptides-functional dataset to the organization.

I pre-processed and added the peptides-structural dataset to the organization.

I pre-processed and added the PCQM-Contact dataset to the organization. That's all the datasets done ✅. @clefourrier @vijaydwivedi75 can you folks go through the datasets and make sure they look good? Maybe then we can close this issue.
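For anyone doing that review, one lightweight way would be to load each dataset back from the Hub and eyeball the features; a minimal sketch, assuming the repos live under the `LRGB` org (the exact repo name below is hypothetical):

```python
from datasets import load_dataset

# "LRGB/PascalVOC-SP" is a hypothetical repo name; substitute the actual repos in the org
ds = load_dataset("LRGB/PascalVOC-SP", split="train")
print(ds)     # column names and number of rows
print(ds[0])  # first example, to eyeball the fields
```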
Thank you very much for your work! I think we're very close to being good, just 2 last points: […]

Regarding 1, I might be missing context, since I don't know LRGB that well: which preprocessing scripts did you use? (We do usually want the pre-processed datasets.) Regarding 2, it depends on 1; I'd need to understand better what the preprocessing does to give you a hand :)

All the datasets have a […]. For 2, I'll refer to @vijaydwivedi75 for more context.

Ok, that's exactly what I needed, thank you! I'm a bit in a rush today but I'll take the time to look at this in more depth on Monday (CET).
Hi @SauravMaheshkar!

```python
from datasets import Dataset
import torch

torch_dataset_info, torch_dataset = torch.load(<local path to pt file>)
# A torch dataset is a tuple which describes the contents' shape, then stores the contents - we want the actual contents
hf_dataset = Dataset.from_dict(torch_dataset)
# This command will require you to be connected, but will send the datasets automatically
hf_dataset.push_to_hub("LRGB/<dataset name>", split=<dataset split>)
```

I'm very sorry I did not notice earlier that the files were saved as PyTorch objects.
Do you want to split the work: you convert half of them, and I convert the other half?

Sure, thanks a lot! I can take up the VOC superpixels and maybe you can take up the COCO superpixels.

Perfect!
Ran into the following error: […]

Hi @SauravMaheshkar, could you provide the full stack trace of the error, tell me on which dataset this occurs, and maybe print the […]

Sadly that is the entire stack trace (apart from the progress bar).
Hi again! I pinged people working on datasets, and your error message allowed us to identify a corner case when pushing to an already existing repo without dataset_info in the YAML tags, so thank you! 🤗 A fix is being merged; once it's in datasets, you'll just have to update the lib and try again, and it should work seamlessly.
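For anyone retrying later, a quick way to check which datasets version is installed (the exact release carrying the fix isn't stated in this thread, so treat "recent enough" as an assumption):

```python
import datasets

# if this prints an older version, upgrade (e.g. `pip install -U datasets`) and retry the push
print(datasets.__version__)
```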
Oh great, glad to help, I guess 😅
Coming back to this! I've converted the COCO datasets with datasets 2.11.0, using:

```python
from datasets import Dataset
import torch

dataset_names = [your dataset names]
for dataset in dataset_names:
    for split in ["train", "val", "test"]:
        torch_dataset_info, torch_dataset = torch.load(
            f"/{path_to_your_folder}/{dataset}/{split}.pt"
        )
        hf_dataset = Dataset.from_dict(torch_dataset)
        hf_dataset.push_to_hub(f"LRGB/{dataset}", split=split)
```
Hi @SauravMaheshkar, did you have the time to look at this?
Hi!
@migalkin suggested on Twitter adding your datasets to the HuggingFace hub, which I think is a super cool idea, so I'm opening this issue to see if you need any help with that!
Here is the step-by-step tutorial on how to do so.
Ping me if you need anything in the process 🤗