New to this dataset. Need further guidance on the paper and repository especially on the dataset part #27
Comments
For question 4: coconut_val should be ready for you to explore; tutorials are here. A full tutorial for preparing the dataset is coming, thanks for your patience.
Okay. Let me know if there are updates regarding my questions.
I tried the conversion from masks using some Python tools, but it doesn't look accurate enough. I should release the original annotated bounding box data in the next two days.
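For reference, the mask-to-box conversion mentioned above can be sketched in a few lines of NumPy; this is only a minimal illustration (the function name and toy mask are mine, not from the repo), not the repo's actual conversion tool:

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray):
    """Compute the tight (x_min, y_min, x_max, y_max) box around a binary mask."""
    ys, xs = np.nonzero(mask)  # row/column indices of foreground pixels
    if len(xs) == 0:
        return None  # empty mask: no box to derive
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a 5x5 mask with a 2x3 foreground region.
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 2:5] = True
print(mask_to_bbox(mask))  # (2, 1, 4, 2)
```

A box derived this way is only as tight as the mask itself, which may explain the accuracy gap versus the originally annotated boxes.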
Background: I first found Sama-COCO in the FiftyOne dataset documentation, then searched for Sama-COCO on Google Scholar. There, I found your paper about the COCONut dataset. As I see in the paper, the COCONut dataset has object detection annotations, which I want to use.
I read further in the paper and repository. As far as I understand, the COCONut training dataset has S, B, and L variants: the S variant is basically the COCO2017 dataset relabeled with your annotation method, the B variant extends S with annotations for unlabeled COCO2017 images, and the L variant further combines the COCO2017 images with the Objects365 dataset.
For validation, there are a relabeled COCO2017 validation version and a COCONut version; the COCONut version in particular uses the Objects365 dataset.
That's what I got from the paper. Now for the data itself: I've seen that issues with the Objects365 dataset are already apparent (#26). As I understand, that dataset relies on Flickr, so I assume some images there may eventually be deleted from the website.
So, with that, I explored and downloaded the whole set of dataset annotation files via Kaggle. However, in some cases there are images of 0 kB, mostly from the L version (I haven't checked the S or B versions yet). I then used the Hugging Face website, especially your profile page. It has installation instructions, especially for the S and B variants, so at least I'm guided for those versions. However, the code doesn't support the L version (yet?). I tried to modify `download_coconut.py` to make `--split` accept the L version, but somehow it failed, especially in the dict and JSON parts. With these issues I am sometimes a bit confused, because each source (here, Kaggle, and Hugging Face) differs. Sorry if this post is kind of jumbled; I need further guidance. Therefore, I have somewhat broad questions:
I think that's it for now; I may ask more questions later. Anyway, nice work on the dataset. I'm intrigued to see its further development.
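In case it helps others hitting the 0 kB files I mentioned: a quick way to locate them is to scan the download directory for zero-byte files. This is just a sketch with illustrative file names, not tied to the actual download layout:

```python
import tempfile
from pathlib import Path

def find_zero_byte_files(root: str):
    """Walk a directory tree and return paths of files whose size is 0 bytes."""
    return [p for p in Path(root).rglob("*") if p.is_file() and p.stat().st_size == 0]

# Tiny self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ok.jpg").write_bytes(b"\xff\xd8\xff")  # non-empty file
    (Path(tmp) / "broken.jpg").write_bytes(b"")          # zero-byte file
    bad = find_zero_byte_files(tmp)
    print([p.name for p in bad])  # ['broken.jpg']
```

Running this on the Kaggle download root should list exactly which images need re-downloading.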