New to this dataset. Need further guidance on the paper and repository especially on the dataset part #27

DWSuryo · 2024-09-08T13:57:19Z

Background: I first found Sama-COCO in the FiftyOne dataset documentation, then searched for Sama-COCO on Google Scholar. There I found your paper about the COCONut dataset. As far as I can see from the paper, the COCONut dataset has object detection annotations, which I want to use.

I read further into the paper and the repository. As far as I understand, the COCONut train dataset has S, B, and L variants. The S variant is basically the COCO2017 dataset relabeled with your annotation method, the B variant extends S with annotations for the unlabeled COCO2017 images, and the L variant combines the COCO2017 images with the Objects365 dataset.

For validation, there are a relabeled COCO2017 validation set and a COCONut version. The COCONut version in particular uses the Objects365 dataset.

That's what I got from the paper. Now for the data itself: I've seen that there are known issues with the Objects365 images (#26). As far as I can tell, that dataset relies on Flickr, so I assume some images may eventually be deleted from the website.

So I explored and downloaded the whole dataset's annotation files via Kaggle. However, in some cases there are images of 0 kB, mostly in the L version (I haven't checked the S or B versions yet). Then I used the Huggingface page linked from your profile. It has installation instructions, particularly for the S and B variants, so at least I'm guided for those versions. However, the code doesn't support the L version (yet?). I tried to modify download_coconut.py so that --split accepts the L version, but it failed, particularly in the dict and JSON parts.
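For reference, a minimal sketch of how the 0 kB images can be listed, assuming the images were extracted to a local folder. The function name `find_empty_images` and the suffix set are illustrative only and are not part of the COCONut tooling.

```python
from pathlib import Path
from typing import List, Union

# Assumed set of image extensions; adjust to whatever the download contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_empty_images(root: Union[str, Path]) -> List[Path]:
    """Return all image files under `root` (recursively) whose size is 0 bytes."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and path.stat().st_size == 0
    )
```

Calling `find_empty_images` on the extracted image folder returns the paths that would need to be downloaded again from another source.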

With these issues, I am a bit confused, because the sources here, on Kaggle, and on Huggingface each differ. Sorry if this post is somewhat jumbled. I need further guidance, so I have some fairly broad questions:

1. Where can I find the object detection version of this dataset, as opposed to instance segmentation (as mentioned in the paper)?
2. Is downloading the COCONut-L version via Huggingface supported?
3. Since some images on Kaggle are 0 kB, which source has the complete set, Kaggle or Huggingface?
4. You mentioned that some updated annotation files are in the drive folder (coconut_val size and composition #24), but I can't find them. Which file or folder are you referring to?

I think that's all for now. I may have more questions later. Anyway, nice work on the dataset. I'm intrigued to see its further development.

xdeng7 · 2024-09-09T17:02:51Z

For question 4: coconut_val should be ready for you to explore; tutorials are here. A full tutorial for preparing the dataset is coming. Thanks for your patience.

DWSuryo · 2024-09-11T02:42:57Z

Okay. Let me know if there are any updates regarding my questions.

xdeng7 · 2024-09-30T16:25:45Z

You can try converting from the masks using some Python tools, but the result does not look accurate enough. I should release the original annotated bounding box data in the next two days.
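A minimal sketch of the mask-to-box conversion mentioned above, assuming each instance is available as a binary mask given as rows of 0/1 values. `mask_to_bbox` is an illustrative name and not part of the COCONut code; it returns the tight enclosing box in COCO `[x, y, width, height]` form. As noted, boxes derived this way may differ from the originally annotated ones.

```python
from typing import List, Optional, Sequence


def mask_to_bbox(mask: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Return the tight [x, y, width, height] box around the nonzero pixels.

    `mask` is a 2D grid of rows (nested lists or a 2D array). Returns None
    when the mask contains no foreground pixel.
    """
    x_min = y_min = None
    x_max = y_max = -1
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if not value:
                continue
            x_min = x if x_min is None else min(x_min, x)
            # Rows are visited top to bottom, so the first hit fixes y_min.
            y_min = y if y_min is None else y_min
            x_max = max(x_max, x)
            y_max = y
    if x_min is None:
        return None
    return [x_min, y_min, x_max - x_min + 1, y_max - y_min + 1]
```

For a mask whose foreground spans columns 1 to 2 and rows 1 to 2, this yields `[1, 1, 2, 2]`. The nested loop is written for clarity; for full-size images an array library would be faster.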
