Add CaFFe Dataset #2350
Hi @Nora-Go, we would like to add your dataset to torchgeo but I have a couple of questions:
Thanks in advance!
self.size = size
def setup(self, stage: str) -> None:
Looks the same as the base class, could prob be removed
mask_dirs = ('fronts', 'zones')
url = 'https://huggingface.co/datasets/torchgeo/glacier_calving_front/resolve/main/glacier_calving_data.zip'
Replace `main` with the git commit hash for stability/reproducibility.
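A minimal sketch of what pinning could look like, assuming the usual Hugging Face `resolve/<revision>/` URL layout seen in the diff above. The helper name `pinned_url` and the commit hash are placeholders for illustration, not values from this PR:

```python
import re

# Layout of Hugging Face "resolve" download URLs; the revision slot accepts
# a branch name such as "main" or a full commit hash.
HF_RESOLVE = "https://huggingface.co/datasets/{repo}/resolve/{revision}/{filename}"


def pinned_url(repo: str, revision: str, filename: str) -> str:
    """Build a download URL that points at one specific commit (hypothetical helper)."""
    # Reject branch names so the target file cannot change silently.
    if not re.fullmatch(r"[0-9a-f]{40}", revision):
        raise ValueError(f"expected a 40-character commit hash, got {revision!r}")
    return HF_RESOLVE.format(repo=repo, revision=revision, filename=filename)


# Placeholder hash for illustration only; substitute the real commit hash.
PLACEHOLDER_COMMIT = "0123456789abcdef0123456789abcdef01234567"
url = pinned_url(
    "torchgeo/glacier_calving_front", PLACEHOLDER_COMMIT, "glacier_calving_data.zip"
)
```

With a pinned revision, a stored checksum for the archive stays valid even if the file on `main` is later replaced.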
0: 'background',
64: 'ocean',
127: 'rock',
254: 'glacier',
Either in the dataset or the datamodule, we need to map these to ordinal numbers, correct? I've been meaning to add a transform for this since it comes up so often.
I marked this as a draft because I am not 100% sure here, which is why I asked the author the question. But yes, you are correct: the code for mapping these values to ordinal numbers is missing, and I will add it once I get the answer :)
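As a rough illustration of the mapping discussed here, the sketch below turns the pixel values 0, 64, 127, and 254 into the ordinal indices 0 to 3 with a 256-entry lookup table. It is dependency-free and works on raw `uint8` bytes. The function names are hypothetical and this is not the code that ended up in the PR; on tensors the same table could be applied by indexing it with the mask.

```python
# Pixel values found in the zone masks, as listed in the diff above.
PIXEL_VALUES = (0, 64, 127, 254)


def build_lookup_table(values=PIXEL_VALUES) -> bytes:
    """Return a 256-entry table mapping each listed pixel value to its rank."""
    table = bytearray(256)
    for index, value in enumerate(sorted(values)):
        table[value] = index
    return bytes(table)


def to_ordinal(mask: bytes, values=PIXEL_VALUES) -> bytes:
    """Map raw uint8 mask bytes to ordinal class indices (hypothetical helper)."""
    unexpected = set(mask) - set(values)
    if unexpected:
        # Fail loudly instead of silently assigning unknown pixels to class 0.
        raise ValueError(f"unexpected pixel values: {sorted(unexpected)}")
    return mask.translate(build_lookup_table(values))


raw_mask = bytes([0, 64, 127, 254, 64])
ordinal_mask = to_ordinal(raw_mask)
```

Sorting the values keeps the ordinal order independent of which class name is attached to which pixel value.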
Hi @nilsleh,
This information is not publicly available so far. If you want, I can send you the GeoTIFFs - those have the information, and you can extract it to include it here.
It is 0: NA (no information available - e.g. you don't see anything in the image, but in reality it would be one of the following), 64: Rock outcrop, 127: Glacier, and 254: Ocean/Ice Melange. I have some questions myself:
@Nora-Go thank you for the reply. Ah, too bad that the metadata is not available right away. I won't have the time to make a deep dive into the dataset; I was just looking for an interesting task with good labels for an evaluation. However, if you upload the GeoTIFFs somewhere (could also be the Hugging Face repo), someone can add it at a later time, this time as a If you are interested in adding any of these parts yourself, and should you have questions about that, feel free to reach out.
@nilsleh ah I see. Ok. Do you need the metadata already for using it as a benchmark? If yes, I can try to extract what you need. Otherwise I'll see how I'll deal with the GeoTIFFs for a future GeoDataset. For using the dataset as a benchmark, do you want to compare against my baseline (then 256x256 is good) or against the state-of-the-art? The state-of-the-art (https://ieeexplore.ieee.org/abstract/document/10440599) uses a mixture of 256x256 and 512x512. Let me know if you have any further questions or need help handling the dataset :)
Having latitude, longitude, and time available as metadata would already be really helpful for further downstream evaluation, so it would be great to have that in the patched dataset version, for example as a CSV or JSON file mapping PNG filenames to that information (other formats work as well, of course). I think it's fine to go with 512x512 for now, as the evaluation would be more of an internal comparison of models, and if you are using the 512x512 version in your research now as well, it might be more up to date. Also in the linked paper, I see that you name the dataset
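To make the request concrete, here is a small sketch of how such a metadata file could be read. The column names (`filename`, `latitude`, `longitude`, `time`) and the sample row are assumptions made up for illustration; the actual file layout is up to the dataset author.

```python
import csv
import io
from datetime import datetime


def load_metadata(csv_file) -> dict:
    """Index metadata rows by PNG filename (column names are assumed)."""
    metadata = {}
    for row in csv.DictReader(csv_file):
        metadata[row["filename"]] = {
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "time": datetime.fromisoformat(row["time"]),
        }
    return metadata


# Made-up sample content; real filenames and coordinates will differ.
sample = io.StringIO(
    "filename,latitude,longitude,time\n"
    "example_chip.png,-65.5,-62.0,2015-06-01\n"
)
meta = load_metadata(sample)
```

A dataset's `__getitem__` could then look up the entry for the current image file and return it alongside the image and mask.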
Yes, I guess 512x512 would be better :) and yes, it would be great if you could use the name CaFFe (which stands for "CAlving Fronts and where to Find thEm")! Thank you! I'll provide you with a CSV - just give me a little time :) (I'm just waiting for a response from a collaborator)
This PR adds the CaFFe (CAlving Fronts and where to Find thEm) dataset and accompanying `DataModule` for calving front and landscape zone segmentation. The implementation is for the chipped dataset, based on this script, which I uploaded to Hugging Face.
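For readers unfamiliar with chipping, the sketch below enumerates fixed-size windows over a larger image, shifting the last window back so it still fits inside the image. This is a generic illustration, not the linked script: the function name, the default stride, and the edge handling are all assumptions.

```python
def chip_windows(height: int, width: int, size: int = 512, stride: int = 512):
    """Return (top, left, bottom, right) windows covering a height x width image."""

    def starts(length: int) -> list:
        # Images smaller than one chip get a single window at the origin.
        if length <= size:
            return [0]
        positions = list(range(0, length - size + 1, stride))
        # Shift a final window back so the image edge is still covered.
        if positions[-1] != length - size:
            positions.append(length - size)
        return positions

    return [
        (top, left, min(top + size, height), min(left + size, width))
        for top in starts(height)
        for left in starts(width)
    ]


windows = chip_windows(1024, 1024)
```

The shifted final window overlaps its neighbour rather than padding the image; padding would be an equally plausible choice.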
Dataset features:
Dataset format:
TODOs:
Random Sample Plot: