-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NCCM: Add new data module #1902
Conversation
Initial commit for adding Northeastern China Crop Map dataset
Co-authored-by: Yi-Chia Chang <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Now that #1878 is merged, the data module correctly detects that NCCM and Sentinel-2 don't actually overlap (specifically in time). You have two options to get the tests to pass:
All of these options are equally valid I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is fine, just need to get the tests to pass now.
|
||
self.dataset = self.sentinel2 & self.nccm | ||
(self.train_dataset, self.val_dataset, self.test_dataset) = ( | ||
random_grid_cell_assignment(self.dataset, [0.8, 0.1, 0.1], 2, generator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 seems great for testing, but I'm not sure if it seems great for actual usage... Are you sure this is what you want? For example, with CDL, this would mean we're splitting the US into NE, SE, SW, NW quadrants and randomly assigning them to train/val/test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grid_size
> 2 seems more reasonable for splitting in the actual case. I think we have three options :
- Set a parameter for
grid_size
so that users can define it themselves. - Create larger test data patches so that our pre-defined
grid_size
, saygrid_size
=8, can be executed. - Do both 1 and 2.
@shreyakannan1205 The Sentinel-2 test data is 2022, and the NCCM data only has 2017-2019. This is the reason why the test error shows |
Hi @adamjstewart @yichiac , I added the 2022 fake data to NCCM. Locally, my test cases pass, but when I push it to git, it is throwing an error. I tried making a few changes, but I am not too sure on what the issue is. It would be really helpful if you could look into it and let me know on how to fix it. Thanks! |
The error message you're seeing means that the datasets have no area of intersection. Why does it pass locally but not in CI? Because #1878 has been merged into main but not into your branch. Before running tests in CI, git will merge your PR into the main branch to make sure that your changes don't break main. Easiest way to test locally would be to rebase your branch or merge in upstream main. You'll note that this PR contains 111 commits. This is because it's also including all of your commits from your NCCM PR which were done on the main branch. I would highly recommend overwriting your main branch with the upstream main branch to avoid this. |
The purpose of this PR is to add NCCM's data module