Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCCM: Add new data module #1902

Closed
wants to merge 111 commits into from
Closed

NCCM: Add new data module #1902

wants to merge 111 commits into from

Conversation

shreyakannan1205
Copy link
Contributor

@shreyakannan1205 shreyakannan1205 commented Feb 23, 2024

The purpose of this PR is to add NCCM's data module

shreyakannan1205 and others added 30 commits October 14, 2023 11:55
Initial commit for adding Northeastern China Crop Map dataset
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
@adamjstewart adamjstewart changed the title New PR for NCCM datamodule NCCM: Add new data module Feb 25, 2024
@adamjstewart adamjstewart added this to the 0.6.0 milestone Feb 25, 2024
torchgeo/datamodules/nccm.py Show resolved Hide resolved
torchgeo/datamodules/nccm.py Outdated Show resolved Hide resolved
torchgeo/samplers/batch.py Outdated Show resolved Hide resolved
torchgeo/trainers/segmentation.py Outdated Show resolved Hide resolved
@github-actions github-actions bot removed samplers Samplers for indexing datasets trainers PyTorch Lightning trainers labels Feb 25, 2024
@adamjstewart
Copy link
Collaborator

Now that #1878 is merged, the data module correctly detects that NCCM and Sentinel-2 don't actually overlap (specifically in time). You have two options to get the tests to pass:

  1. Change the timestamp of the NCCM or Sentinel-2 fake data
  2. Add more NCCM or Sentinel-2 fake data at different times

All of these options are equally valid I think.

adamjstewart
adamjstewart previously approved these changes Feb 25, 2024
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is fine, just need to get the tests to pass now.


self.dataset = self.sentinel2 & self.nccm
(self.train_dataset, self.val_dataset, self.test_dataset) = (
random_grid_cell_assignment(self.dataset, [0.8, 0.1, 0.1], 2, generator)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seems great for testing, but I'm not sure if it seems great for actual usage... Are you sure this is what you want? For example, with CDL, this would mean we're splitting the US into NE, SE, SW, NW quadrants and randomly assigning them to train/val/test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grid_size > 2 seems more reasonable for splitting in the actual case. I think we have three options :

  1. Set a parameter for grid_size so that users can define it themselves.
  2. Create larger test data patches so that our pre-defined grid_size, say grid_size=8, can be executed.
  3. Do both 1 and 2.

@yichiac
Copy link
Contributor

yichiac commented Feb 29, 2024

Now that #1878 is merged, the data module correctly detects that NCCM and Sentinel-2 don't actually overlap (specifically in time). You have two options to get the tests to pass:

  1. Change the timestamp of the NCCM or Sentinel-2 fake data
  2. Add more NCCM or Sentinel-2 fake data at different times

All of these options are equally valid I think.

@shreyakannan1205 The Sentinel-2 test data is 2022, and the NCCM data only has 2017-2019. This is the reason why the test error shows no spatiotemporal intersection. I checked that those NCCM and Sentinel-2 test data are spatially intersected, so we only need to fix the temporal intersection. As @adamjstewart suggested, you can decide which option to get the tests passed.

@shreyakannan1205
Copy link
Contributor Author

Hi @adamjstewart @yichiac , I added the 2022 fake data to NCCM. Locally, my test cases pass, but when I push it to git, it is throwing an error. I tried making a few changes, but I am not too sure on what the issue is. It would be really helpful if you could look into it and let me know on how to fix it. Thanks!

@adamjstewart
Copy link
Collaborator

The error message you're seeing means that the datasets have no area of intersection.

Why does it pass locally but not in CI? Because #1878 has been merged into main but not into your branch. Before running tests in CI, git will merge your PR into the main branch to make sure that your changes don't break main.

Easiest way to test locally would be to rebase your branch or merge in upstream main. You'll note that this PR contains 111 commits. This is because it's also including all of your commits from your NCCM PR which were done on the main branch. I would highly recommend overwriting your main branch with the upstream main branch to avoid this.

@shreyakannan1205 shreyakannan1205 deleted the dm_branch branch March 15, 2024 02:28
@adamjstewart adamjstewart removed this from the 0.6.0 milestone Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datamodules PyTorch Lightning datamodules testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants