
RandomGeoSampler for Pre-chipped Datasets #1976

Closed
wants to merge 18 commits

Conversation

@yichiac (Contributor) commented Apr 2, 2024

This PR replaces RandomBatchGeoSampler with RandomGeoSampler for Agrifieldnet, Sentinel2_CDL, Sentinel2_NCCM, Sentinel2_EuroCrops, and Sentinel2_SouthAmericaSoybean datamodules. RandomGeoSampler works better for the pre-chipped datasets because it returns the correct random bounding boxes.
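A rough sketch of what the change looks like in one of these datamodules (illustrative only, not the exact diff; the class name is a placeholder and the attributes follow the usual GeoDataModule pattern):

```python
from torchgeo.datamodules import GeoDataModule
from torchgeo.samplers import RandomGeoSampler


class PrechippedDataModule(GeoDataModule):  # placeholder for e.g. the Sentinel2_CDL datamodule
    def setup(self, stage: str) -> None:
        # ... build self.train_dataset / self.val_dataset / self.test_dataset here ...
        if stage == 'fit':
            # Before: self.train_batch_sampler = RandomBatchGeoSampler(
            #     self.train_dataset, self.patch_size, self.batch_size, self.length)
            # After: draw one random patch per sample, so every returned bounding
            # box falls inside a single pre-chipped file.
            self.train_sampler = RandomGeoSampler(
                self.train_dataset, self.patch_size, self.length
            )
```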

@github-actions bot added the datamodules (PyTorch Lightning datamodules) label Apr 2, 2024
@adamjstewart modified the milestone: 0.6.0 Apr 2, 2024
@adamjstewart (Collaborator)

We should do this for all the datamodules you're using, not just these two.

@adamjstewart (Collaborator) previously approved these changes Apr 15, 2024

LGTM, is there anything else you wanted to add to this PR or can I merge?

@adamjstewart (Collaborator)

Tests fail because the fake data consists of a single file and 10% of 1 is 0.

@yichiac (Contributor, Author) commented Apr 15, 2024

That's all I want to add to this PR. Thanks for reviewing the code, @adamjstewart!

@github-actions bot added the testing (Continuous integration testing) label Apr 21, 2024
@adamjstewart (Collaborator)

This PR adds 1K new files to git. Do we really need that many files?

@yichiac (Contributor, Author) commented May 1, 2024

If we change the split fractions from [0.8, 0.1, 0.1] to something like [0.6, 0.2, 0.2], we can cut the number of test images in half.

```python
random_bbox_assignment(self.dataset, [0.8, 0.1, 0.1], generator=generator)
```

With the current fractions we need at least 10 images to get at least 1 val/test image; with [0.6, 0.2, 0.2] we only need 5. The split needing the fewest images would be an even three-way split, which only needs 3.
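As a quick sanity check on those numbers (a rough model only: it just floors fraction × image count and ignores how torchgeo distributes any remainder):

```python
from math import floor

for fractions, n in [([0.8, 0.1, 0.1], 1), ([0.8, 0.1, 0.1], 10), ([0.6, 0.2, 0.2], 5)]:
    print(n, 'images with', fractions, '->', [floor(f * n) for f in fractions])
# 1 image with [0.8, 0.1, 0.1]   -> [0, 0, 0]  (10% of 1 is 0: empty val/test, tests fail)
# 10 images with [0.8, 0.1, 0.1] -> [8, 1, 1]
# 5 images with [0.6, 0.2, 0.2]  -> [3, 1, 1]
```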

That being said, are 500 images too many to add to git? We can manually switch the split fractions back to [0.8, 0.1, 0.1] later.

@adamjstewart (Collaborator)

If we change the split fractions from [0.8, 0.1, 0.1] to something like [0.6, 0.2, 0.2], we can cut the number of test images in half.

We shouldn't change the default split just to make testing easier. If you want to make the split ratio an input parameter (like we do for many other data modules), then we could use [0.33, 0.33, 0.33] only in CI and use [0.8, 0.1, 0.1] by default.
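Something like this is what I have in mind (a sketch, not existing torchgeo code; the class name and split_fractions keyword are illustrative):

```python
import torch
from torchgeo.datamodules import GeoDataModule
from torchgeo.datasets import random_bbox_assignment


class PrechippedDataModule(GeoDataModule):  # illustrative name
    def __init__(self, *args, split_fractions=(0.8, 0.1, 0.1), **kwargs) -> None:
        # Expose the train/val/test fractions so CI can pass a more even split
        # while regular users keep the (0.8, 0.1, 0.1) default.
        super().__init__(*args, **kwargs)
        self.split_fractions = list(split_fractions)

    def setup(self, stage: str) -> None:
        # ... construct self.dataset here ...
        generator = torch.Generator().manual_seed(0)
        self.train_dataset, self.val_dataset, self.test_dataset = (
            random_bbox_assignment(self.dataset, self.split_fractions, generator=generator)
        )
```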

we need at least 10 images

I'm fine with at least 10 images, I'm less fine with at least 1000 images.

@adamjstewart (Collaborator)

@yichiac is this still needed?

@yichiac (Contributor, Author) commented Aug 8, 2024

I think it's still worth including this change. If we can reduce the number of test files, it would be great to use RandomGeoSampler for the pre-chipped datasets.

@adamjstewart (Collaborator)

I don't think the sampler is important here; the splitter is what's important.

I just feel like most users will either use raw Sentinel-2 tiles or write their own NonGeoDataset for the pre-chipped data. GeoDatasets don't make as much sense for pre-chipped data; they're really for raw data.

@adamjstewart (Collaborator)

@yichiac We should make a decision on this PR before the TorchGeo 0.6.0 release next week. Basically, it boils down to one question. Are these data modules designed for working with:

  1. arbitrary Sentinel-2 tiles and raw mask products, or
  2. the pre-chipped harmonized version of the dataset you created?

If 1, this PR should be closed, as the data modules already work as intended. If 2, we need to document this. We could also consider automatically downloading the harmonized version of the dataset in datamodule.prepare_data, along with any other hacks you needed for your paper's reproducibility.
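Roughly what I mean by the prepare_data idea (a sketch only; the class name and URL are placeholders, not a real dataset location):

```python
import lightning.pytorch as pl
from torchgeo.datasets.utils import download_and_extract_archive


class HarmonizedChipsDataModule(pl.LightningDataModule):  # hypothetical name
    url = 'https://example.com/harmonized-chips.tar.gz'  # placeholder URL

    def __init__(self, root: str = 'data', download: bool = False) -> None:
        super().__init__()
        self.root = root
        self.download = download

    def prepare_data(self) -> None:
        # Lightning calls this once, before setup(), so it is the natural place
        # to fetch the pre-chipped, harmonized version of the dataset.
        if self.download:
            download_and_extract_archive(self.url, self.root)
```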

@yichiac (Contributor, Author) commented Aug 21, 2024

I lean toward option 1. This PR can be closed.

@yichiac closed this Aug 21, 2024