Create dataset of UK-cropped satellite data from Europe dataset #150
Comments
You might want to rechunk the dataset as well, primarily in the x and y dims, to better match the spatial extent.
I seem to recall that the images for this dataset were chunked using a 4x4 grid? If x and y are each only split into 4 on the large image dataset, and these cropped images are expected to be ~100x smaller, won't one entire cropped image be significantly smaller than what was previously in a single x/y chunk, and hence we might not even need to chunk x/y? Forgive me if/as my lack of understanding renders this question nonsensical...!
Yeah, I agree! But you might have to explicitly rechunk the data to that size.
@devsjc Is this complete now? I.e. is the code to do this merged?
This could be linked to #180.
Summary
Currently there exists a ~40 TB satellite image dataset on GCP (and on Leonardo). For ease of ML training, a more manageably sized ~100 GB dataset containing only UK image data would be beneficial. As such, we want to read in the existing dataset, crop the images down so they cover the UK alone, and write the result to a new dataset.
Data structure
The dataset in GCP is stored in the bucket solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4.
The satellite dataset consists of several years of data. It is stored as a grid of chunks, each chunk containing 12 5-minute timesteps making up an hour's worth of imagery.
The bounds used to specify the UK in Satip are "UK": (-16, 45, 10, 62).
Method (Work in progress)
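As a starting point, here is a minimal sketch of the cropping step only, not the final method. It assumes the bounds above are ordered (west, south, east, north) in degrees, and that the dataset's x and y coordinates are 1-D, monotonic, and expressed in those same units. If the stored coordinates are in a different projection, the bounds would need transforming first. The helper names `index_range` and `crop_slices` are hypothetical.

```python
from typing import Sequence, Tuple

# Bounds quoted in this issue. ASSUMPTION: ordered (west, south, east, north), in degrees.
UK_BOUNDS = (-16, 45, 10, 62)


def index_range(coords: Sequence[float], lo: float, hi: float) -> slice:
    """Return the slice of positions whose coordinate lies in [lo, hi].

    ASSUMPTION: coords is monotonic (ascending or descending), so the
    matching positions form one contiguous run.
    """
    inside = [i for i, c in enumerate(coords) if lo <= c <= hi]
    if not inside:
        raise ValueError(f"no coordinates fall within [{lo}, {hi}]")
    return slice(inside[0], inside[-1] + 1)


def crop_slices(
    xs: Sequence[float],
    ys: Sequence[float],
    bounds: Tuple[float, float, float, float] = UK_BOUNDS,
) -> Tuple[slice, slice]:
    """Return (x_slice, y_slice) selecting the region inside bounds."""
    west, south, east, north = bounds
    return index_range(xs, west, east), index_range(ys, south, north)


if __name__ == "__main__":
    # Toy coordinates for illustration only; not the real grid.
    xs = list(range(-30, 31, 2))    # ascending
    ys = list(range(70, 29, -5))    # descending, as image rows often are
    print(crop_slices(xs, ys))
```

If the data is opened with xarray, slices like these could be passed to `Dataset.isel` (the dimension names would depend on the dataset), and the cropped result could then be explicitly rechunked with `Dataset.chunk` before writing, as suggested in the comments.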
Known gotchas