
Create dataset of UK-cropped satellite data from Europe dataset #150

Open
devsjc opened this issue Feb 1, 2023 · 5 comments
devsjc (Contributor) commented Feb 1, 2023

Summary

There currently exists a ~40 TB satellite image dataset on GCP (and on Leonardo). For ease of ML training, a more manageably sized ~100 GB dataset containing purely UK image data would be beneficial. As such, we want to read in the existing dataset, crop the images so they cover the UK alone, and write the result to a new dataset.

Data structure

The dataset in GCP is stored in the bucket solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4.

The satellite dataset consists of several years of data, stored as a grid of chunks, each chunk containing 12 five-minute timesteps making up an hour's worth of imagery.

The bounds used to specify the UK in Satip are "UK": (-16, 45, 10, 62), read as (west, south, east, north).

Method (Work in progress)

  1. Pull and uncompress current data, x timesteps at a time
  2. Copy/save metadata to avoid loss
  3. Extract images from chunks
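The cropping step above could be sketched with xarray. The snippet below runs against a tiny synthetic stand-in for the real Zarr store; the coordinate names `latitude`/`longitude` and the lat/lon selection approach are assumptions (the real SEVIRI data may well use geostationary x/y coordinates instead):

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the Europe dataset (the real store is Zarr on GCP).
lats = np.arange(30.0, 70.0, 1.0)
lons = np.arange(-30.0, 30.0, 1.0)
ds = xr.Dataset(
    {"data": (("latitude", "longitude"), np.random.rand(lats.size, lons.size))},
    coords={"latitude": lats, "longitude": lons},
)

# Satip's UK bounds, read as (west, south, east, north).
west, south, east, north = -16, 45, 10, 62

# Crop to the UK alone; the coords here are ascending, so slice is (min, max).
uk = ds.sel(latitude=slice(south, north), longitude=slice(west, east))
```

In the real pipeline this selection would be applied lazily to the opened store before writing the cropped result out as a new dataset.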

Known gotchas

  • XArray will often delete Zarr attribute files when writing new data: be sure to copy them explicitly into the new dataset
  • Will require decoding via OCF's blosc2 Python library
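One way to guard against the attribute-file gotcha above is to copy every `.zattrs` file across explicitly after the write. A minimal stdlib sketch, where the helper name and store layout are illustrative rather than from any existing code:

```python
import shutil
from pathlib import Path


def copy_zarr_attrs(src_store: Path, dst_store: Path) -> None:
    """Copy every .zattrs file from src_store into dst_store,
    preserving the relative directory layout of the Zarr store."""
    for attrs_file in src_store.rglob(".zattrs"):
        dest = dst_store / attrs_file.relative_to(src_store)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(attrs_file, dest)
```

Running this after the `to_zarr` write would restore any attribute files that xarray dropped.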
jacobbieker (Member) commented
You might want to rechunk the dataset as well, primarily in the x and y dims, to better match the spatial extent.

devsjc (Contributor, Author) commented Feb 1, 2023

I seem to recall that the images for this dataset were chunked on a 4x4 grid? If x and y are each only split into 4 on the large image dataset, and the cropped images are expected to be ~100x smaller, won't one entire cropped image be significantly smaller than what was previously in a single x/y chunk, and hence we might not even need to chunk x/y at all?

Forgive me if/as my lack of understanding renders this question nonsensical...!
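A back-of-envelope version of the question above; the 4x4 chunk grid and the ~100x size ratio are taken from this thread, not verified against the store:

```python
# Treat the full Europe image as unit area.
full_area = 1.0
chunk_area = full_area / (4 * 4)  # 4x4 spatial chunk grid -> 1/16 per chunk
uk_area = full_area / 100         # cropped UK image assumed ~100x smaller

# The whole cropped image is smaller than one old spatial chunk, so a single
# chunk can cover the entire UK extent.
assert uk_area < chunk_area
```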

jacobbieker (Member) commented
Yeah, I agree! But you might have to explicitly rechunk the data to that size

zakwatts commented
@devsjc Is this complete now? I.e. has the code to do this been merged?

peterdudfield (Collaborator) commented
This could be linked to #180

Status: Todo
No branches or pull requests
4 participants