Using XArray and dask in satpy

Martin Raspaud edited this page Mar 1, 2018 · 24 revisions

XArray

import xarray as xr

XArray's DataArray is now the standard data structure for arrays in satpy. A DataArray lets the array carry named dimensions, coordinates, and attributes (which we use for the metadata).

To create such an array, you can do, for example:

my_dataarray = xr.DataArray(my_data, dims=['y', 'x'],
                            coords={'x': np.arange(...)},
                            attrs={'sensor': 'olci'})

my_data can be a regular numpy array, a numpy memmap, or, if you want to keep things lazy, a dask array (more on dask later).
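As a minimal sketch of the lazy case, the snippet below wraps a dask array in a DataArray. The array shape, chunk size, and the name `lazy_dataarray` are made up for illustration and do not come from satpy itself.

```python
import numpy as np
import dask.array as da
import xarray as xr

# Hypothetical 4x6 image split into 2x3 chunks (sizes chosen arbitrarily).
lazy_data = da.zeros((4, 6), chunks=(2, 3))

lazy_dataarray = xr.DataArray(lazy_data, dims=['y', 'x'],
                              coords={'x': np.arange(6)},
                              attrs={'sensor': 'olci'})

# The DataArray still wraps the dask array; the values are only
# computed when explicitly requested.
result = lazy_dataarray.compute()
```

Until `compute()` (or `.values`) is called, `lazy_dataarray.data` remains a dask array, so further operations on it stay lazy as well.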

In satpy, the dimensions of the arrays should include:

  • x for the x or pixel dimension
  • y for the y or line dimension
  • bands for composites
  • time can also be provided, but we have limited support for it at the moment. Use metadata for common cases (start_time, end_time)

Dimensions are accessible through my_dataarray.dims. To get the size of a given dimension, use sizes:

my_dataarray.sizes['x']
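A small self-contained illustration of `dims` and `sizes`, using a made-up 3-line by 5-pixel array (the name `arr` and the shape are assumptions for the example):

```python
import numpy as np
import xarray as xr

# Hypothetical array with 3 lines (y) and 5 pixels per line (x).
arr = xr.DataArray(np.zeros((3, 5)), dims=['y', 'x'])

dims = arr.dims             # tuple of dimension names, in axis order
n_lines = arr.sizes['y']    # length of the y dimension
n_pixels = arr.sizes['x']   # length of the x dimension
```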

Coordinates can be defined for those dimensions when it makes sense:

  • x and y: they are usually defined when the data's area is an AreaDefinition, and they contain the projection coordinates in x and y.
  • bands: they contain the letter of the color they represent, e.g. ['R', 'G', 'B'] for an RGB composite.

This then makes it possible to select, for example, a single band like this:

red = my_composite.sel(bands='R')

or even multiple bands:

red_and_blue = my_composite.sel(bands=['R', 'B'])
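To make the selection concrete, here is a sketch that builds a tiny composite by hand and applies both selections. The 2x2 image size and the data values are invented for the example; a real composite would come out of satpy.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-band, 2x2 composite with the band letters as coordinates.
data = np.arange(3 * 2 * 2).reshape((3, 2, 2))
my_composite = xr.DataArray(data, dims=['bands', 'y', 'x'],
                            coords={'bands': ['R', 'G', 'B']})

# Selecting with a scalar label drops the bands dimension.
red = my_composite.sel(bands='R')

# Selecting with a list of labels keeps the bands dimension.
red_and_blue = my_composite.sel(bands=['R', 'B'])
```

Note the difference in the result: `red` is a 2D array with dimensions y and x, while `red_and_blue` keeps a bands dimension of length 2.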

Dask

import dask.array as da

Helpful functions:

  • map_blocks
  • map_overlap
  • atop (renamed blockwise in later dask releases)
  • store
  • tokenize
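As a rough sketch of how some of these functions fit together, the example below applies a function chunk by chunk with `map_blocks`, writes the result into a preallocated numpy array with `store`, and builds a deterministic key with `tokenize`. The `scale` helper, the `factor` parameter, and the array sizes are hypothetical and only serve the illustration.

```python
import numpy as np
import dask.array as da
from dask.base import tokenize


def scale(block, factor):
    """Hypothetical per-chunk operation: multiply a numpy block by a factor."""
    return block * factor


# 4x4 array of ones split into four 2x2 chunks.
counts = da.ones((4, 4), chunks=(2, 2))

# map_blocks calls `scale` once per chunk; extra keyword arguments
# are passed through to the function. Still lazy at this point.
scaled = da.map_blocks(scale, counts, factor=0.5, dtype=counts.dtype)

# store computes the chunks and writes them into the target array.
target = np.empty((4, 4))
da.store(scaled, target)

# tokenize returns a deterministic hash string for its arguments,
# which is handy for naming or caching derived results.
key = tokenize('scale', 0.5)
```

Because `store` writes chunk by chunk, the full result does not need to be held as a separate in-memory copy before being written to the target, which matters when the target is a file-backed array.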