Commit

Merge pull request #242 from ecmwf/develop
update docs
mathleur authored Nov 6, 2024
2 parents 9d68476 + 09f6ede commit cf1433a
Showing 23 changed files with 862 additions and 427 deletions.
21 changes: 10 additions & 11 deletions docs/Algorithm/Overview/Overview.md
@@ -10,30 +10,29 @@ Developed by ECMWF - the European Centre for Medium-Range Weather Forecasts - it

### Traditional Extraction Techniques

Traditional data extraction techniques only allow users to access datacubes "orthogonally" by selecting specific values or ranges along datacube dimensions.
Such data access mechanisms can be seen as extracting so-called "bounding boxes" of data.
These mechanisms are quite limited however as many user requests cannot be formulated using bounding boxes.
Traditional data extraction techniques only allow users to access boxes of data from datacubes.
These techniques are quite restrictive, however, as many user requests cannot be formulated using such boxes.

!!!note "Example"

Imagine for example someone interested in extracting temperature data over the shape of France.
France is not a box shape over latitude and longitude.
Using current extraction techniques, this exact request would therefore be impossible and users would instead need to request a bounding box around France.
Imagine, for example, someone interested in extracting wind data over the Mediterranean Sea.
The Mediterranean is not a box shape over latitude and longitude.
Using current extraction techniques, this exact request would therefore be impossible and users would instead need to request a bounding box around the Mediterranean.
The user would thus get back much more data than they truly need.

In higher dimensions, this becomes an even bigger challenge with only tiny fractions of the extracted data being useful to users.

### Polytope Extraction Technique

As an alternative, Polytope enables users to access datacubes "non-orthogonally".
Instead of extracting bounding boxes of data, Polytope has the capability of querying high-dimensional "polytopes" along several axes of a datacube.
This is much less restrictive than the popular bounding box approach described before.
Instead, Polytope enables users to access high-dimensional "polytopes" from datacubes, rather than only boxes of data.
<!-- Instead of extracting bounding boxes of data, Polytope has the capability of querying high-dimensional "polytopes" along several axes of a datacube. -->
<!-- This is much less restrictive than the popular bounding box approach described before. -->

!!!note "Example"

Using Polytope, extracting the temperature over just the shape of France is now trivially possible by specifying the right polytope.
Using Polytope, extracting the temperature over just the shape of the Mediterranean is now trivially possible by specifying the right polytope.
This returns much less data than by using a bounding box approach.

These polytope-based requests do in fact allow Polytope to fulfill its two main aims.
Indeed, because polytope requests return only the exact data users need, they significantly reduce I/O usage as less data has to be transmitted.
Indeed, because polytope requests return only the data users need, they significantly reduce I/O usage as less data has to be transmitted.
Moreover, because only the data inside the requested polytope is returned, this method completely removes the challenge of post-processing on the user side, as wanted.
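The difference between a bounding-box request and a polytope request can be illustrated with a minimal sketch. It does not use the Polytope library: the triangular region, the 1-degree grid, and the helper `point_in_polygon` are all hypothetical, chosen only to show how many grid points a box returns compared with the shape itself.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: return True if (x, y) lies strictly inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical region of interest: a triangle in (longitude, latitude) space.
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Hypothetical 1-degree grid of cell centres covering the triangle's bounding box.
grid = [(lon + 0.5, lat + 0.5) for lon in range(10) for lat in range(10)]

# A bounding-box request returns every grid point in the box.
box_points = grid

# A shape-based request returns only the points inside the triangle.
shape_points = [p for p in grid if point_in_polygon(p[0], p[1], triangle)]

print(len(box_points), len(shape_points))
```

In this toy setup the shape-based selection keeps fewer than half of the points in the bounding box, and the gap widens as more dimensions are added.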
2 changes: 1 addition & 1 deletion docs/Algorithm/User_Guide/Example.md
@@ -1,7 +1,7 @@
# Example
Here is a step-by-step example of how to use the Polytope software.

1. In this example, we first specify the data which will be in our Xarray datacube. Note that the data here comes from the GRIB file called "winds.grib", which is 3-dimensional with dimensions: step, latitude and longitude.
1. In this example, we first specify the data which will be in our XArray datacube. Note that the data here comes from the GRIB file called "winds.grib", which is 3-dimensional with dimensions: step, latitude and longitude.

import xarray as xr

4 changes: 2 additions & 2 deletions docs/Algorithm/User_Guide/Getting_started.md
@@ -26,13 +26,13 @@ or from PyPI with the command

Polytope's tests and examples require some additional dependencies compared to the main Polytope software.

- **Git Large File Storage**
<!-- - **Git Large File Storage**
Polytope uses Git Large File Storage (LFS) to store large data files used in its tests and examples.
To run the tests and examples, it is thus necessary to install Git LFS, by following instructions provided [here](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) for example.
Once Git LFS is installed, individual data files can be downloaded using the command
git lfs pull --include="*" --exclude=""
git lfs pull --include="*" --exclude="" -->

- **Additional Dependencies**

38 changes: 19 additions & 19 deletions docs/Service/Design_doc.md
@@ -4,7 +4,7 @@

### Feature Keyword

Feature extraction expands existing mars requests to include a `feature` keyword that includes a json dictionary taht describes the given feature. This feature is then extracted using the Polytope feature extraction algoithm and only points within the given feature are returned.
Feature extraction expands existing mars requests to include a `feature` keyword that includes a json dictionary that describes the given feature. This feature is then extracted using the Polytope feature extraction algorithm and only points within the given feature are returned.

```python
"feature" : {
@@ -15,7 +15,7 @@ Feature extraction expands existing mars requests to include a `feature` keyword

#### Type

An example of a minimal feature of `type` : `timeseries` can be seen above. A feature dictionary must always contain a `type`. The `type` in this case refers to what feature is being requested, the `type` of feaature requested will then determine the format of the output returned, what other keys can go in the feature and suitable defaults if they are not available. In some cases it may also affect keys outside of the feature dictionary that come from the traditional mars request. For example if `type` : `verticalprofile` and `levtype` : `sfc`, this request wont be sent as a vertical profile expects either `levtype` : `pl/ml`. Other exceptions will be given for each seperate feature `type`.
An example of a minimal feature of `type` : `timeseries` can be seen above. A feature dictionary must always contain a `type`, which names the feature being requested. The `type` determines the format of the returned output, which other keys can go in the feature, and suitable defaults if they are not provided. In some cases it may also affect keys outside of the feature dictionary that come from the traditional MARS request. For example, if `type` : `verticalprofile` and `levtype` : `sfc`, the request will not be sent, as a vertical profile expects `levtype` to be either `pl` or `ml`. Other exceptions will be given for each separate feature `type`.

The values currently available for `type` are as follows:

@@ -34,7 +34,7 @@ A feature dictionary must also contain the requested geometry in some form. For

#### Axis

A non mandatory field that is available for each feature that isnt present in the above example is `axis`. `axis` determines what field that the data should be enumerated along. In the case of a `timeseries` this will default to `step` meaning the timeseries will be along the `step` axis, however there are other available `axis` such as `datetime`, this would be for climate data which contains no `step` `axis`.
A non-mandatory field that is available for each feature but is not present in the above example is `axis`. `axis` determines which field the data should be enumerated along. In the case of a `timeseries` this defaults to `step`, meaning that the timeseries will be along the `step` axis. Other values are available, such as `datetime`, which is intended for climate data that contains no `step` axis.

#### Range

@@ -48,32 +48,32 @@ A non mandatory field that is available for each feature that isnt present in th
}
```

If this range was included in the above feature dictionary for a `timeseries` it would ask for `step` (due to it being the default axis for timeseries) starting at `0` and ending at `10` with an interval of `2`, the returned steps would be `0,2,4,6,8,10`. Or equivilent to asking for the following in a mars request.
If this range were included in the above feature dictionary for a `timeseries`, it would ask for `step` (the default axis for timeseries) starting at `0` and ending at `10` with an interval of `2`. The returned steps would be `0,2,4,6,8,10`. This is equivalent to asking for the following in a MARS request:

```python
"step" : "0/to/10/by/2"
```

The above can also be put in the not feature key however it must then be mutually exclusive with `range`. If both or neither are in the request an error is thrown.
The above can also be put in the body of the request. However it must then be mutually exclusive with `range`. If both or neither are in the request an error is thrown.

`range` can also appear in the following form:

```python
"range" : [0,1,4,7,10]
```

This will only return the asked steps similar to in a mars request where a user asks for the following:
This returns only the requested steps, as in a MARS request where a user asks for the following:

```python
"step" : "0/1/4/7/10"
```

Again either a `range` within the feature or an explicit `step` within the main body of the request can be used but not both or neither as there is no suitable default value unlike mars.
Again, either a `range` within the feature or an explicit `step` within the main body of the request can be used, but not both or neither, because unlike MARS there is no suitable default value.
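The rules in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the polytope-mars implementation: the function names `expand_range`, `parse_mars_step` and `resolve_steps` are hypothetical, and defaulting a missing `interval` to `1` is a simplifying assumption (the service would instead return whatever steps are available in the datacube).

```python
def expand_range(spec):
    """Expand a feature `range` (dict or list form) into explicit values."""
    if isinstance(spec, dict):
        # Assumption for this sketch: a missing interval means a step of 1.
        interval = spec.get("interval", 1)
        return list(range(spec["start"], spec["end"] + 1, interval))
    # List form: the values are already explicit.
    return list(spec)


def parse_mars_step(step):
    """Parse a MARS-style step string such as "0/to/10/by/2" or "0/1/4/7/10"."""
    parts = str(step).split("/")
    if len(parts) >= 3 and parts[1] == "to":
        start, end = int(parts[0]), int(parts[2])
        by = int(parts[4]) if len(parts) == 5 and parts[3] == "by" else 1
        return list(range(start, end + 1, by))
    return [int(p) for p in parts]


def resolve_steps(request):
    """Return the requested steps, enforcing that exactly one source is given."""
    feature_range = request.get("feature", {}).get("range")
    body_step = request.get("step")
    # Both present or both absent is an error: there is no default value.
    if (feature_range is None) == (body_step is None):
        raise ValueError("provide exactly one of feature 'range' or 'step'")
    if feature_range is not None:
        return expand_range(feature_range)
    return parse_mars_step(body_step)


# The two equivalent ways of asking for steps 0, 2, ..., 10.
via_range = resolve_steps(
    {"feature": {"type": "timeseries", "range": {"start": 0, "end": 10, "interval": 2}}}
)
via_step = resolve_steps({"feature": {"type": "timeseries"}, "step": "0/to/10/by/2"})
print(via_range == via_step)
```

Keeping the mutual-exclusivity check in one place makes the "both or neither" error case explicit rather than relying on a silent default.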


### MARS Fields

The non `feature` elements of the polytope-mars request act similar to the way one would expect when creating a mars request with a few differences.
The non-`feature` elements of the polytope-mars request behave much as one would expect when creating a MARS request, with a few differences.

* Most fields do not have a default value that will be tried if the field is not in the request.
* If a user makes a request and data is only available for some of the fields requested an error will be returned. Users will either receive all the data they requested or none.
@@ -152,19 +152,19 @@ request = {
}
```

The above would throw an error that `step` has been over subscribed.
The above would throw an error that `step` has been over-subscribed.

Ideally, a valid MARS request should be able to accept a valid `feature` and the resulting polytope-mars request should be valid, but this may not always be true.

Users can include the `format` key however initally the only value available will be `covjson` or `application/json+covjson`, these will be the default values if `format` is not included. Further formats may be added in the future.
Users can include the `format` key. However, initially the only values available will be `covjson` or `application/json+covjson`, which are also the defaults if `format` is not included. Further formats may be added in the future.

### Features

The following features will be available for use in polytope-mars.

#### Timeseries

A timeseries request has a `feature` with `type` : `timeseries` and a geomtry in the form of `points` containing a single point with latitude and longitude values. It also requires atleast one time dimension with the default being `step` however `datetime` is also accepted. The following is an example of a timeseries request:
A timeseries request has a `feature` with `type` : `timeseries` and a geometry in the form of `points` containing a single point with latitude and longitude values. It also requires at least one time dimension with the default being `step`, although `datetime` is also accepted. The following is an example of a timeseries request:

```python
request = {
@@ -213,7 +213,7 @@ request = {
}
```

In this case the user is requesting `step` `0-360` on `20241006` for the point `[-9.10, 38.78]`. As the user doesnt specify `interval` all steps between `0-360` that are available. If the datacube is a climate dataset that does not contain step, an error would be thrown as `step` is not in the datacube. In this case the user would have to provide a request like the following:
In this case the user is requesting `step` `0-360` on `20241006` for the point `[-9.10, 38.78]`. As the user does not specify `interval`, all available steps between `0` and `360` are returned. If the datacube is a climate dataset that does not contain step, an error would be thrown as `step` is not in the datacube. In this case the user would have to provide a request like the following:

```python
request = {
@@ -252,7 +252,7 @@ CoverageJSON output type: PointSeries

#### Vertical Profile

A vertical profile request has a `feature` with `type` : `verticalprofile` and a geomtry in the form of `points` containing a single point with latitude and longitude values. It also requires a `levtype` that is not `sfc` and a `levelist` in the request or as part of the `feature`. The following is an example of a vertical profile request:
A vertical profile request has a `feature` with `type` : `verticalprofile` and a geometry in the form of `points` containing a single point with latitude and longitude values. It also requires a `levtype` that is not `sfc` and a `levelist` in the request or as part of the `feature`. The following is an example of a vertical profile request:

```python
request = {
@@ -302,19 +302,19 @@ request = {
}
```

`levtype` can either be `ml` or `pl` but atleast one must be present.
`levtype` can either be `ml` or `pl` but at least one must be present.

`levelist` can either be in the main body of the request or in `range` as described in the `range` section. If no `interval` is provided, all values from `start` to `end` will be requested.

Currently the default for `axes` is `levelist` and is the only valid value for this key. This may change in the future. Users can include this in the request but it is not necessary.

In the above case if a range is provided for a field such as `number` a vertical profile as described above will be provided per `number` or any other range field.
In the above case if a range is provided for a field such as `number`, a vertical profile as described above will be provided per `number` or any other range field.
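The constraints described above for a vertical profile can be collected into a small validation sketch. The function `check_vertical_profile` and its error messages are hypothetical and not part of polytope-mars. The sketch also assumes that `levelist` in the request body and `range` in the feature are mutually exclusive, by analogy with the `step` rule in the `range` section.

```python
def check_vertical_profile(request):
    """Raise ValueError if the request breaks a vertical profile rule."""
    feature = request.get("feature", {})
    if feature.get("type") != "verticalprofile":
        raise ValueError("feature type must be 'verticalprofile'")

    # The geometry must be a single point.
    points = feature.get("points", [])
    if len(points) != 1:
        raise ValueError("a vertical profile needs exactly one point")

    # Surface data has no vertical axis, so only pl or ml are accepted.
    if request.get("levtype") not in ("pl", "ml"):
        raise ValueError("levtype must be 'pl' or 'ml'")

    # Assumption: levels come from exactly one of the two possible places.
    in_body = "levelist" in request
    in_feature = "range" in feature
    if in_body == in_feature:
        raise ValueError("provide levels in 'levelist' or in 'range', not both or neither")

    # levelist is currently the only valid value for axes.
    if feature.get("axes", "levelist") != "levelist":
        raise ValueError("axes must be 'levelist'")


# A request that satisfies every rule passes silently.
valid_request = {
    "levtype": "pl",
    "levelist": "500/850/1000",
    "feature": {"type": "verticalprofile", "points": [[38.78, -9.10]]},
}
check_vertical_profile(valid_request)
```

Checking these rules before the request is sent gives the user an immediate, specific error instead of a failed extraction.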

CoverageJSON output type: VerticalProfile

#### Trajectory

A trajectory request has a `feature` with `type` : `trajectory` and a geomtry in the form of `points` containing atleast two points with latitude and longitude, a level value, and a time value if no `axes` is provided. This is because the default `axes` are as follows:
A trajectory request has a `feature` with `type` : `trajectory` and a geometry in the form of `points` containing at least two points with latitude and longitude, a level value, and a time value if no `axes` is provided. This is because the default `axes` are as follows:

```python
"axes" : ["lat", "long", "level", "step"]
@@ -417,7 +417,7 @@ CoverageJSON output type: Trajectory

#### Polygon

A polygon request has a `feature` with `type` : `poylgon` and a geomtry in the form of `shape` containing atleast one list containing three points with latitude and longitude with the first and final point being the same to complete the polygon. The user can provide multiple lists of points forming polygons in the same request. An example of the `polygon` feature is seen below:
A polygon request has a `feature` with `type` : `polygon` and a geometry in the form of `shape` containing at least one list containing three points with latitude and longitude, with the first and final point being the same to complete the polygon. The user can provide multiple lists of points forming polygons in the same request. An example of the `polygon` feature is seen below:

```python
request = {
@@ -471,7 +471,7 @@ Returned coverages as polygons:

Each of these will be an individual coverage with the 3 requested parameters.

The `polygon` feature currently has limits on the size of a returned polygon and the maximum number of points allowed for a requsted polygon.
The `polygon` feature currently has limits on the size of a returned polygon and the maximum number of points allowed for a requested polygon.

CoverageJSON output type: MultiPoint

@@ -481,4 +481,4 @@ CoverageJSON output type: MultiPoint
CoverageJSON has a number of different output features. Depending on the feature selected the output type will vary.

A coverageCollection is always returned even if there is only a single coverage.
A new coverage is created for each ensemble number and depending on the feature type each new date (except in timeseries). The only grouped field is param which will be in the same coverage.
A new coverage is created for each ensemble number and depending on the feature type each new date (except in timeseries). The only grouped field is `param` which will be in the same coverage.