Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/updated protobuf w develop #134

Merged
merged 108 commits into from
Jul 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
108 commits
Select commit Hold shift + click to select a range
45b9a11
create tensor index tree object and incorporate into hullslicer
mathleur Feb 15, 2024
fe35193
fix some tests
mathleur Feb 16, 2024
4c868b2
fix get for xarray backend
mathleur Feb 16, 2024
6bd0b1c
fix xarray testts
mathleur Feb 16, 2024
2efbacf
fix mroe tests
mathleur Feb 16, 2024
07d48f3
fix type change transformation
mathleur Feb 16, 2024
3cbed71
fix all tests with xarray backend
mathleur Feb 16, 2024
7eb8755
make first version fo tensor index tree work with fdb
mathleur Feb 19, 2024
ba382fa
more tests
mathleur Feb 19, 2024
e415a2d
start adding compression for grids
mathleur Feb 19, 2024
ba65c47
clean up
mathleur Feb 19, 2024
d60bb0a
make right grid axes be compressed in tree
mathleur Feb 19, 2024
4110f93
clean up
mathleur Feb 29, 2024
ccd1212
Merge pull request #118 from ecmwf/develop
mathleur Mar 1, 2024
ed43774
remove duplicate print for index trees
mathleur Mar 4, 2024
cb94a1e
start moving compressed tree logic of adding children to tensor tree …
mathleur Mar 4, 2024
ef80bb5
merge develop and move compression logic into tensor index tree
mathleur Mar 5, 2024
bce6fb4
fix xarray get
mathleur Mar 6, 2024
9b5e71d
start to fix more tests
mathleur Mar 7, 2024
582f299
fix almost all problems except data size attached to leaves
mathleur Mar 7, 2024
fab2252
fix almost all tests, but data extracted from fdb is only the first i…
mathleur Mar 7, 2024
1ec5145
fix small bug on slicing for flat polytopes
mathleur Mar 11, 2024
3b16685
fix everything except returned values from pygribjump
mathleur Mar 11, 2024
8c9f273
flake8
mathleur Mar 11, 2024
c4ff333
flake8
mathleur Mar 11, 2024
171efae
flake8
mathleur Mar 11, 2024
9acfcf4
make associating data to right tree nodes work with compressed trees
mathleur Mar 14, 2024
670fb08
clean up
mathleur Mar 14, 2024
f064d9f
give data to right nodess
mathleur Mar 21, 2024
408a257
make almost all tests work for compressed box shapes
mathleur Mar 27, 2024
d5bcfc3
fix last test for box shape compression
mathleur Mar 27, 2024
4407b9f
make the point shape compressed
mathleur Mar 27, 2024
af8fbea
time compression on example
mathleur Mar 28, 2024
04e6cfe
fix performance test
mathleur Mar 28, 2024
fca907b
fix xarray hullslicer and add performance test
mathleur Mar 28, 2024
f94a0b1
do not compress merged axes
mathleur Apr 30, 2024
e48d86e
add new compressed test example
mathleur Apr 30, 2024
fe14f18
think about not slicing on compressed axes
mathleur Apr 30, 2024
c84e61e
try to fix the fact that select creates several branches of the tree …
mathleur May 2, 2024
434278f
make sure to compress all the 1D shapes
mathleur May 3, 2024
09a5458
refactor code and only slice non-compressed axes into new polytopes f…
mathleur May 6, 2024
b2421b7
better slicer where we do the compression logic inside the slicer ins…
mathleur May 7, 2024
9606d51
formatting
mathleur May 7, 2024
56ea721
Merge branch 'main' of github.com:ecmwf/polytope into develop
mathleur May 7, 2024
b0dc5ff
merge develop
mathleur May 7, 2024
0d617a8
fix problems
mathleur May 7, 2024
3a46769
fix compression in test
mathleur May 7, 2024
1d1c43f
black
mathleur May 7, 2024
6a28c43
flake8
mathleur May 7, 2024
ace3975
Merge pull request #130 from ecmwf/feature/different_tree_compression…
mathleur May 7, 2024
6bbd927
clean up
mathleur May 7, 2024
d64c0da
merge with origin
mathleur May 7, 2024
25fc925
change is_1D shape property to is_orthogonal
mathleur May 8, 2024
71ed90f
move compressed axis object to hullslicer and add cross-checking with…
mathleur May 8, 2024
09d99a7
reformatting
mathleur May 8, 2024
b67b8f1
add test
mathleur May 10, 2024
df615f3
black
mathleur May 10, 2024
a13d297
flake8
mathleur May 10, 2024
78567e7
compress also unsliceable axes
mathleur May 10, 2024
2b9caa1
add explanation for combinatorics function
mathleur May 10, 2024
2100a9b
clean up
mathleur May 10, 2024
f5876e3
unify Polytope API to give a GribJump object to create the fdb datacube
mathleur May 10, 2024
945629f
try to fix fdb tests
mathleur May 10, 2024
a325a3b
add pygribjump to dependencies
mathleur May 10, 2024
163b2c7
remove unused options and start to create unified options class
mathleur May 10, 2024
9e0b392
try to add options in one single conflator config
mathleur May 10, 2024
12faf88
add all options into a single option dictionary in polytope user inte…
mathleur May 13, 2024
10d6ecf
clean up
mathleur May 13, 2024
2df5679
black
mathleur May 13, 2024
012c746
add pygribjump to requirements as github repo
mathleur May 13, 2024
b218448
add pygribjump to requirements as github repo
mathleur May 13, 2024
4de7f38
add pygribjump to requirements as github repo
mathleur May 13, 2024
d989a56
remove gribjump dependency outside tests
mathleur May 13, 2024
8a7c628
Merge pull request #131 from ecmwf/feature/unify_options
mathleur May 13, 2024
bbd2577
try to ignore axes that do not exist in datacube
mathleur May 15, 2024
beed64c
try to fix tests
mathleur May 15, 2024
14b9118
try to fix tests
mathleur May 15, 2024
ea7bb54
try to fix tests with quantiles too
mathleur May 16, 2024
b2e8de5
ignore non-working tests for now
mathleur May 16, 2024
de8780e
clean up
mathleur May 16, 2024
50cfec2
fix test with unusual parameter
mathleur May 22, 2024
3804383
change protobuf to work with cpp gribjump encoding
mathleur May 29, 2024
de9c80a
merge new axis branching with the compressed tree and the new cpp ver…
mathleur May 29, 2024
fff532c
change protobuf encoding and decoding to fit proto class
mathleur Jun 3, 2024
9d7d457
start adding capability to unmap trees
mathleur Jun 4, 2024
69e1d08
finish tree unmapping
mathleur Jun 5, 2024
4316b1f
test unmapping tree for encoding
mathleur Jun 5, 2024
941abf3
implement hidden nodes and transform tree before giving to gj
mathleur Jun 6, 2024
07cdf3f
add protobuf encoding tests and fix encoding
mathleur Jun 6, 2024
9a880fb
update encoding to keep track of the hidden nodes
mathleur Jun 7, 2024
2d60615
update encoding to write to file too
mathleur Jun 11, 2024
e2de8d1
clean up
mathleur Jun 11, 2024
4d732bc
clean up
mathleur Jun 11, 2024
05cacf4
Merge branch 'develop' of github.com:ecmwf/polytope into develop
mathleur Jun 11, 2024
689f45c
pull origin develop
mathleur Jun 11, 2024
312ce5b
Merge pull request #135 from ecmwf/feature/updated_protobuf_newest_de…
mathleur Jun 11, 2024
834baca
fix flake8 in protobuf encoding
mathleur Jun 11, 2024
9839411
add protobuf requirements
mathleur Jun 11, 2024
4b4be19
add start of protobuf decoding into tree
mathleur Jun 11, 2024
4665b3b
decode protobuf into tree
mathleur Jun 12, 2024
af22742
isort + black
mathleur Jun 12, 2024
ff9b687
add non-box compressed lon shape
mathleur Jun 19, 2024
9be9205
fix bug with non-box shapes fdb extraction
mathleur Jun 19, 2024
434a85d
clean up
mathleur Jun 19, 2024
a175bcf
update requirements
mathleur Jun 19, 2024
7c696f3
fix non-box shape for compressed tree branch
mathleur Jun 19, 2024
2557aa7
black
mathleur Jun 19, 2024
8a779f0
Merge pull request #142 from ecmwf/feature/updated_protobuf_w_develop…
mathleur Jun 19, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions performance/fdb_slice_many_numbers_timeseries.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,9 @@
"step": {"type_change": "int"},
"number": {"type_change": "int"},
"longitude": {"cyclic": [0, 360]},
"latitude": {"reverse": {True}},
}

config = {"class": "od", "expver": "0001", "levtype": "sfc", "type": "pf"}
fdbdatacube = FDBDatacube(config, axis_options=options)
self_API = Polytope(datacube=fdbdatacube, axis_options=options)
Expand All @@ -39,7 +41,12 @@
Point(["latitude", "longitude"], [[0.04, 0]], method="surrounding"),
All("number"),
)
time3 = time.time()
result = self_API.retrieve(request)
time4 = time.time()
print(time.time() - time1)
print(time.time() - time2)
print(time4 - time3)
print(len(result.leaves))
print([len(leaf.result) for leaf in result.leaves])
result.pprint()
97 changes: 34 additions & 63 deletions polytope/datacube/backends/datacube.py
Original file line number Diff line number Diff line change
@@ -1,30 +1,23 @@
import argparse
import logging
from abc import ABC, abstractmethod
from typing import Any, List, Literal, Optional, Union

import xarray as xr
from conflator import ConfigModel, Conflator
from pydantic import ConfigDict
from typing import Any

from ...utility.combinatorics import validate_axes
from ..datacube_axis import DatacubeAxis
from ..index_tree import DatacubePath, IndexTree
from ..tensor_index_tree import DatacubePath, TensorIndexTree
from ..transformations.datacube_mappers.datacube_mappers import DatacubeMapper
from ..transformations.datacube_transformations import (
DatacubeAxisTransformation,
has_transform,
)


class Datacube(ABC):
def __init__(self, axis_options=None, datacube_options=None):
def __init__(self, axis_options=None, compressed_axes_options=[]):
if axis_options is None:
self.axis_options = {}
else:
self.axis_options = axis_options
if datacube_options is None:
datacube_options = {}
self.axis_with_identical_structure_after = datacube_options.get("identical structure after")
self.coupled_axes = []
self.axis_counter = 0
self.complete_axes = []
Expand All @@ -34,10 +27,13 @@ def __init__(self, axis_options=None, datacube_options=None):
self.nearest_search = {}
self._axes = None
self.transformed_axes = []
self.compressed_grid_axes = []
self.merged_axes = []
self.unwanted_path = {}
self.compressed_axes = compressed_axes_options

@abstractmethod
def get(self, requests: IndexTree) -> Any:
def get(self, requests: TensorIndexTree) -> Any:
"""Return data given a set of request trees"""

@property
Expand All @@ -57,10 +53,27 @@ def _create_axes(self, name, values, transformation_type_key, transformation_opt
transformation = DatacubeAxisTransformation.create_transform(
name, transformation_type_key.name, transformation_options
)

# do not compress merged axes
if transformation_type_key.name == "merge":
self.merged_axes.append(name)
self.merged_axes.append(final_axis_names)
for axis in final_axis_names:
# remove the merged_axes from the possible compressed axes
if axis in self.compressed_axes:
self.compressed_axes.remove(axis)

for blocked_axis in transformation.blocked_axes():
self.blocked_axes.append(blocked_axis)
if isinstance(transformation, DatacubeMapper):
# TODO: do we use this?? This shouldn't work for a disk in lat/lon on a octahedral or other grid??
for compressed_grid_axis in transformation.compressed_grid_axes:
self.compressed_grid_axes.append(compressed_grid_axis)
if len(final_axis_names) > 1:
self.coupled_axes.append(final_axis_names)
for axis in final_axis_names:
if axis in self.compressed_axes:
self.compressed_axes.remove(axis)
for axis_name in final_axis_names:
self.fake_axes.append(axis_name)
# if axis does not yet exist, create it
Expand Down Expand Up @@ -132,57 +145,15 @@ def remap_path(self, path: DatacubePath):
return path

@staticmethod
def create_axes_config(axis_options):
class TransformationConfig(ConfigModel):
model_config = ConfigDict(extra="forbid")
name: str = ""

class CyclicConfig(TransformationConfig):
name: Literal["cyclic"]
range: List[float] = [0]

class MapperConfig(TransformationConfig):
name: Literal["mapper"]
type: str = ""
resolution: Union[int, List[int]] = 0
axes: List[str] = [""]
local: Optional[List[float]] = None

class ReverseConfig(TransformationConfig):
name: Literal["reverse"]
is_reverse: bool = False

class TypeChangeConfig(TransformationConfig):
name: Literal["type_change"]
type: str = "int"

class MergeConfig(TransformationConfig):
name: Literal["merge"]
other_axis: str = ""
linkers: List[str] = [""]

action_subclasses_union = Union[CyclicConfig, MapperConfig, ReverseConfig, TypeChangeConfig, MergeConfig]

class AxisConfig(ConfigModel):
axis_name: str = ""
transformations: list[action_subclasses_union]

class Config(ConfigModel):
config: list[AxisConfig] = []

parser = argparse.ArgumentParser(allow_abbrev=False)
axis_config = Conflator(app_name="polytope", model=Config, cli=False, argparser=parser).load()
if axis_options.get("config"):
axis_config = Config(config=axis_options.get("config"))

return axis_config

@staticmethod
def create(datacube, axis_options: dict, datacube_options={}):
if isinstance(datacube, (xr.core.dataarray.DataArray, xr.core.dataset.Dataset)):
def create(request, datacube, config={}, axis_options={}, compressed_axes_options=[]):
# TODO: get the configs as None for pre-determined value and change them to empty dictionary inside the function
if type(datacube).__name__ == "DataArray":
from .xarray import XArrayDatacube

xadatacube = XArrayDatacube(datacube, axis_options, datacube_options)
xadatacube = XArrayDatacube(datacube, axis_options, compressed_axes_options)
return xadatacube
else:
return datacube
if type(datacube).__name__ == "GribJump":
from .fdb import FDBDatacube

fdbdatacube = FDBDatacube(datacube, request, config, axis_options, compressed_axes_options)
return fdbdatacube
113 changes: 80 additions & 33 deletions polytope/datacube/backends/fdb.py
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
import logging
from copy import deepcopy

import pygribjump as pygj
from itertools import product

from ...utility.geometry import nearest_pt
from .datacube import Datacube, IndexTree
from .datacube import Datacube, TensorIndexTree


class FDBDatacube(Datacube):
def __init__(self, request, config=None, axis_options=None, datacube_options=None):
def __init__(self, gj, request, config=None, axis_options=None, compressed_axes_options=[]):
if config is None:
config = {}

super().__init__(axis_options, datacube_options)
super().__init__(axis_options, compressed_axes_options)

logging.info("Created an FDB datacube with options: " + str(axis_options))

self.unwanted_path = {}
self.axis_options = Datacube.create_axes_config(axis_options).config
self.axis_options = axis_options

partial_request = config
# Find values in the level 3 FDB datacube
self.gj = pygj.GribJump()

self.gj = gj
self.fdb_coordinates = self.gj.axes(partial_request)

self.check_branching_axes(request)
Expand Down Expand Up @@ -61,20 +61,45 @@ def check_branching_axes(self, request):
(upper, lower, idx) = polytope.extents(ax)
if "sfc" in polytope.points[idx]:
self.fdb_coordinates.pop("levelist")
print(self.fdb_coordinates)
self.fdb_coordinates.pop("quantile", None)
print(self.fdb_coordinates)

def get(self, requests: IndexTree):
def get(self, requests: TensorIndexTree):
if len(requests.children) == 0:
return requests
fdb_requests = []
fdb_requests_decoding_info = []
self.get_fdb_requests(requests, fdb_requests, fdb_requests_decoding_info)
output_values = self.gj.extract(fdb_requests)
self.assign_fdb_output_to_nodes(output_values, fdb_requests_decoding_info)

def get_fdb_requests(self, requests: IndexTree, fdb_requests=[], fdb_requests_decoding_info=[], leaf_path=None):
# TODO: note that this doesn't exactly work as intended, it's just going to retrieve value from gribjump that
# corresponds to first value in the compressed tuples

# TODO: here, loop through the fdb requests and request from gj and directly add to the nodes

for j, compressed_request in enumerate(fdb_requests):
uncompressed_request = {}

# Need to determine the possible decompressed requests

# find the possible combinations of compressed indices

interm_branch_tuple_values = []
for key in compressed_request[0].keys():
# remove the tuple of the request when we ask the fdb
interm_branch_tuple_values.append(compressed_request[0][key])
request_combis = product(*interm_branch_tuple_values)

# Need to extract the possible requests and add them to the right nodes
for combi in request_combis:
uncompressed_request = {}
for i, key in enumerate(compressed_request[0].keys()):
uncompressed_request[key] = combi[i]
complete_uncompressed_request = (uncompressed_request, compressed_request[1])
output_values = self.gj.extract([complete_uncompressed_request])
self.assign_fdb_output_to_nodes(output_values, [fdb_requests_decoding_info[j]])

def get_fdb_requests(
self, requests: TensorIndexTree, fdb_requests=[], fdb_requests_decoding_info=[], leaf_path=None
):
if leaf_path is None:
leaf_path = {}

Expand All @@ -86,7 +111,7 @@ def get_fdb_requests(self, requests: IndexTree, fdb_requests=[], fdb_requests_de
self.get_fdb_requests(c, fdb_requests, fdb_requests_decoding_info)
# If request node has no children, we have a leaf so need to assign fdb values to it
else:
key_value_path = {requests.axis.name: requests.value}
key_value_path = {requests.axis.name: requests.values}
ax = requests.axis
(key_value_path, leaf_path, self.unwanted_path) = ax.unmap_path_key(
key_value_path, leaf_path, self.unwanted_path
Expand Down Expand Up @@ -136,7 +161,7 @@ def get_2nd_last_values(self, requests, leaf_path=None):
found_latlon_pts = []
for lat_child in requests.children:
for lon_child in lat_child.children:
found_latlon_pts.append([lat_child.value, lon_child.value])
found_latlon_pts.append([lat_child.values, lon_child.values])

# now find the nearest lat lon to the points requested
nearest_latlons = []
Expand All @@ -145,20 +170,21 @@ def get_2nd_last_values(self, requests, leaf_path=None):
nearest_latlons.append(nearest_latlon)

# need to remove the branches that do not fit
lat_children_values = [child.value for child in requests.children]
lat_children_values = [child.values for child in requests.children]
for i in range(len(lat_children_values)):
lat_child_val = lat_children_values[i]
lat_child = [child for child in requests.children if child.value == lat_child_val][0]
if lat_child.value not in [latlon[0] for latlon in nearest_latlons]:
lat_child = [child for child in requests.children if child.values == lat_child_val][0]
if lat_child.values not in [(latlon[0],) for latlon in nearest_latlons]:
lat_child.remove_branch()
else:
possible_lons = [latlon[1] for latlon in nearest_latlons if latlon[0] == lat_child.value]
lon_children_values = [child.value for child in lat_child.children]
possible_lons = [latlon[1] for latlon in nearest_latlons if (latlon[0],) == lat_child.values]
lon_children_values = [child.values for child in lat_child.children]
for j in range(len(lon_children_values)):
lon_child_val = lon_children_values[j]
lon_child = [child for child in lat_child.children if child.value == lon_child_val][0]
if lon_child.value not in possible_lons:
lon_child.remove_branch()
lon_child = [child for child in lat_child.children if child.values == lon_child_val][0]
for value in lon_child.values:
if value not in possible_lons:
lon_child.remove_compressed_branch(value)

lat_length = len(requests.children)
range_lengths = [False] * lat_length
Expand All @@ -167,13 +193,13 @@ def get_2nd_last_values(self, requests, leaf_path=None):
for i in range(len(requests.children)):
lat_child = requests.children[i]
lon_length = len(lat_child.children)
range_lengths[i] = [1] * lon_length
range_lengths[i] = [0] * lon_length
current_start_idxs[i] = [None] * lon_length
fdb_node_ranges[i] = [[IndexTree.root] * lon_length] * lon_length
fdb_node_ranges[i] = [[TensorIndexTree.root for y in range(lon_length)] for x in range(lon_length)]
range_length = deepcopy(range_lengths[i])
current_start_idx = deepcopy(current_start_idxs[i])
fdb_range_nodes = deepcopy(fdb_node_ranges[i])
key_value_path = {lat_child.axis.name: lat_child.value}
key_value_path = {lat_child.axis.name: lat_child.values}
ax = lat_child.axis
(key_value_path, leaf_path, self.unwanted_path) = ax.unmap_path_key(
key_value_path, leaf_path, self.unwanted_path
Expand All @@ -184,30 +210,31 @@ def get_2nd_last_values(self, requests, leaf_path=None):
)

leaf_path_copy = deepcopy(leaf_path)
leaf_path_copy.pop("values")
# leaf_path_copy.update(self.necessary_popped_axes)
leaf_path_copy.pop("values", None)
return (leaf_path_copy, range_lengths, current_start_idxs, fdb_node_ranges, lat_length)

def get_last_layer_before_leaf(self, requests, leaf_path, range_l, current_idx, fdb_range_n):
i = 0
for c in requests.children:
fdb_range_n_i = fdb_range_n[i]
# now c are the leaves of the initial tree
key_value_path = {c.axis.name: c.value}
key_value_path = {c.axis.name: c.values}
ax = c.axis
(key_value_path, leaf_path, self.unwanted_path) = ax.unmap_path_key(
key_value_path, leaf_path, self.unwanted_path
)
leaf_path.update(key_value_path)
last_idx = key_value_path["values"]
if current_idx[i] is None:
range_l[i] = 1
current_idx[i] = last_idx
fdb_range_n[i][range_l[i] - 1] = c
fdb_range_n_i[range_l[i] - 1] = c
else:
if last_idx == current_idx[i] + range_l[i]:
range_l[i] += 1
fdb_range_n[i][range_l[i] - 1] = c
fdb_range_n_i[range_l[i] - 1] = c
else:
key_value_path = {c.axis.name: c.value}
key_value_path = {c.axis.name: c.values}
ax = c.axis
(key_value_path, leaf_path, self.unwanted_path) = ax.unmap_path_key(
key_value_path, leaf_path, self.unwanted_path
Expand All @@ -216,6 +243,8 @@ def get_last_layer_before_leaf(self, requests, leaf_path, range_l, current_idx,
i += 1
current_start_idx = key_value_path["values"]
current_idx[i] = current_start_idx
range_l[i] = 1
fdb_range_n[i][range_l[i] - 1] = c
return (range_l, current_idx, fdb_range_n)

def assign_fdb_output_to_nodes(self, output_values, fdb_requests_decoding_info):
Expand All @@ -240,7 +269,7 @@ def assign_fdb_output_to_nodes(self, output_values, fdb_requests_decoding_info):
for i in range(len(sorted_fdb_range_nodes)):
for j in range(sorted_range_lengths[i]):
n = sorted_fdb_range_nodes[i][j]
n.result = request_output_values[0][i][0][j]
n.result.append(request_output_values[0][i][0][j])

def sort_fdb_request_ranges(self, range_lengths, current_start_idx, lat_length):
interm_request_ranges = []
Expand All @@ -263,3 +292,21 @@ def select(self, path, unmapped_path):

def ax_vals(self, name):
return self.fdb_coordinates.get(name, None)

def prep_tree_encoding(self, node, unwanted_path=None):
# TODO: prepare the tree for protobuf encoding
# ie transform all axes for gribjump and adding the index property on the leaves
if unwanted_path is None:
unwanted_path = {}

ax = node.axis
(new_node, unwanted_path) = ax.unmap_tree_node(node, unwanted_path)

if len(node.children) != 0:
for c in new_node.children:
self.prep_tree_encoding(c, unwanted_path)

def prep_tree_decoding(self, tree):
# TODO: transform the tree after decoding from protobuf
# ie unstransform all axes from gribjump and put the indexes back as a leaf/extra node
pass
Loading
Loading