SDAP-498 - Satellite Units #296

Draft: wants to merge 98 commits into base branch develop

Commits (98):
6b1b7cc  Separated NTS backends (Jun 29, 2023)
4f3f611  n/a (Jul 5, 2023)
e32d5ad  More nts backend stuff (Jul 6, 2023)
ccc0de4  Working(?) np backend (Jul 10, 2023)
743fb1d  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Jul 10, 2023)
b77aa11  Working(?) np backend (Jul 10, 2023)
4ccec2e  gitignore ini (Jul 10, 2023)
736a44e  ASF headers (Jul 10, 2023)
70bdab1  First functioning test of 2 simultaneous backends (Jul 11, 2023)
f3981cd  Removed accidentally committed ini files (Jul 12, 2023)
26f6220  Working zarr backend ds list (Jul 12, 2023)
91de6ef  Capture and handle NTS requests routed to backend that doesn't (yet) … (Jul 12, 2023)
df23919  analysis setup fails to find VERSION.txt when building locally (Jul 12, 2023)
07404f0  Implemented more NTS functions in zarr backend (Jul 12, 2023)
72888aa  Added misc backend time metrics record field in NCSH (Jul 12, 2023)
1c4a0e4  fixes (Jul 13, 2023)
0a7cd7f  Dynamic dataset management (Jul 17, 2023)
c8e7dbb  Dynamic dataset management (Jul 18, 2023)
e78f7ad  Dataset management (Jul 20, 2023)
a84d77e  Timeseriesspark support (Jul 27, 2023)
53190e2  Update backend dict on dataset mgmt query (Jul 31, 2023)
2e7a0dc  Fixes and improvements (Jul 31, 2023)
0869375  Adapted matchup to work with zarr backends (Jul 31, 2023)
c156826  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Jul 31, 2023)
1eb680b  Zarr support (Aug 1, 2023)
0aef0f1  DDAS adjustments (Aug 2, 2023)
42b912e  find_tile_by_polygon_and_most_recent_day_of_year impl (Aug 3, 2023)
1559fba  Don't sel by time if neither max nor min time are given (Aug 8, 2023)
2bb52af  Fix not calling partial when needed (Aug 15, 2023)
f9dc2ae  Pinned s3fs and fsspec versions (Aug 18, 2023)
a6f602d  Fixed some dependencies to ensure image builds properly + s3fs works (Aug 18, 2023)
1a451eb  Config override for backends (Aug 21, 2023)
6f8f7b1  Deps update (Aug 21, 2023)
5baf9ec  Merge remote-tracking branch 'RKuttruff/master' into SDAP-472-gridded… (Aug 21, 2023)
8cc9d5d  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Aug 22, 2023)
492be4b  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Aug 23, 2023)
483ad9f  Add metadata from Zarr collection to /list (Aug 31, 2023)
4b24ec3  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Sep 6, 2023)
6077ac2  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Sep 7, 2023)
0d3c0fc  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Sep 14, 2023)
8d51337  Merge branch 'apache:master' into SDAP-472-gridded-zarr (RKuttruff, Sep 14, 2023)
f5750c3  Zarr: Probe lat order and flip if necessary (Sep 14, 2023)
7fc260a  Strip quotes from variable names (Sep 20, 2023)
b5df944  removed resultSizeLimit param from matchup (skorper, Sep 25, 2023)
5e0fbb2  Add # of primaries/avergae secondaries to job output (skorper, Sep 25, 2023)
fbad6b7  rename to executionId (skorper, Sep 25, 2023)
e0a5999  update changelog (skorper, Sep 25, 2023)
8942afc  add totalSecondaryMatched field to /job output (skorper, Sep 29, 2023)
dd73036  num unique secondaries addition (skorper, Sep 29, 2023)
db68d4f  updated docs to use correct sea_water_temperature param name (skorper, Oct 13, 2023)
7e11a4c  Merge remote-tracking branch 'origin' into SDAP-493 (skorper, Nov 1, 2023)
a8be9b8  bugfix (skorper, Nov 1, 2023)
62de867  fix division by zero bug (skorper, Nov 6, 2023)
972f3dd  add params to dataset management handler classes (Nov 8, 2023)
9f0a107  Merge remote-tracking branch 'origin/master' into SDAP-472-gridded-zarr (Nov 8, 2023)
831ca37  add page number to default filename for matchup output (skorper, Nov 16, 2023)
4ab2f9b  pagination improvements (skorper, Nov 16, 2023)
3677c11  removed debugging line (skorper, Nov 16, 2023)
86f1348  changelog (skorper, Nov 16, 2023)
1e8cc4e  Update helm cassandra dependency (#289) (RKuttruff, Nov 27, 2023)
faed801  Register dataset docs with nexusproto backend + static getters (Dec 14, 2023)
20902eb  Matchup impl (Dec 14, 2023)
1af0c41  Add vars to headers in CDMS subsetter (Dec 18, 2023)
8a069db  Add units to all matchup result formats (Dec 21, 2023)
0c39b07  Formatting for units in subsetter headers (Dec 21, 2023)
32ca3d7  stac catalog (skorper, Jan 5, 2024)
3563ae9  Updated openapi spec (skorper, Jan 6, 2024)
0691d87  move stac endpoints to matchup tag in openapi spec (skorper, Jan 6, 2024)
61e6223  Meta field in matchup result - all formats (Jan 8, 2024)
e02fc78  SDAP-507 - Changes to remove geos sub-dependency (Jan 11, 2024)
51231ca  SDAP-507 - Changelog (Jan 11, 2024)
5c75573  SDAP-507 - Changes to remove geos sub-dependency (Jan 11, 2024)
7f717c0  SDAP-507 - Changelog (Jan 11, 2024)
2b6efa6  Merge remote-tracking branch 'origin/SDAP-507' into SDAP-507 (Jan 11, 2024)
9779f40  delete instead of comment out (Jan 19, 2024)
3e700b8  Merge branch 'SDAP-500' into SDAP-499 (skorper, Jan 19, 2024)
9378760  Revert "Update helm cassandra dependency (#289)" (skorper, Jan 19, 2024)
092c87b  Merge branch 'SDAP-499' into SDAP-506 (skorper, Jan 19, 2024)
e6730eb  Merge branch 'release/1.2.0' into SDAP-506 (skorper, Jan 19, 2024)
3aafb7e  Merge remote-tracking branch 'origin/SDAP-506' into SDAP-472-gridded-… (Jan 19, 2024)
8d9bf73  Merge branch 'SDAP-472-gridded-zarr' into SDAP-498-satellite-units (Jan 19, 2024)
5303146  deleted disabled endpoint files (Jan 19, 2024)
d6a75e3  Merge branch 'SDAP-507' into SDAP-472-gridded-zarr (Jan 19, 2024)
f8983a0  Merge branch 'SDAP-507' into SDAP-498-satellite-units (Jan 19, 2024)
2ed29fd  Merge branch 'develop' into SDAP-493 (skorper, Jan 19, 2024)
2a340dc  Merge branch 'SDAP-493' into SDAP-500 (skorper, Jan 19, 2024)
718067c  Merge branch 'SDAP-500' into SDAP-499 (skorper, Jan 19, 2024)
681ba5a  Merge branch 'SDAP-499' into SDAP-506 (skorper, Jan 19, 2024)
935000b  Merge branch 'release/1.2.0' into SDAP-472-gridded-zarr (Jan 19, 2024)
ee5e5c8  fix bug where still-running jobs failed /job endpoint due to missing … (skorper, Jan 25, 2024)
0f388a3  Merge branch 'release/1.2.0' into SDAP-506 (skorper, Jan 25, 2024)
6bd5f0e  Merge remote-tracking branch 'origin/SDAP-506' into SDAP-472-gridded-… (Jan 29, 2024)
2c7a803  Merge branch 'SDAP-472-gridded-zarr' into SDAP-498-satellite-units (Jan 29, 2024)
40a80e2  Don't write an empty row between meta blocks in CSV writer (Jan 29, 2024)
322fd68  Merge remote-tracking branch 'origin/develop' into SDAP-498-satellite… (Feb 1, 2024)
34b7b95  Moved changelog entries (Feb 1, 2024)
6de825b  SDAP-472 changelog entries (Feb 1, 2024)
2aaf07b  SDAP-498 changelog entries (Feb 1, 2024)
Files changed:
3 changes: 2 additions & 1 deletion .gitignore
@@ -4,5 +4,6 @@
 *.idea
 *.DS_Store
 analysis/webservice/algorithms/doms/domsconfig.ini
-data-access/nexustiles/config/datastores.ini
+data-access/nexustiles/backends/nexusproto/config/datastores.ini
+data-access/nexustiles/config/datasets.ini
 venv/
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -8,12 +8,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - SDAP-506:
   - Added STAC Catalog endpoint for matchup outputs
+- SDAP-472:
+  - Support for Zarr backend (gridded data only)
+  - Dataset management endpoints for Zarr datasets
+- SDAP-498: Support for satellite units & other dataset-level metadata
 ### Changed
 - SDAP-493:
   - Updated /job endpoint to use `executionId` terminology for consistency with existing `/cdmsresults` endpoint
   - Updated /job endpoint with details about number of primary and secondary tiles.
 - SDAP-500: Improvements to SDAP Asynchronous Jobs
 - SDAP-499: Added page number to default filename for matchup output
+- SDAP-472: Overhauled `data-access` to support multiple backends for simultaneous support of multiple ARD formats
 ### Deprecated
 ### Removed
 - SDAP-493:
5 changes: 3 additions & 2 deletions analysis/conda-requirements.txt
@@ -22,7 +22,8 @@ pytz==2021.1
 utm==0.6.0
 shapely==1.7.1
 backports.functools_lru_cache==1.6.1
-boto3==1.16.63
+boto3>=1.16.63
+botocore==1.24.21
 pillow==8.1.0
 mpld3=0.5.1
 tornado==6.1
@@ -33,4 +34,4 @@ gdal==3.2.1
 mock==4.0.3
 importlib_metadata==4.11.4
 #singledispatch==3.4.0.3
-
+schema
7 changes: 5 additions & 2 deletions analysis/setup.py
@@ -17,8 +17,11 @@
 import setuptools
 from subprocess import check_call, CalledProcessError
 
-with open('../VERSION.txt', 'r') as f:
-    __version__ = f.read()
+try:
+    with open('../VERSION.txt', 'r') as f:
+        __version__ = f.read()
+except:
+    __version__ = None
 
 
 try:
3 changes: 2 additions & 1 deletion analysis/webservice/algorithms/DailyDifferenceAverage.py
@@ -21,7 +21,8 @@
 
 import numpy as np
 import pytz
-from nexustiles.nexustiles import NexusTileService, NexusTileServiceException
+from nexustiles.nexustiles import NexusTileService
+from nexustiles.exception import NexusTileServiceException
 from shapely.geometry import box
 
 from webservice.NexusHandler import nexus_handler
2 changes: 1 addition & 1 deletion analysis/webservice/algorithms/StandardDeviationSearch.py
@@ -19,7 +19,7 @@
 from datetime import datetime
 from functools import partial
 
-from nexustiles.nexustiles import NexusTileServiceException
+from nexustiles.exception import NexusTileServiceException
 from pytz import timezone
 
 from webservice.NexusHandler import nexus_handler
70 changes: 61 additions & 9 deletions analysis/webservice/algorithms/doms/BaseDomsHandler.py
@@ -78,14 +78,15 @@ def default(self, obj):
 
 class DomsQueryResults(NexusResults):
     def __init__(self, results=None, args=None, bounds=None, count=None, details=None, computeOptions=None,
-                 executionId=None, status_code=200, page_num=None, page_size=None):
-        NexusResults.__init__(self, results=results, meta=None, stats=None, computeOptions=computeOptions,
+                 executionId=None, status_code=200, page_num=None, page_size=None, meta=None):
+        NexusResults.__init__(self, results=results, meta=meta, stats=None, computeOptions=computeOptions,
                               status_code=status_code)
         self.__args = args
         self.__bounds = bounds
         self.__count = count
         self.__details = details
         self.__executionId = str(executionId)
+        self.__meta = meta if meta is not None else {}
 
         if self.__details is None:
             self.__details = {}
@@ -98,26 +99,27 @@ def toJson(self):
         bounds = self.__bounds.toMap() if self.__bounds is not None else {}
         return json.dumps(
             {"executionId": self.__executionId, "data": self.results(), "params": self.__args, "bounds": bounds,
-             "count": self.__count, "details": self.__details}, indent=4, cls=DomsEncoder)
+             "count": self.__count, "details": self.__details, "metadata": self.__meta}, indent=4, cls=DomsEncoder)
 
     def toCSV(self):
-        return DomsCSVFormatter.create(self.__executionId, self.results(), self.__args, self.__details)
+        return DomsCSVFormatter.create(self.__executionId, self.results(), self.__args, self.__details, self.__meta)
 
     def toNetCDF(self):
-        return DomsNetCDFFormatter.create(self.__executionId, self.results(), self.__args, self.__details)
+        return DomsNetCDFFormatter.create(self.__executionId, self.results(), self.__args, self.__details, self.__meta)
 
     def filename(self):
         return f'CDMS_{self.__executionId}_page{self.__details["pageNum"]}'
 
 
 class DomsCSVFormatter:
     @staticmethod
-    def create(executionId, results, params, details):
+    def create(executionId, results, params, details, metadata):
 
         csv_mem_file = io.StringIO()
         try:
             DomsCSVFormatter.__addConstants(csv_mem_file)
             DomsCSVFormatter.__addDynamicAttrs(csv_mem_file, executionId, results, params, details)
+            DomsCSVFormatter.__addMetadata(csv_mem_file, metadata)
             csv.writer(csv_mem_file).writerow([])
 
             DomsCSVFormatter.__packValues(csv_mem_file, results)
@@ -135,7 +137,11 @@ def is_empty(s):
 
             name = variable['cf_variable_name']
 
-            return name if not is_empty(name) else variable['variable_name']
+            header_name = name if not is_empty(name) else variable['variable_name']
+
+            unit = variable.get('variable_unit', None)
+
+            return f'{header_name} ({unit})' if unit is not None else header_name
 
     @staticmethod
     def __packValues(csv_mem_file, results):
@@ -288,10 +294,31 @@ def __addDynamicAttrs(csvfile, executionId, results, params, details):
 
         writer.writerows(global_attrs)
 
+    @staticmethod
+    def __addMetadata(csvfile, meta):
+        def meta_dict_to_list(meta_dict: dict, prefix='metadata') -> list:
+            attrs = []
+
+            for key in meta_dict:
+                new_key = key if prefix == '' else f'{prefix}.{key}'
+                value = meta_dict[key]
+
+                if isinstance(value, dict):
+                    attrs.extend(meta_dict_to_list(value, new_key))
+                else:
+                    attrs.append(dict(MetadataAttribute=new_key, Value=value))
+
+            return attrs
+
+        metadata_attrs = meta_dict_to_list(meta)
+
+        writer = csv.DictWriter(csvfile, sorted(next(iter(metadata_attrs)).keys()))
+        writer.writerows(metadata_attrs)
+
 
 class DomsNetCDFFormatter:
     @staticmethod
-    def create(executionId, results, params, details):
+    def create(executionId, results, params, details, metadata):
 
         t = tempfile.mkstemp(prefix="cdms_", suffix=".nc")
         tempFileName = t[1]
@@ -335,6 +362,30 @@ def create(executionId, results, params, details):
         dataset.CDMS_page_num = details["pageNum"]
         dataset.CDMS_page_size = details["pageSize"]
 
+        ####TEST
+
+        def meta_dict_to_list(meta_dict: dict, prefix='metadata') -> list:
+            attrs = []
+
+            for key in meta_dict:
+                new_key = key if prefix == '' else f'{prefix}.{key}'
+                value = meta_dict[key]
+
+                if value is None:
+                    value = 'NULL'
+                elif isinstance(value, list):
+                    value = json.dumps(value)
+
+                if isinstance(value, dict):
+                    attrs.extend(meta_dict_to_list(value, new_key))
+                else:
+                    attrs.append((new_key, value))
+
+            return attrs
+
+        for attr in meta_dict_to_list(metadata):
+            setattr(dataset, *attr)
+
         insituDatasets = params["matchup"]
         insituLinks = set()
         for insitu in insituDatasets:
@@ -534,7 +585,8 @@ def writeGroup(self):
             self.__enrichVariable(data_variable, min_data, max_data, has_depth=None, unit=units[variable])
             data_variable[:] = np.ma.masked_invalid(variables[variable])
             data_variable.long_name = name
-            data_variable.standard_name = cf_name
+            if cf_name:
+                data_variable.standard_name = cf_name
 
             #
             # Lists may include 'None" values, to calc min these must be filtered out
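For context, a minimal standalone sketch of the recursive flattening that __addMetadata performs above; the nested input dict and the dataset name 'EXAMPLE_SST' are hypothetical, not taken from a real collection:

# Sketch of the flattening used by the CSV writer's __addMetadata above.
# The input dict and dataset name are hypothetical.
def meta_dict_to_list(meta_dict: dict, prefix='metadata') -> list:
    attrs = []
    for key in meta_dict:
        new_key = key if prefix == '' else f'{prefix}.{key}'
        value = meta_dict[key]
        if isinstance(value, dict):
            attrs.extend(meta_dict_to_list(value, new_key))
        else:
            attrs.append(dict(MetadataAttribute=new_key, Value=value))
    return attrs

meta = {'datasets': {'primary': {'EXAMPLE_SST': {'variable_unit': 'kelvin'}}}}
print(meta_dict_to_list(meta))
# [{'MetadataAttribute': 'metadata.datasets.primary.EXAMPLE_SST.variable_unit',
#   'Value': 'kelvin'}]

The NetCDF variant of the same helper first maps None to the string 'NULL' and JSON-encodes list values, presumably because netCDF global attributes cannot hold None or nested structures, then writes each flattened (name, value) pair with setattr(dataset, *attr).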
24 changes: 23 additions & 1 deletion analysis/webservice/algorithms/doms/ResultsRetrieval.py
@@ -19,6 +19,11 @@
 from . import ResultsStorage
 from webservice.NexusHandler import nexus_handler
 from webservice.webmodel import NexusProcessingException
+from nexustiles.nexustiles import NexusTileService
+
+import logging
+
+log = logging.getLogger(__name__)
 
 
 @nexus_handler
@@ -48,5 +53,22 @@ def calc(self, computeOptions, **args):
         with ResultsStorage.ResultsRetrieval(self.config) as storage:
             params, stats, data = storage.retrieveResults(execution_id, trim_data=simple_results, page_num=page_num, page_size=page_size)
 
+        try:
+            ds_metadata = {}
+            ds_meta_primary_name = params['primary']
+
+            primary_metadata = NexusTileService.get_metadata_for_dataset(ds_meta_primary_name)
+
+            ds_metadata['primary'] = {ds_meta_primary_name: primary_metadata}
+
+            ds_metadata['secondary'] = {}
+
+            for secondary_ds_name in params['matchup'].split(','):
+                ds_metadata['secondary'][secondary_ds_name] = NexusTileService.get_metadata_for_dataset(secondary_ds_name)
+        except:
+            log.warning('Could not build dataset metadata dict due to an error')
+            ds_metadata = {}
+
         return BaseDomsHandler.DomsQueryResults(results=data, args=params, details=stats, bounds=None, count=len(data),
-                                                computeOptions=None, executionId=execution_id, page_num=page_num, page_size=page_size)
+                                                computeOptions=None, executionId=execution_id, page_num=page_num,
+                                                page_size=page_size, meta=dict(datasets=ds_metadata))
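For illustration, the meta payload built above takes roughly the following shape; the dataset names are hypothetical, and each inner dict is whatever NexusTileService.get_metadata_for_dataset returns for that dataset:

# Approximate shape of meta=dict(datasets=ds_metadata); names are hypothetical.
meta = {
    'datasets': {
        'primary': {
            'EXAMPLE_PRIMARY_SST': {}    # metadata for params['primary']
        },
        'secondary': {
            'EXAMPLE_INSITU_A': {},      # one entry per comma-separated
            'EXAMPLE_INSITU_B': {}       # name in params['matchup']
        }
    }
}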
34 changes: 31 additions & 3 deletions analysis/webservice/algorithms/doms/subsetter.py
@@ -24,6 +24,8 @@
 from webservice.algorithms.doms.insitu import query_insitu
 from webservice.webmodel import NexusProcessingException, NexusResults
 
+from nexustiles.nexustiles import NexusTileService
+
 from . import BaseDomsHandler
 
 ISO_8601 = '%Y-%m-%dT%H:%M:%S%z'
@@ -302,20 +304,46 @@ def toCsv(self):
         logging.info('Converting result to CSV')
 
         for dataset_name, results in dataset_results.items():
+            try:
+                ds_metadata = NexusTileService.get_metadata_for_dataset(dataset_name)
+            except:
+                ds_metadata = {}
+
+            ds_vars = ds_metadata.get('variables', [])
+
+            variable_dict = {}
+            variable_dict_cf = {}
+
+            for v in ds_vars:
+                variable_dict[v['name']] = v
+                variable_dict_cf[v['cf_standard_name']] = v
+
             rows = []
 
             headers = [
                 'longitude',
                 'latitude',
                 'time'
             ]
-            data_variables = list(set([keys for result in results for keys in result['data'].keys()]))
-            data_variables.sort()
+
+            data_variables = []
+            data_variable_headers = []
+
+            for dv in sorted(list(set([keys for result in results for keys in result['data'].keys()]))):
+                data_variables.append(dv)
+
+                if dv in variable_dict_cf and variable_dict_cf[dv]["unit"] is not None:
+                    data_variable_headers.append(f'{dv} ({variable_dict_cf[dv]["unit"]})')
+                elif dv in variable_dict and variable_dict[dv]["unit"] is not None:
+                    data_variable_headers.append(f'{dv} ({variable_dict[dv]["unit"]})')
+                else:
+                    data_variable_headers.append(dv)
 
             if 'id' in list(set([keys for result in results for keys in result.keys()])):
                 headers.append('id')
 
-            headers.extend(data_variables)
+            headers.extend(data_variable_headers)
 
             for i, result in enumerate(results):
                 cols = []
 
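As a quick check of the header logic above, a sketch with hypothetical dataset metadata: a variable whose metadata carries a unit gets the unit appended to its column header, while the others fall back to the bare name.

# Sketch of the unit-labeled header construction; metadata values hypothetical.
variable_dict_cf = {'sea_surface_temperature': {'unit': 'kelvin'}}
variable_dict = {'analysed_sst': {'unit': None}}

headers = ['longitude', 'latitude', 'time']
for dv in sorted(['sea_surface_temperature', 'analysed_sst']):
    if dv in variable_dict_cf and variable_dict_cf[dv]['unit'] is not None:
        headers.append(f"{dv} ({variable_dict_cf[dv]['unit']})")
    elif dv in variable_dict and variable_dict[dv]['unit'] is not None:
        headers.append(f"{dv} ({variable_dict[dv]['unit']})")
    else:
        headers.append(dv)

print(headers)
# ['longitude', 'latitude', 'time', 'analysed_sst', 'sea_surface_temperature (kelvin)']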
(file name not captured in page export)
@@ -324,7 +324,7 @@ def calculate_diff(tile_service_factory, tile_ids, bounding_wkt, dataset, climat
     for tile_id in tile_ids:
         # Get the dataset tile
         try:
-            dataset_tile = get_dataset_tile(tile_service, wkt.loads(bounding_wkt.value), tile_id)
+            dataset_tile = get_dataset_tile(tile_service, wkt.loads(bounding_wkt.value), tile_id, dataset.value)
         except NoDatasetTile:
             # This should only happen if all measurements in a tile become masked after applying the bounding polygon
             continue
@@ -348,12 +348,12 @@
     return chain(*diff_generators)
 
 
-def get_dataset_tile(tile_service, search_bounding_shape, tile_id):
+def get_dataset_tile(tile_service, search_bounding_shape, tile_id, dataset):
     the_time = datetime.now()
 
     try:
         # Load the dataset tile
-        dataset_tile = tile_service.find_tile_by_id(tile_id)[0]
+        dataset_tile = tile_service.find_tile_by_id(tile_id, ds=dataset)[0]
         # Mask it to the search domain
         dataset_tile = tile_service.mask_tiles_to_polygon(search_bounding_shape, [dataset_tile])[0]
     except IndexError:
8 changes: 4 additions & 4 deletions analysis/webservice/algorithms_spark/HofMoellerSpark.py
@@ -44,12 +44,12 @@ class HofMoellerCalculator(object):
     def hofmoeller_stats(tile_service_factory, metrics_callback, tile_in_spark):
 
         (latlon, tile_id, index,
-         min_lat, max_lat, min_lon, max_lon) = tile_in_spark
+         min_lat, max_lat, min_lon, max_lon, dataset) = tile_in_spark
 
         tile_service = tile_service_factory()
         try:
             # Load the dataset tile
-            tile = tile_service.find_tile_by_id(tile_id, metrics_callback=metrics_callback)[0]
+            tile = tile_service.find_tile_by_id(tile_id, metrics_callback=metrics_callback, ds=dataset)[0]
             calculation_start = datetime.now()
             # Mask it to the search domain
             tile = tile_service.mask_tiles_to_bbox(min_lat, max_lat,
@@ -352,7 +352,7 @@ def calc(self, compute_options, **args):
 
         min_lon, min_lat, max_lon, max_lat = bbox.bounds
 
-        nexus_tiles_spark = [(self._latlon, tile.tile_id, x, min_lat, max_lat, min_lon, max_lon) for x, tile in
+        nexus_tiles_spark = [(self._latlon, tile.tile_id, x, min_lat, max_lat, min_lon, max_lon, tile.dataset) for x, tile in
                              enumerate(self._get_tile_service().find_tiles_in_box(min_lat, max_lat, min_lon, max_lon,
                                                                                   ds, start_time, end_time,
                                                                                   metrics_callback=metrics_record.record_metrics,
@@ -408,7 +408,7 @@ def calc(self, compute_options, **args):
 
         min_lon, min_lat, max_lon, max_lat = bbox.bounds
 
-        nexus_tiles_spark = [(self._latlon, tile.tile_id, x, min_lat, max_lat, min_lon, max_lon) for x, tile in
+        nexus_tiles_spark = [(self._latlon, tile.tile_id, x, min_lat, max_lat, min_lon, max_lon, tile.dataset) for x, tile in
                              enumerate(self._get_tile_service().find_tiles_in_box(min_lat, max_lat, min_lon, max_lon,
                                                                                   ds, start_time, end_time,
                                                                                   metrics_callback=metrics_record.record_metrics,
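A small sketch of the widened Spark work item above: each tuple now carries the tile's dataset name so the executor-side find_tile_by_id call can be routed to the correct backend. All values below are hypothetical.

# Hypothetical work item matching the widened tuple in hofmoeller_stats.
tile_in_spark = ('latlon', 'example-tile-id', 0, -90.0, 90.0, -180.0, 180.0, 'EXAMPLE_DATASET')

(latlon, tile_id, index,
 min_lat, max_lat, min_lon, max_lon, dataset) = tile_in_spark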