get dataset coordinates from describe parts instead of only at service variables #277

Open
veenstrajelmer opened this issue Jan 27, 2025 · 0 comments

veenstrajelmer commented Jan 27, 2025

As originally described in Deltares/dfm_tools#1082, it would be useful to easily get the coordinates from a part that is returned by describe, or maybe even one level up, from the dataset version. For instance, getting the time extents is currently a bit cumbersome and not very robust, since they are only available on the variables of a service:

import copernicusmarine
import pandas as pd
# importing a private xarray function here
from xarray.coding.times import decode_cf_datetime

data = copernicusmarine.describe(
    dataset_id='cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m',
    disable_progress_bar=True,
    )

def convert_time(time_raw, time_units):
    time_np = decode_cf_datetime(num_dates=[time_raw], units=time_units)
    time_pd = pd.Timestamp(time_np[0])
    return time_pd

# check if there is indeed only one of products/datasets/versions/parts
assert len(data.products) == 1
assert len(data.products[0].datasets) == 1
assert len(data.products[0].datasets[0].versions) == 1
assert len(data.products[0].datasets[0].versions[0].parts) == 1

# There are four services, but only geoseries and timeseries contain coordinates in their data variables.
# I expect that in the service "original-files" the time values
# would differ by 12 hours from those in geoseries/timeseries (NetCDF vs ARCO).
part = data.products[0].datasets[0].versions[0].parts[0]
service_arco_geo_series = part.get_service_by_service_name(service_name="arco-geo-series")

# Therefore, we get the coordinates from the first data variable in the service.
# This assumes that all data variables carry the same coordinates, which might not always hold.
# It would be useful to also attach the coordinates to the service itself,
# since I expect they are valid at that level. This would avoid some nesting.
var0 = service_arco_geo_series.variables[0]
var0_coordinates = var0.coordinates

# get time coordinate by searching for coordinate_id="time"
coordinate_ids = [x.coordinate_id for x in var0_coordinates]
timecoord_idx = coordinate_ids.index("time")
time_coord = var0_coordinates[timecoord_idx]

# the time extents are raw numbers w.r.t. a reference date
time_units = time_coord.coordinate_unit
time_min_raw = time_coord.minimum_value
time_max_raw = time_coord.maximum_value

# convert to pandas timestamps
time_min = convert_time(time_min_raw, time_units)
time_max = convert_time(time_max_raw, time_units)
print(f"Minimum Value: {time_min}")
print(f"Maximum Value: {time_max}")
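
Until something like this is available, the "first variable" assumption above could be replaced by a small helper that merges the coordinates of all variables in a service and fails loudly when they disagree. This is only a minimal sketch: `collect_coordinates` is a hypothetical name and not part of copernicusmarine, it relies only on the attributes already used above (`variables`, `coordinates`, `coordinate_id`, `minimum_value`, `maximum_value`), and the stand-in objects and their values are made up so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_coordinates(service):
    """Return {coordinate_id: coordinate} merged over all variables of a service.

    Hypothetical helper, not part of copernicusmarine. Raises ValueError if two
    variables report different extents for the same coordinate, instead of
    silently trusting the first variable.
    """
    merged = {}
    for variable in service.variables:
        for coord in variable.coordinates:
            # Keep the first coordinate seen for each id, compare later ones to it.
            known = merged.setdefault(coord.coordinate_id, coord)
            extent = (coord.minimum_value, coord.maximum_value)
            if (known.minimum_value, known.maximum_value) != extent:
                raise ValueError(
                    f"inconsistent extents for coordinate {coord.coordinate_id!r}"
                )
    return merged


# Stand-in objects that mimic only the attributes used above (made-up values).
time_coord = SimpleNamespace(
    coordinate_id="time", minimum_value=0, maximum_value=86_400_000
)
depth_coord = SimpleNamespace(
    coordinate_id="depth", minimum_value=0.5, maximum_value=5727.9
)
fake_service = SimpleNamespace(
    variables=[
        SimpleNamespace(coordinates=[time_coord]),
        SimpleNamespace(coordinates=[time_coord, depth_coord]),
    ]
)

coords = collect_coordinates(fake_service)
print(sorted(coords))
print(coords["time"].minimum_value, coords["time"].maximum_value)
```

With a real describe result, the same call would be `collect_coordinates(service_arco_geo_series)`, after which `coords["time"]` takes the place of the index lookup on `var0_coordinates`. Comparing extents with exact equality is a deliberate choice here: the values come from the same metadata, so any difference is worth surfacing rather than tolerating.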
veenstrajelmer changed the title from "add dataset coordinates also to describe part instead of only at service variables" to "get dataset coordinates from describe parts instead of only at service variables" on Jan 27, 2025