Feature request: add support for noon-centered times to `copernicusmarine.open_dataset()` #271

veenstrajelmer · 2025-01-15T15:29:15Z

Motivation
Since the introduction of ARCO, all time-averaged datasets have start-of-interval instead of center-of-interval time samples, as documented in https://help.marine.copernicus.eu/en/articles/8656000-differences-between-netcdf-and-arco-formats. It makes complete sense to harmonize this across datasets, and I realize the ARCO structure was necessary for the impressive performance we see in the copernicusmarine toolbox these days.

However, in our institute we use the data as model forcing and we also compare it to measurements. What we see is that the new time administration introduces a clear offset: 12 hours in the case of daily-averaged data. This makes sense, but it is quite inconvenient. Of course, the times can be shifted manually in our workflows, but I would prefer to avoid this if possible.
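For reference, the manual shift mentioned above amounts to adding half the averaging interval to each start-of-interval time stamp. The sketch below assumes a fixed interval length (e.g. daily means); `to_center_of_interval` is a hypothetical helper, not part of the copernicusmarine toolbox.

```python
import pandas as pd


def to_center_of_interval(times, interval="1D"):
    """Hypothetical helper: shift start-of-interval stamps to interval centers.

    Assumes every sample covers the same fixed-length interval (e.g. "1D" for
    daily means); monthly means have a variable length and are not handled.
    """
    return pd.DatetimeIndex(times) + pd.Timedelta(interval) / 2


start = pd.date_range("1993-01-01", periods=3, freq="D")
print(to_center_of_interval(start))  # daily stamps moved from 00:00 to 12:00
```

On an xarray dataset returned by `copernicusmarine.open_dataset()`, the shifted index could then be written back with `ds.assign_coords(time=to_center_of_interval(ds.indexes["time"]))`.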

Furthermore, the PUM (Product User Manual) states that the daily-averaged products are centered at noon, not at midnight, so the current behaviour could be confusing for users.

I have been in touch with the helpdesk before; I think the original request was filed under MDSOP-179, if that helps.

Expected behavior
Support in copernicusmarine.open_dataset() for getting noon-centered times, for instance via an additional keyword.

Actual behavior
Midnight (start-of-interval) time stamps, so the reproducible code prints:
1993-01-01 00:00:00 2021-06-30 00:00:00

Steps to reproduce

import copernicusmarine
import pandas as pd
# import logging
# logging.getLogger("copernicusmarine").setLevel(logging.DEBUG)

dataset_id = 'cmems_mod_glo_phy_my_0.083deg_P1D-m'
ds = copernicusmarine.open_dataset(
    dataset_id=dataset_id,
    service="arco-geo-series",
    chunk_size_limit=None,
)

ds_tstart = pd.Timestamp(ds.time.isel(time=0).values)
ds_tstop = pd.Timestamp(ds.time.isel(time=-1).values)
print(ds_tstart, ds_tstop)
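To check programmatically which convention a returned time axis follows, one could look at where the stamps fall within the day. This is a hypothetical helper (not a toolbox function) and it only inspects the time of day; it cannot tell by itself whether the values are averages.

```python
import pandas as pd


def time_of_day_convention(times):
    """Hypothetical helper: classify daily stamps as "midnight", "noon" or "other"."""
    times = pd.DatetimeIndex(times)
    # Offset of each stamp from the preceding midnight.
    offsets = set(times - times.normalize())
    if offsets == {pd.Timedelta(0)}:
        return "midnight"  # start-of-interval for daily means
    if offsets == {pd.Timedelta(hours=12)}:
        return "noon"  # center-of-interval for daily means
    return "other"


daily = pd.date_range("1993-01-01", periods=3, freq="D")
print(time_of_day_convention(daily))
print(time_of_day_convention(daily + pd.Timedelta(hours=12)))
```

Applied to `ds.indexes["time"]` from the reproducer above, whose first and last times print as 00:00:00, this would presumably report "midnight".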

Alternatives
Alternatively, it would at least be helpful if averaging metadata were present in the returned dataset, for instance as attributes of the time variable, stating that the data is daily averaged and that the times are start-of-interval times. I can imagine this is not easy to implement across all averaged datasets, but it would make it easier to apply the correct time corrections on our side (Deltares/dfm_tools#878).
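One existing way to express such metadata is the CF conventions: a `bounds` attribute on the time coordinate pointing to a bounds variable (often named `time_bnds`), and `cell_methods = "time: mean"` on each averaged variable. Whether CMEMS datasets carry these attributes is not claimed here; the sketch below only shows how interval centers follow from start and end stamps once bounds are available, which also covers variable-length intervals such as monthly means.

```python
import pandas as pd

# Hypothetical start-of-interval stamps and matching interval ends, as they
# might be read from a CF-style "time_bnds" variable (assumption).
starts = pd.date_range("1993-01-01", periods=3, freq="D")
ends = starts + pd.Timedelta(days=1)

# Each sample represents the half-open interval [start, end).
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed="left")
centers = intervals.mid  # start + (end - start) / 2, i.e. noon for daily means

# Illustrative CF attributes that would make the convention explicit.
time_attrs = {"bounds": "time_bnds"}  # on the time coordinate
variable_attrs = {"cell_methods": "time: mean"}  # on each averaged variable

print(centers)
```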

Environment

Python 3.11.11
copernicusmarine 2.0.0


renaudjester · 2025-01-16T16:23:59Z

We will discuss it internally because, as you said:

Of course, it can be manually shifted in our workflows, but I would avoid this if possible.

That could also be the position of the toolbox. But we will get back to you!

veenstrajelmer · 2025-01-28T10:38:39Z

The easiest way to illustrate the offset I described in the issue description is with a sine wave with a period of several days, which is also the timescale on which variations in zos/temperature/salinity can take place. This is purely for illustration purposes; I realize this is not actual data, but it is much easier to produce:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.close("all")

# synthetic signal with a period of about 12 days, sampled every 10 minutes
x = pd.date_range("2020-01-01", "2020-02-01", freq="10min")
xrange = np.arange(len(x))
y = np.sin(5/len(x) * np.pi * xrange)
ser = pd.Series(y, index=x)
fig, ax = plt.subplots(figsize=(12,6))
ax.plot(ser.index, ser, label="original data")

# daily means stamped at the start of each day (as in the ARCO datasets)
daymean_start = ser.groupby(ser.index.to_period("D")).mean()
daymean_start.index = daymean_start.index.to_timestamp()
# the same daily means stamped at the center of each day
daymean_mid = daymean_start.copy()
daymean_mid.index = daymean_mid.index + pd.Timedelta(hours=12)
ax.plot(daymean_start.index, daymean_start, label='mean start-of-interval')
ax.plot(daymean_mid.index, daymean_mid, label='mean center-of-interval')
ax.legend()
plt.show()

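This gives a figure with three lines: the original data (blue), the start-of-interval daily means (orange) and the center-of-interval daily means (green).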

Use case:
The model data available via copernicusmarine subset/open_dataset (for instance the dataset_id cmems_mod_glo_phy_my_0.083deg_P1D-m) is averaged per day with start-of-interval timestamps (orange line). When comparing this to instantaneous model data from local hydrodynamic models, or to instantaneous observations (both represented by the blue line), there is a clear offset. If the data were stored with center-of-interval timestamps (green line), there would be no such offset.
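The offset can also be quantified by comparing each daily mean with the instantaneous signal at its own time stamp. This sketch reuses the same kind of synthetic sine signal (not real data), keeps only complete days, and computes the mean absolute difference for both stamping conventions; the center-of-interval stamps should give a much smaller difference.

```python
import numpy as np
import pandas as pd

# Synthetic signal (period of about 12 days), complete days only (end excluded).
x = pd.date_range("2020-01-01", "2020-02-01", freq="10min", inclusive="left")
y = np.sin(5 / len(x) * np.pi * np.arange(len(x)))
ser = pd.Series(y, index=x)

# Daily means, stamped at the start and at the center of each day.
daymean = ser.groupby(x.to_period("D")).mean()
start_times = daymean.index.to_timestamp()
center_times = start_times + pd.Timedelta(hours=12)

# Compare each daily mean with the instantaneous value at its time stamp.
err_start = np.abs(daymean.to_numpy() - ser.loc[start_times].to_numpy()).mean()
err_center = np.abs(daymean.to_numpy() - ser.loc[center_times].to_numpy()).mean()
print(f"mean abs. difference, start-of-interval:  {err_start:.3f}")
print(f"mean abs. difference, center-of-interval: {err_center:.3f}")
```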

veenstrajelmer · 2025-01-29T14:34:39Z

In the meantime, I was working on a workaround on our side and hoped to use the dataset_id string, since it contains parts like "P1D-m" for daily means. However, I realized this is not always consistent; see for instance the dataset_id med-cmcc-cur-rean-d here: https://data.marine.copernicus.eu/product/MEDSEA_MULTIYEAR_PHY_006_004/services
Would it be possible to harmonize the dataset_ids so that it is clear whether and how the data is averaged? Metadata or center-of-interval timestamps (as requested in this issue) would be far more useful, but in the meantime it would help to at least be able to derive this from the dataset_id strings.
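A workaround based on the dataset_id could look like the sketch below. The naming pattern (an ISO 8601 duration such as `P1D` or `PT1H` followed by `-m`) is an assumption inferred from ids like cmems_mod_glo_phy_my_0.083deg_P1D-m; `averaging_period_from_dataset_id` is a hypothetical helper. Ids that do not follow the pattern, such as med-cmcc-cur-rean-d, and monthly ids (variable length) return `None`.

```python
import re

import pandas as pd

# Assumed pattern: "_P<n>D-m" or "_PT<n>H-m", at the end or before another "_".
_MEAN_PATTERN = re.compile(r"_P(?:(\d+)D|T(\d+)H)-m(?:_|$)")


def averaging_period_from_dataset_id(dataset_id):
    """Hypothetical helper: return the averaging period, or None if unknown."""
    match = _MEAN_PATTERN.search(dataset_id)
    if match is None:
        return None  # e.g. "med-cmcc-cur-rean-d" does not follow the pattern
    days, hours = match.groups()
    if days is not None:
        return pd.Timedelta(days=int(days))
    return pd.Timedelta(hours=int(hours))


for dataset_id in ["cmems_mod_glo_phy_my_0.083deg_P1D-m", "med-cmcc-cur-rean-d"]:
    print(dataset_id, averaging_period_from_dataset_id(dataset_id))
```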

renaudjester · 2025-01-31T16:45:18Z

About this last comment, there is nothing we can do on the toolbox side 🤔

veenstrajelmer mentioned this issue on Jan 15, 2025 in Deltares/dfm_tools#878 "Consider 12-hour offset for CMEMS data" (closed).
