Consider 12-hour offset for CMEMS data #878

Open
5 tasks
veenstrajelmer opened this issue Jul 4, 2024 · 0 comments

copernicusmarine returns start-of-interval time samples (e.g. start of hour, day, month, year), whereas the native datasets use a mix of start-of-interval and center-of-interval. We had midday timestamps when using opendap and now have midnight timestamps, but the actual data is the same. Should we correct for this? The data is always a daily mean, at least for the datasets returned by dfmt.copernicusmarine_get_dataset_id(), so we could add an offset of 12 hours.

def copernicusmarine_get_dataset_id(varkey, date_min, date_max):
    #TODO: maybe get dataset_id from 'copernicusmarine describe --include-datasets --contains <search_token>'
    product = copernicusmarine_get_product(date_min, date_max)
    if varkey in ['bottomT','tob','mlotst','siconc','sithick','so','thetao','uo','vo','usi','vsi','zos']: #for physchem
        # resolution is 1/12 degrees in lat/lon dimension, but a bit more/less in alternating cells
        if product == 'analysisforecast': #forecast: https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/description
            if varkey in ['uo','vo']: #anfc dataset is split over multiple urls
                dataset_id = 'cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m'
            elif varkey in ['so']:
                dataset_id = 'cmems_mod_glo_phy-so_anfc_0.083deg_P1D-m'
            elif varkey in ['thetao']:
                dataset_id = 'cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m'
            else:
                dataset_id = 'cmems_mod_glo_phy_anfc_0.083deg_P1D-m'
        else: #reanalysis: https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description
            dataset_id = 'cmems_mod_glo_phy_my_0.083deg_P1D-m'
    elif varkey in ['nppv','o2','talk','dissic','ph','spco2','no3','po4','si','fe','chl','phyc']: # for bio
        # resolution is 1/4 degrees
        if product == 'analysisforecast': #forecast: https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_BGC_001_028/description
            if varkey in ['nppv','o2']:
                dataset_id = 'cmems_mod_glo_bgc-bio_anfc_0.25deg_P1D-m'
            elif varkey in ['talk','dissic','ph']:
                dataset_id = 'cmems_mod_glo_bgc-car_anfc_0.25deg_P1D-m'
            elif varkey in ['spco2']:
                dataset_id = 'cmems_mod_glo_bgc-co2_anfc_0.25deg_P1D-m'
            elif varkey in ['no3','po4','si','fe']:
                dataset_id = 'cmems_mod_glo_bgc-nut_anfc_0.25deg_P1D-m'
            elif varkey in ['chl','phyc']:
                dataset_id = 'cmems_mod_glo_bgc-pft_anfc_0.25deg_P1D-m'
        else: #reanalysis: https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_BGC_001_029/description
            dataset_id = 'cmems_mod_glo_bgc_my_0.25_P1D-m'
    else:
        raise KeyError(f"unknown varkey for cmems: {varkey}")
    return dataset_id

For instance, by replacing the three occurrences of copernicusmarine.open_dataset() with the following function:

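import copernicusmarine
import pandas as pd

def copernicusmarine_open_dataset_12h_offset(dataset_id, **kwargs):
    # the offset is only meaningful for daily-mean (P1D) datasets
    assert 'P1D' in dataset_id
    ds = copernicusmarine.open_dataset(dataset_id=dataset_id, **kwargs)
    # move the start-of-day (midnight) timestamps to noon
    ds['time'] = ds['time'] + pd.Timedelta(hours=12)
    return ds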
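
The effect of such an offset can be checked without network access on a small synthetic dataset. The sketch below assumes a daily-mean dataset with midnight (start-of-interval) timestamps. The helper name add_12h_offset and the zos values are made up for illustration; they are not part of dfm_tools or copernicusmarine.

```python
import numpy as np
import pandas as pd
import xarray as xr

def add_12h_offset(ds):
    """Hypothetical helper: return a new dataset with time shifted by 12 hours."""
    return ds.assign_coords(time=ds["time"] + pd.Timedelta(hours=12))

# synthetic stand-in for a daily-mean dataset with midnight timestamps
times = pd.date_range("2021-06-29", periods=3, freq="D")
ds = xr.Dataset({"zos": ("time", np.array([0.1, 0.2, 0.3]))},
                coords={"time": times})

ds_noon = add_12h_offset(ds)

# the timestamps move to noon, the data values are untouched
print(ds_noon["time"].values)
print(ds_noon["zos"].values)
```

Using assign_coords() instead of in-place assignment leaves the original dataset unchanged, which makes it easy to compare both versions while debugging.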

Minimal code for debugging:

import dfm_tools as dfmt
dfmt.download_CMEMS(varkey='zos', 
                    longitude_min=-1, longitude_max=1,
                    latitude_min=52, latitude_max=53, 
                    date_min="2021-06-29", date_max="2021-06-30")
dfmt.download_CMEMS(varkey='zos', 
                    longitude_min=-1, longitude_max=1,
                    latitude_min=52, latitude_max=53, 
                    date_min="2021-07-01", date_max="2021-07-02")

Todo:

  • add a 12-hour offset upon opening the dataset, see the copernicusmarine_open_dataset_12h_offset code above
  • in dfmt.download_CMEMS(), subset in time after opening the dataset with the 12h-offset function, and check the performance impact of this change
  • re-introduce the timeshift function including its test from https://github.com/Deltares/dfm_tools/pull/392/files
  • be careful with the time bookkeeping: date strings like "2020-06-29" are parsed to datetimes at midnight of that day, whereas a plain xr.Dataset.sel() with such a string would include the entire day. Also check whether we can download the end of reanalysis and the start of reanalysis-interim smoothly.
  • check the writing of initial fields again: are there still two values around the model start time?
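
The difference between string bounds and parsed datetimes mentioned in the list above can be reproduced on synthetic data. This is a minimal sketch that assumes noon-stamped daily values, as they would look after a 12-hour offset; the dataset and its values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic daily means stamped at noon (i.e. after a 12-hour offset)
times = pd.date_range("2020-06-28 12:00", periods=4, freq="D")
ds = xr.Dataset({"zos": ("time", np.arange(4.0))}, coords={"time": times})

# string bounds: the end date covers the whole day, so noon of 2020-06-30 is included
sel_str = ds.sel(time=slice("2020-06-29", "2020-06-30"))

# Timestamp bounds: exact instants at midnight, so noon of 2020-06-30 is excluded
sel_ts = ds.sel(time=slice(pd.Timestamp("2020-06-29"), pd.Timestamp("2020-06-30")))

print(sel_str["time"].values)
print(sel_ts["time"].values)
```

With midnight timestamps both selections return the same days, so this difference only starts to matter once the offset is applied before subsetting.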

Some use cases:

  • downloading data and interpolating it to boundaries to serve as boundary conditions for models. In this case it could make sense to move the daily average to noon, since that timestamp is in the middle of the day and thus representative for the entire day.
  • using the data as validation data for a model. It would be best to compare against daily averages of the model as well. With xarray these would most probably also end up at midnight, so no timeshift is desired. When comparing to instantaneous model values, it is slightly more convenient to have the cmems data at midday, but it does not matter much: comparing a daily mean to an instantaneous value at midnight or noon is not accurate anyway.
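
For the validation use case, the sketch below shows where xarray puts the timestamps when computing daily means with resample(). It assumes hypothetical hourly model output; the values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical hourly model output covering two full days
times = pd.date_range("2021-06-29", periods=48, freq="h")
model = xr.Dataset({"zos": ("time", np.arange(48.0))}, coords={"time": times})

# daily means: each bin is labelled with its start, i.e. midnight
daily = model.resample(time="1D").mean()

print(daily["time"].values)
print(daily["zos"].values)
```

The resulting labels match the midnight timestamps of the unshifted CMEMS data, so the two can be compared directly without a timeshift.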

Alternative approach
Alternatively, request an argument for copernicusmarine.open_dataset() to get the averaged values at either the mid-time or the start-time of the averaging interval. That would completely resolve the complexity around this issue. Also request attributes: at the moment it is not clear from the dataset that the time is averaged rather than instantaneous. Check whether insitu timeseries are instantaneous and not averaged. A new argument and/or metadata was requested via [email protected] on 10-7-2024; the request is registered under ticket [MDSOP-179].

For BES project?
