Improve CMEMS download performance #1033

Open · veenstrajelmer opened this issue Oct 23, 2024 · 0 comments

veenstrajelmer (Collaborator) commented Oct 23, 2024

Downloading long time series from CMEMS with dfm_tools is slow, even though the actual download happens at a daily frequency. This is probably because, by default, the entire requested dataset is opened first, and the daily subsets are then retrieved from it:

```python
dataset = copernicusmarine.open_dataset(
    dataset_id=dataset_id,
    variables=[varkey],
    minimum_longitude=longitude_min,
    maximum_longitude=longitude_max,
    minimum_latitude=latitude_min,
    maximum_latitude=latitude_max,
    start_datetime=date_min,
    end_datetime=date_max,
)
Path(dir_output).mkdir(parents=True, exist_ok=True)
if freq is None:
    # write the full requested period to a single file
    date_str = f"{date_min.strftime('%Y%m%d')}_{date_max.strftime('%Y%m%d')}"
    name_output = f'{file_prefix}{varkey}_{date_str}.nc'
    output_filename = Path(dir_output, name_output)
    if output_filename.is_file() and not overwrite:
        print(f'"{name_output}" found and overwrite=False, returning.')
        return
    print(f'xarray writing netcdf file: {name_output}')
    dataset.to_netcdf(output_filename)
else:
    # write one file per period (daily by default), subsetted from the single opened dataset
    period_range = pd.period_range(date_min, date_max, freq=freq)
    for date in period_range:
        date_str = str(date)
        name_output = f'{file_prefix}{varkey}_{date_str}.nc'
        output_filename = Path(dir_output, name_output)
        if output_filename.is_file() and not overwrite:
            print(f'"{name_output}" found and overwrite=False, continuing.')
            continue
        dataset_perperiod = dataset.sel(time=slice(date_str, date_str))
        print(f'xarray writing netcdf file: {name_output}')
        dataset_perperiod.to_netcdf(output_filename)
```
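A possible direction for the improvement (a minimal sketch, not the current dfm_tools implementation) would be to open a small per-period dataset inside the loop instead of subsetting one dataset that spans the full request. The variable names (`dataset_id`, `varkey`, `longitude_min`, `dir_output`, `file_prefix`, `overwrite`, etc.) follow the excerpt above, and the monthly `freq="M"` is only an example; whether this actually reduces the per-call overhead would still need to be verified against the copernicusmarine client:

```python
# Sketch only: per-period open_dataset calls instead of one call for the full range.
period_range = pd.period_range(date_min, date_max, freq="M")  # monthly chunks as an example
for period in period_range:
    name_output = f"{file_prefix}{varkey}_{period}.nc"
    output_filename = Path(dir_output, name_output)
    if output_filename.is_file() and not overwrite:
        print(f'"{name_output}" found and overwrite=False, continuing.')
        continue
    dataset_period = copernicusmarine.open_dataset(
        dataset_id=dataset_id,
        variables=[varkey],
        minimum_longitude=longitude_min,
        maximum_longitude=longitude_max,
        minimum_latitude=latitude_min,
        maximum_latitude=latitude_max,
        # restrict the opened dataset to this period only
        start_datetime=period.to_timestamp(),
        end_datetime=period.to_timestamp(how="end"),
    )
    print(f'xarray writing netcdf file: {name_output}')
    dataset_period.to_netcdf(output_filename)
```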

The example below shows that splitting the request into monthly chunks makes the download much faster than requesting the full period at once:

```python
import dfm_tools as dfmt
import pandas as pd

# spatial extents
lon_min, lon_max, lat_min, lat_max = 12.5, 16.5, 34.5, 37

# time extents
date_min = '2017-12-01'
date_max = '2022-07-31'

# option 1: list of start/stop times (tuples) with monthly frequency
# this approach improves performance significantly
date_range_start = pd.date_range(start=date_min, end=date_max, freq='MS')
date_range_end = pd.date_range(start=date_min, end=date_max, freq='ME')
monthly_periods = [(start, end) for start, end in zip(date_range_start, date_range_end)]

# option 2: a single start/stop tuple to download all at once (but still per day)
# this is the default behaviour and is slow; comment this line out to use option 1
monthly_periods = [(date_min, date_max)]

for period in monthly_periods:
    dfmt.download_CMEMS(varkey='uo',
                        longitude_min=lon_min, longitude_max=lon_max, latitude_min=lat_min, latitude_max=lat_max,
                        date_min=period[0], date_max=period[1],
                        dir_output=".", overwrite=True, dataset_id='med-cmcc-cur-rean-d')
```

Todo:
