feat: update data override yaml to read external weight files (#1556)
Co-authored-by: Andrew Brooks <[email protected]>
uramirez8707 and abrooks1085 authored Aug 2, 2024
1 parent 85611f3 commit 33a87c6
Showing 16 changed files with 606 additions and 124 deletions.
150 changes: 104 additions & 46 deletions data_override/README.MD
@@ -7,29 +7,29 @@
- [How to use it?](README.MD#2-how-to-use-it)
- [Converting legacy data_table to data_table.yaml](README.MD#3-converting-legacy-data_table-to-data_tableyaml)
- [Examples](README.MD#4-examples)
- [External Weight File Structure](README.MD#5-external-weight-file-structure)

#### 1. YAML Data Table format:
Each entry in the data_table has the following key values:
- **gridname:** Name of the grid to interpolate the data to. The acceptable values are "ICE", "OCN", "ATM", and "LND"
- **fieldname_code:** Name of the field as it is in the code to interpolate.
- **fieldname_file:** Name of the field as it is written in the file. **Required** only if overriding from a file
- **file_name:** Name of the file where the variable is located, including the directory. **Required** only if overriding from a file
- **interpol_method:** Method used to interpolate the field. The acceptable values are "bilinear", "bicubic", and "none". "none" implies that the field in the file is already in the model grid. The LIMA format is no longer supported. **Required** only if overriding from a file
- **grid_name:** Name of the grid to interpolate the data to. The acceptable values are "ICE", "OCN", "ATM", and "LND"
- **fieldname_in_model:** Name of the field as it is in the code to interpolate.
- **override_file:** Optional subsection with key/value pairs defining how to override from a netcdf file.
  - **file_name:** Name of the file where the variable is located, including the directory
  - **fieldname_in_file:** Name of the field as it is written in the file
  - **interp_method:** Method used to interpolate the field. The acceptable values are "bilinear", "bicubic", and "none". "none" implies that the field in the file is already in the model grid. The LIMA format is no longer supported
  - **multi_file:** Optional subsection with key/value pairs to use multiple (3) input netcdf files instead of 1. Note that **file_name** must be the second file in the set when using multiple input netcdf files
    - **prev_file_name:** The name of the first file in the set
    - **next_file_name:** The name of the third file in the set
  - **external_weights:** Optional subsection with key/value pairs defining the external weights file to use for the interpolation.
    - **file_name:** Name of the file where the external weights are located, including the directory
    - **source:** Name of the source that generated the external weights. The only acceptable value is "fregrid"
- **factor:** Factor by which the data is multiplied after it is interpolated

To interpolate the data to a region of the model grid, the following **optional** arguments are available:
- **region_type:** The region type. The acceptable values are "inside_region" and "outside_region"
- **lon_start:** The starting longitude in the same units as the grid data in the file
- **lon_end:** The ending longitude in the same units as the grid data in the file
- **lat_start:** The starting latitude in the same units as the grid data in the file
- **lat_end:** The ending latitude in the same units as the grid data in the file

To use multiple (3) input netcdf files instead of 1, the following **optional** keys are available:
- **is_multi_file:** Set to `True` if using the multi-file feature
- **prev_file_name:** The name of the first file in the set
- **next_file_name:** The name of the third file in the set

Note that **file_name** must be the second file in the set. **prev_file_name** and/or **next_file_name** are required if **is_multi_file** is set to `True`
- **subregion:** Optional subsection with key/value pairs that define a subregion of the model grid to interpolate the data to.
  - **type:** The region type. The acceptable values are "inside_region" and "outside_region"
  - **lon_start:** The starting longitude in the same units as the grid data in the file
  - **lon_end:** The ending longitude in the same units as the grid data in the file
  - **lat_start:** The starting latitude in the same units as the grid data in the file
  - **lat_end:** The ending latitude in the same units as the grid data in the file
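
One way to read the **subregion** keys is as a longitude/latitude box that either selects or excludes model points. The Python sketch below is only an illustration under that assumption; it is not the FMS implementation. The function name `in_subregion` is hypothetical, the fourth bound is assumed to be keyed `lat_end`, and longitude wrap-around is ignored.

```python
def in_subregion(lon, lat, subregion):
    """Return True if the override would apply at the model point (lon, lat).

    `subregion` is a dict with the keys described above. This helper is a
    hypothetical illustration, not part of FMS.
    """
    # Test whether the point lies inside the lon/lat box (bounds inclusive).
    inside_box = (
        subregion["lon_start"] <= lon <= subregion["lon_end"]
        and subregion["lat_start"] <= lat <= subregion["lat_end"]
    )
    if subregion["type"] == "inside_region":
        return inside_box
    if subregion["type"] == "outside_region":
        return not inside_box
    raise ValueError("type must be 'inside_region' or 'outside_region'")


# Hypothetical box used only for this example.
box = {"type": "inside_region",
       "lon_start": 10.0, "lon_end": 20.0,
       "lat_start": -5.0, "lat_end": 5.0}
point_is_overridden = in_subregion(15.0, 0.0, box)
```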

#### 2. How to use it?
To use the yaml data format, [libyaml](https://github.com/yaml/libyaml) needs to be installed and linked with FMS. Additionally, FMS must be compiled with the `-Duse_yaml` macro. If using autotools, you can add `--with-yaml`, which adds the macro for you and checks that libyaml is linked correctly.
@@ -55,21 +55,22 @@ In the **legacy format**, the data_table will look like:
In the **yaml format**, the data_table will look like
```
data_table:
- gridname : ICE
  fieldname_code : sic_obs
  fieldname_file : sic
  file_name : INPUT/hadisst_ice.data.nc
  interpol_method : bilinear
  factor : 0.01
- grid_name : ICE
  fieldname_in_model : sic_obs
  override_file:
  - file_name : INPUT/hadisst_ice.data.nc
    fieldname_in_file : sic
    interp_method : bilinear
  factor : 0.01
```
Which corresponds to the following model code:
```F90
call data_override('ICE', 'sic_obs', icec, Spec_Time)
```
where:
- `ICE` corresponds to the gridname in the data_table
- `sic_obs` corresponds to the fieldname_code in the data_table
- `icec` is the variable to write the data to
- `ICE` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
- `sic_obs` corresponds to the fieldname_in_model in the data_table
- `icec` is the storage array that holds the interpolated data
- `Spec_Time` is the time to interpolate the data to.

Additionally, it is required to call data_override_init (in this case with the ICE domain). The grid_spec.nc file must also contain the coordinate information for the domain being used.
@@ -82,25 +83,25 @@ call data_override_init(Ice_domain_in=Ice_domain)

In the **legacy format**, the data_table will look like:
```
"ICE", "sit_obs", "", "INPUT/hadisst_ice.data.nc", "none", 2.0
"ICE", "sit_obs", "", "INPUT/hadisst_ice.data.nc", "none", 2.0
```

In the **yaml format**, the data_table will look like:
```
``` yaml
data_table:
- gridname : ICE
  fieldname_code : sit_obs
  factor : 0.01
- grid_name : ICE
  fieldname_in_model : sit_obs
  factor : 0.01
```
Which corresponds to the following model code:
```F90
call data_override('ICE', 'sit_obs', icec, Spec_Time)
```
where:
- `ICE` corresponds to the gridname in the data_table
- `sit_obs` corresponds to the fieldname_code in the data_table
- `icec` is the variable to write the data to
- `ICE` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
- `sit_obs` corresponds to the fieldname_in_model in the data_table
- `icec` is the storage array that holds the interpolated data
- `Spec_Time` is the time to interpolate the data to.

Additionally, it is required to call data_override_init (in this case with the ICE domain). The grid_spec.nc file is still required to initialize data_override with the ICE domain.
@@ -117,28 +118,85 @@ In the **legacy format**, the data_table will look like:
```

In the **yaml format**, the data_table will look like:
```
``` yaml
data_table:
- gridname : OCN
  fieldname_code : runoff
  fieldname_file : runoff
  file_name : INPUT/runoff.daitren.clim.nc
  interpol_method : none
  factor : 1.0
- grid_name : OCN
  fieldname_in_model : runoff
  override_file:
  - file_name : INPUT/runoff.daitren.clim.nc
    fieldname_in_file : runoff
    interp_method : none
  factor : 1.0
```
Which corresponds to the following model code:
```F90
call data_override('OCN', 'runoff', runoff_data, Spec_Time)
```
where:
- `OCN` corresponds to the gridname in the data_table
- `runoff` corresponds to the fieldname_code in the data_table
- `runoff_data` is the variable to write the data to
- `OCN` is the component domain for which the variable is being interpolated and corresponds to the grid_name in the data_table
- `runoff` corresponds to the fieldname_in_model in the data_table
- `runoff_data` is the storage array that holds the interpolated data
- `Spec_Time` is the time to interpolate the data to.

Additionally, it is required to call data_override_init (in this case with the ocean domain). The grid_spec.nc file is still required to initialize data_override with the ocean domain and to determine if the data in the file is in the same grid as the ocean.

```F90
call data_override_init(Ocn_domain_in=Ocn_domain)
```
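
The correspondence between a legacy data_table line and the new nested YAML entry can be sketched in Python. This is a rough illustration, not the official conversion tool. It assumes the legacy column order is gridname, fieldname_code, fieldname_file, file_name, interpol_method, factor, and that an empty fieldname_file means no `override_file` subsection is written. The function name `legacy_line_to_entry` and the sample line are hypothetical.

```python
import csv


def legacy_line_to_entry(line):
    """Map one legacy data_table line to a dict shaped like a YAML entry.

    Hypothetical helper for illustration only.
    """
    # skipinitialspace lets the csv module see the quotes that follow ", ".
    fields = [f.strip() for f in next(csv.reader([line], skipinitialspace=True))]
    if len(fields) != 6:
        raise ValueError("expected 6 comma-separated fields, got %d" % len(fields))
    gridname, fieldname_code, fieldname_file, file_name, interpol_method, factor = fields

    entry = {"grid_name": gridname, "fieldname_in_model": fieldname_code}
    if fieldname_file:
        # Overriding from a file: group the file-related keys in a subsection.
        entry["override_file"] = [{
            "file_name": file_name,
            "fieldname_in_file": fieldname_file,
            "interp_method": interpol_method,
        }]
    entry["factor"] = float(factor)
    return entry


# Hypothetical legacy line, reconstructed from the YAML shown in example 4.1.
legacy = '"ICE", "sic_obs", "sic", "INPUT/hadisst_ice.data.nc", "bilinear", 0.01'
entry = legacy_line_to_entry(legacy)
```

The resulting dict could then be written out with any YAML library. Keys introduced only in the YAML format, such as `multi_file` and `external_weights`, have no legacy column and would need to be added by hand.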

**4.4** The following example uses the multi-file capability
``` yaml
data_table:
- grid_name : ICE
  fieldname_in_model : sic_obs
  override_file:
  - file_name : INPUT/hadisst_ice.data_yr1.nc
    fieldname_in_file : sic
    interp_method : bilinear
    multi_file:
    - next_file_name: INPUT/hadisst_ice.data_yr2.nc
      prev_file_name: INPUT/hadisst_ice.data_yr0.nc
  factor : 0.01
```
Data override determines which file to use based on the model time. This avoids having to combine the three yearly files into one, since yearly simulations need the end of the previous file and the beginning of the next file.
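
The file selection described above can be pictured with a small Python sketch. It assumes the choice depends only on whether the model time falls before the first or after the last time record of **file_name**. The actual logic in data_override may differ, and the function name `select_override_file` and the record dates are hypothetical.

```python
from datetime import datetime


def select_override_file(model_time, first_record, last_record,
                         file_name, prev_file_name=None, next_file_name=None):
    """Pick which of the (up to) three files covers `model_time`.

    `first_record` and `last_record` are the first and last times stored in
    `file_name`. Hypothetical helper for illustration only.
    """
    if model_time < first_record:
        chosen = prev_file_name   # need the end of the previous file
    elif model_time > last_record:
        chosen = next_file_name   # need the beginning of the next file
    else:
        chosen = file_name
    if chosen is None:
        raise ValueError("model time is outside file_name and no neighbouring file was given")
    return chosen


# Hypothetical monthly records centred mid-month in the middle (yr1) file.
first, last = datetime(2001, 1, 16), datetime(2001, 12, 16)
early_january = select_override_file(
    datetime(2001, 1, 5), first, last,
    "INPUT/hadisst_ice.data_yr1.nc",
    prev_file_name="INPUT/hadisst_ice.data_yr0.nc",
    next_file_name="INPUT/hadisst_ice.data_yr2.nc",
)
```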

**4.5** The following example uses the external weight file capability
``` yaml
data_table:
- grid_name : ICE
  fieldname_in_model : sic_obs
  override_file:
  - file_name : INPUT/hadisst_ice.data.nc
    fieldname_in_file : sic
    interp_method : bilinear
    external_weights:
    - file_name: INPUT/remamp_file.nc
      source: fregrid
  factor : 0.01
```
#### 5. External Weight File Structure
**5.1** Bilinear weight file example from fregrid
```
dimensions:
    nlon = 5 ;
    nlat = 6 ;
    three = 3 ;
    four = 4 ;
variables:
    int index(three, nlat, nlon) ;
    double weight(four, nlat, nlon) ;
- `nlon` and `nlat` must be equal to the size of the global domain.
- `index(1,:,:)` corresponds to the index (i) of the longitude point in the data file closest to each model lon, lat
- `index(2,:,:)` corresponds to the index (j) of the latitude point in the data file closest to each model lon, lat
- `index(3,:,:)` corresponds to the tile (it should be 1 since data_override does not support interpolation **from** cubesphere grids)
- From there, the four corners are (i,j), (i,j+1), (i+1,j), and (i+1,j+1)
- The weights for the four corners are:
  - weight(:,:,1) -> (i,j)
  - weight(:,:,2) -> (i,j+1)
  - weight(:,:,3) -> (i+1,j)
  - weight(:,:,4) -> (i+1,j+1)
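
How the index and weight arrays combine into an interpolated value can be sketched in plain Python. This is a minimal illustration, not the FMS code. It assumes the stored indices are 1-based, that the source data is a single tile laid out as `data[j][i]` (latitude, then longitude), and that the corner cells never run off the edge of the source grid. The function name `apply_bilinear_weights` is hypothetical.

```python
def apply_bilinear_weights(data, i_index, j_index, weights):
    """Interpolate `data` onto the model grid with precomputed weights.

    data[j][i]       : source field (lat, lon)
    i_index[jm][im]  : 1-based source longitude index (i) per model point
    j_index[jm][im]  : 1-based source latitude index (j) per model point
    weights[jm][im]  : four weights ordered (i,j), (i,j+1), (i+1,j), (i+1,j+1)
    Hypothetical helper for illustration only.
    """
    result = []
    for jm, index_row in enumerate(i_index):
        row = []
        for im, i_one_based in enumerate(index_row):
            i = i_one_based - 1            # convert to 0-based for Python
            j = j_index[jm][im] - 1
            w = weights[jm][im]
            value = (w[0] * data[j][i]
                     + w[1] * data[j + 1][i]
                     + w[2] * data[j][i + 1]
                     + w[3] * data[j + 1][i + 1])
            row.append(value)
        result.append(row)
    return result


# Tiny hypothetical case: a 2x2 source grid and a single model point
# sitting at the centre of the four source corners.
source = [[1.0, 2.0],
          [3.0, 4.0]]
interpolated = apply_bilinear_weights(
    source, i_index=[[1]], j_index=[[1]], weights=[[[0.25, 0.25, 0.25, 0.25]]]
)
```

Since the four weights for each model point sum to 1, the result is a weighted average of the four surrounding source values.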