# Implement the raob model pressure level builder v01 review docs (#444)

Open · wants to merge 11 commits into `main`

`docs/reviews/architecture.md` (new file, +53 lines)

# Architecture overview

## Meeting

We held an online architecture review meeting on October 31. The consensus was that the architecture is acceptable.

## Overview

The architecture plan is to extend the VxIngest GribBuilder to create a GribModelRaobPressureBuilderV01 class that will handle the pressure model files. It is intended to read these files from the NODD using the boto3 Python package. For example, [hrrr.t00z.wrfprsf00.grib2](https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html#hrrr.20240731/conus/hrrr.t00z.wrfprsf00.grib2) is the operational HRRR grib2 output file with pressure levels. With the AWS CLI, the July 31, 2024 00Z test data file can be downloaded with `aws s3 cp --no-sign-request s3://noaa-hrrr-bdp-pds/hrrr.20240731/conus/hrrr.t00z.wrfprsf00.grib2 /opt/data/grib2_to_cb/hrrr_ops/input_files/2421300000000`.
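
A minimal sketch of the equivalent anonymous download with boto3 (the bucket, key, and local path come from the example above; the client configuration is an illustration, not the builder's actual code):

```python
# Sketch only: unsigned (anonymous) download of the example grib2 file
# from the NODD bucket; not the actual VxIngest implementation.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
s3.download_file(
    "noaa-hrrr-bdp-pds",
    "hrrr.20240731/conus/hrrr.t00z.wrfprsf00.grib2",
    "/opt/data/grib2_to_cb/hrrr_ops/input_files/2421300000000",
)
```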

## Templates

There are associated ingest templates that will define the data types: "MD:V01:RAOB:PRS:HRRR_OPS:ingest:grib2" and "MD:V01:RAOB:NTV:HRRR_OPS:ingest:grib2". These are straightforward grib2 ingest templates. There will be a data document for each forecast hour and each level, with entries for every RAOB station. We will record drift information in the data section.

## Data Source

The builder will use cfgrib to read temporary files, then clean them up afterward. There appears to be no well-defined way to stream the file directly from AWS S3, so the program will download it completely. The primary isobaric dataset is retrieved with `ds = xr.open_dataset(f, engine="cfgrib", backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}})`, which contains the variables we need, i.e. temperature, height, dewpoint, specific humidity, etc. The pressure levels in the grib2 file are spaced every 25 mb from 1013 mb through 50 mb, so the ingest will need to interpolate the variables to standard levels (1010 mb through 20 mb, spaced by 10 mb).

## Method

Variables can be retrieved in Python by first opening the file with xarray (using the cfgrib engine), then accessing the variable values at a given level index, found by matching the pressure of interest. For example:

```bash
# cd to the clone dir for VxIngest
> cd $HOME/VxIngest
# source the virtual env
> . .venv/bin/activate
# start python3
> python
>>> # download the file first; see https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/python/example_code/s3/s3_basics/object_wrapper.py
>>> import xarray as xr
>>> f = "temp_grib2_file"
>>> ds = xr.open_dataset(f, engine="cfgrib", backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}, "read_keys": ["projString"]})
# get the shape of the temperature variable
>>> ds.t.values.shape
(40, 1059, 1799)  # 40 levels, 1059 (lat) x 1799 (lon) grid points - this is CONUS
>>> list(ds.keys())
['gh', 't', 'r', 'dpt', 'q', 'w', 'u', 'v', 'absv', 'clwmr', 'unknown', 'rwmr', 'snmr', 'grle']

# get the pressure values (this is a coordinate)
>>> ds.coords['isobaricInhPa'].values
array([1013., 1000.,  975.,  950.,  925.,  900.,  875.,  850.,  825.,
        800.,  775.,  750.,  725.,  700.,  675.,  650.,  625.,  600.,
        575.,  550.,  525.,  500.,  475.,  450.,  425.,  400.,  375.,
        350.,  325.,  300.,  275.,  250.,  225.,  200.,  175.,  150.,
        125.,  100.,   75.,   50.])
# find the pressure of interest and get its index, e.g. 800 mb is index 9,
# then use a gridpoint to read the variable value
>>> ds.t[9, 100, 100].values
array(289.98505, dtype=float32)  # this is in kelvin
>>> ds.t[9, 100, 100].values * 9 / 5 - 459.67
np.float32(62.30307)  # this is in fahrenheit
```

The builder will maintain a map of the data variables that translate_template_item can use to access the data. Note that the example above does not include interpolation; the program will interpolate all the values to the mandatory levels.
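
As an illustration of that interpolation step, a minimal sketch assuming simple linear interpolation in pressure at a single gridpoint (the variable names are illustrative, not the builder's actual code):

```python
# Sketch: linearly interpolate one gridpoint's temperature profile from the
# file's 25 mb spacing onto the 10 mb mandatory levels. Not VxIngest code.
import numpy as np
import xarray as xr

ds = xr.open_dataset(
    "temp_grib2_file",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}},
)

pressures = ds.coords["isobaricInhPa"].values   # descending: 1013 ... 50
mandatory = np.arange(1010, 10, -10)            # 1010 mb down to 20 mb, step 10

# np.interp requires ascending x values, so reverse the descending pressure
# axis (and the profile), then flip the result back to descending order.
t_profile = ds.t[:, 100, 100].values            # one gridpoint, all 40 levels
t_mandatory = np.interp(mandatory[::-1], pressures[::-1], t_profile[::-1])[::-1]
# Note: np.interp clamps values outside the file's 50-1013 mb range (the 40,
# 30, and 20 mb mandatory levels), so those need separate handling.
```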

`docs/reviews/data.md` (new file, +31 lines)

# Data Requirements

## Data Source

The data will come from s3://noaa-hrrr-bdp-pds/hrrr.20240731/conus/ in the form of hrrr.t01z.wrfprsf04.grib2 files.
The 'wrfprs' part tells us it is a pressure file; a native step file has 'wrfnat' instead.
The ingest will need to process these in an ongoing 'operational' manner, i.e. process every file that arrives there.
The ingest will also need an on-demand mode in which it is given a date range of required data.
The data_request document is "DR:continuous:HRRR_OPS:1730496755:0:1730498583:V01".

Since this data comes from the public [NODD](https://www.noaa.gov/information-technology/open-data-dissemination),
it does not need to be moved to a GSL s3 bucket; it will be read directly from the data source.

The file path "s3://noaa-hrrr-bdp-pds/hrrr.20240731/conus/" includes a date component; in this example "20240731" represents July 31, 2024. The file name "hrrr.t01z.wrfprsf04.grib2" contains a cycle time "t01z" (the operational HRRR runs every hour) and a forecast hour "f04" (the operational HRRR is recorded here out to forecast hour 15).
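
For illustration, a hypothetical helper (not part of VxIngest) that pulls those components out of an S3 key:

```python
# Hypothetical sketch showing how the S3 key components map to
# date, cycle, file kind, and forecast hour.
import re

def parse_hrrr_key(key: str) -> dict:
    """Parse e.g. 'hrrr.20240731/conus/hrrr.t01z.wrfprsf04.grib2'."""
    m = re.match(
        r"hrrr\.(?P<date>\d{8})/conus/"
        r"hrrr\.t(?P<cycle>\d{2})z\.wrf(?P<kind>prs|nat)f(?P<fhour>\d{2})\.grib2",
        key,
    )
    if m is None:
        raise ValueError(f"unrecognized HRRR key: {key}")
    return m.groupdict()

print(parse_hrrr_key("hrrr.20240731/conus/hrrr.t01z.wrfprsf04.grib2"))
# {'date': '20240731', 'cycle': '01', 'kind': 'prs', 'fhour': '04'}
```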

## Data Output

The builder will produce a data bundle on each run that includes the required variables for all of the unprocessed model data, i.e. for every available grib2 file that is newer than the latest model data currently in the database. The builder is event triggered by the creation of a new model file, but unprocessed older data files should also be processed, within limits. This can be done by querying the load job documents for the latest document processed. There will also be a way to provide parameters that specify a range of epochs to process, even though that data is older than the latest data in the database. It is a little unclear how that processing will be triggered; probably manually.
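
A rough sketch of that selection logic (all names are hypothetical; the query against the load job documents and the parameter mechanism are still under discussion):

```python
# Illustrative only: choose which files to process, either everything newer
# than the latest epoch already in the database (operational mode) or an
# explicit, possibly historical, range of epochs (on-demand mode).
def select_files(available, latest_epoch_in_db, epoch_range=None):
    """available: iterable of (epoch, s3_key) pairs for files on the NODD."""
    if epoch_range is not None:
        start, end = epoch_range
        return [key for epoch, key in available if start <= epoch <= end]
    return [key for epoch, key in available if epoch > latest_epoch_in_db]
```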

## Database Import

The data bundle will be imported according to the use cases (specifically UC 03-01)
that are currently being discussed in the data bundle meetings.

## Data Bundle storage

Long term data bundle storage (UC-02-01) is currently being discussed in the data bundle meetings. The intent is to use AWS storage classes to manage bundle lifecycle.

## Data expiration

This data will have a very long TTL (Time To Live). How to specify the TTL for long-lived operational data is still being discussed. I expect that for Couchbase data the TTL will be specified in the Process_Spec, but this is still under discussion.

`docs/reviews/deployment.md` (new file, +24 lines)

# Deployment

This might be one of the first builders to use the mechanisms being ironed out in the data bundle meetings.

The actual builders are all part of the same container already, and will be available as soon as the builder PR is merged into the main branch.

We will run this in the cloud using Ian's Kubernetes deployment.

## Data_source

There is an associated Data_Source object: "DS:continuous:RAOB:HRRR_OPS:1730496755:0:1730498583:V01"

## Process_spec

There is an associated Process_Spec:
"PS:RAOB:GRIB2:MODEL:HRRR_OPS:1730496755:1814400:V01"

## Ingest docs

There is an ingest doc for the pressure level ingest:
"MD:V01:RAOB:PRS:HRRR_OPS:ingest:grib2"

There is an ingest doc for the native level ingest:
"MD:V01:RAOB:NTV:HRRR_OPS:ingest:grib2"

`docs/reviews/design.md` (new file, +13 lines)

# GribModelRaobPressureBuilderV01 Design

## Builder Class

GribModelRaobPressureBuilderV01 and GribModelRaobNativeBuilderV01 will both extend GribBuilder.
GribModelRaobPressureBuilderV01 will build documents that are indexed on the model's pressure levels, while GribModelRaobNativeBuilderV01 will build documents that are indexed on the model's native step levels.

This also necessitates renaming (and slightly refactoring) the original GribBuilder class for METARs, since it was the only grib model builder that existed.

The hierarchy of the classes needs to be sorted out and common code moved to the parent GribBuilder.
There will be three concrete GribBuilder classes after this.
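
A hypothetical sketch of the resulting hierarchy (only the two new Raob builder names come from this design; the METAR builder name is an assumption, not the actual VxIngest code):

```python
# Sketch of the post-refactor hierarchy; the METAR builder name is an
# illustrative assumption.
class GribBuilder:
    """Parent class: common grib2 handling shared by all concrete builders."""


class GribMetarBuilderV01(GribBuilder):
    """The renamed original METAR grib builder (name is an assumption)."""


class GribModelRaobPressureBuilderV01(GribBuilder):
    """Builds documents indexed on the model's pressure levels."""


class GribModelRaobNativeBuilderV01(GribBuilder):
    """Builds documents indexed on the model's native step levels."""
```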

`docs/reviews/test.md` (new file, +31 lines)

# Testing

The tests will be a combination of unit and integration tests. The unit tests should cover the handlers and utility methods.
The integration tests will be patterned after the existing GribBuilder integration tests.
The test data file will be the July 31, 2024 00Z grib file.
For the two-thread integration test we will use the July 31, 2024 01Z file. In order to make the tests run independently of the NODD, these files will be downloaded and placed in the [opt-data.gz](https://drive.google.com/file/d/1VWXoUEc0Lx5aXrtBfMK1yV5gF4iiG6H3/view?usp=drive_link) file.

## Unit tests

These will focus on testing class methods and specific queries.

## Integration tests

These will run the builder end to end with test data and then
compare the output files (without importing any output) to the expected outputs.
The expected output data might actually be in the database, having been validated and imported manually.
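
A minimal sketch of that comparison step (the directory layout and JSON file format are assumptions; the real tests will follow the existing GribBuilder integration tests):

```python
# Illustrative only: assert every expected document has an identical
# produced counterpart, without importing anything into the database.
import json
from pathlib import Path

def compare_outputs(produced_dir: Path, expected_dir: Path) -> None:
    for expected in expected_dir.glob("*.json"):
        produced = produced_dir / expected.name
        assert produced.exists(), f"missing output: {expected.name}"
        assert json.loads(produced.read_text()) == json.loads(expected.read_text())
```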

## Test data

The necessary test data files will be found in the [opt-data.gz](https://drive.google.com/file/d/1VWXoUEc0Lx5aXrtBfMK1yV5gF4iiG6H3/view?usp=drive_link) file.

For this test suite the test data will be the July 31, 2024 00Z and 01Z grib file(s):
[hrrr.t00z.wrfprsfHH.grib2](https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html#hrrr.20240731/conus/hrrr.t00z.wrfprsfHH.grib2)
and
[hrrr.t01z.wrfprsfHH.grib2](https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html#hrrr.20240731/conus/hrrr.t01z.wrfprsfHH.grib2)
for pressure level data files,
and
[hrrr.t00z.wrfnatfHH.grib2](https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html#hrrr.20240731/conus/hrrr.t00z.wrfnatfHH.grib2)
and
[hrrr.t01z.wrfnatfHH.grib2](https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html#hrrr.20240731/conus/hrrr.t01z.wrfnatfHH.grib2)
for native model step level data files.