Raw data with fully qualified dates spanning many years exists in gs://the-cube/data/raw/nrel/end_use_load_profiles/2022/resstock_tmy3_release_1/weather/state=/_TMY3.csv
The extract_data.py script loads that data, and
- computes the weekday based on the original year of the raw data,
- drops the year from the date without doing any shifting to account for different years starting on different days of the week, and
- writes the results to ml.surrogate_model.weather_features_hourly.
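To make the problem concrete, here is a minimal sketch (standard library only) of what happens to the weekday when the year is dropped and the month/day is later read as a 2006 date. The function name `weekday_after_dropping_year` is hypothetical and is not taken from extract_data.py.

```python
from datetime import date


def weekday_after_dropping_year(original: date, assumed_year: int = 2006) -> tuple[int, int]:
    """Return (weekday in the original year, weekday of the same month/day in assumed_year).

    Weekdays follow date.weekday(): Monday is 0, Sunday is 6.
    Note: date.replace raises ValueError for Feb 29 when assumed_year is not a leap year.
    """
    relabeled = original.replace(year=assumed_year)
    return original.weekday(), relabeled.weekday()


# 2004-03-15 was a Monday, but 2006-03-15 was a Wednesday.
original_wd, relabeled_wd = weekday_after_dropping_year(date(2004, 3, 15))
print(original_wd, relabeled_wd)  # 0 2
```

A weekday computed from the original year therefore no longer matches the calendar once the row is keyed by month and day alone.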
extract_data.py needs to be enhanced to
- Validate the density and completeness of coverage of the input dates.
- Do the shift first, based on the weekday offset of 01-01-YYYY vs. 01-01-2006 (essentially, move by a fixed number of weeks that is approximately, but not exactly, +/- 52 * (YYYY - 2006)).
- Handle wrapping cases at the very start/end of the year.
- Validate that the day-of-week and weekend flags are exactly the same for both the original unshifted data and the shifted data.
- Do not drop the year from the output. Instead, assert that it is always 2006 after the shift.
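The steps above could be sketched roughly as follows, using only the standard library. This is one possible reading of the requirements, not the actual extract_data.py code: all function names are hypothetical, "wrapping" is assumed to mean moving a spilled-over timestamp by exactly 52 weeks, and "coverage" is assumed to mean one row per hour of a 365-day year.

```python
from collections import Counter
from datetime import date, datetime, timedelta

TARGET_YEAR = 2006  # target year named in the issue


def week_aligned_shift(source_year: int, target_year: int = TARGET_YEAR) -> timedelta:
    """Whole-week shift that moves Jan 1 of source_year as close as possible to Jan 1 of target_year."""
    gap_days = (date(target_year, 1, 1) - date(source_year, 1, 1)).days
    return timedelta(weeks=round(gap_days / 7))


def shift_to_target_year(ts: datetime, target_year: int = TARGET_YEAR) -> datetime:
    """Shift ts by whole weeks into target_year, preserving weekday and time of day."""
    shifted = ts + week_aligned_shift(ts.year, target_year)
    # Wrap timestamps that spill a few days past either end of the target year
    # (assumption: a 52-week move is acceptable because it keeps the weekday).
    if shifted.year < target_year:
        shifted += timedelta(weeks=52)
    elif shifted.year > target_year:
        shifted -= timedelta(weeks=52)
    # Validation: the year is always the target year, and weekday/weekend flags are unchanged.
    assert shifted.year == target_year, (ts, shifted)
    assert shifted.weekday() == ts.weekday(), (ts, shifted)
    assert (shifted.weekday() >= 5) == (ts.weekday() >= 5), (ts, shifted)
    assert shifted.time() == ts.time(), (ts, shifted)
    return shifted


def coverage_problems(timestamps):
    """Return (missing, duplicated) lists of (month, day, hour) keys.

    Assumption: complete input has exactly one row per hour of a 365-day year.
    """
    expected = set()
    cursor = datetime(TARGET_YEAR, 1, 1)
    while cursor.year == TARGET_YEAR:
        expected.add((cursor.month, cursor.day, cursor.hour))
        cursor += timedelta(hours=1)
    seen = Counter((t.month, t.day, t.hour) for t in timestamps)
    missing = sorted(expected - set(seen))
    duplicated = sorted(key for key, count in seen.items() if count > 1)
    return missing, duplicated


print(shift_to_target_year(datetime(2004, 3, 15, 12)))  # 2006-03-13 12:00:00
print(shift_to_target_year(datetime(2005, 1, 1, 0)))    # 2006-12-30 00:00:00
```

One consequence of wrapping by 52 weeks is that two source timestamps can land on the same target timestamp: with this sketch, 2005-01-01 and 2005-12-31 both map to 2006-12-30. Running a duplicate and gap check on the shifted output, for example with `coverage_problems`, would surface such collisions instead of letting them pass silently.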
This style, where there are more validation steps than there are transformation steps, is what I generally strive for. It gives me a lot more confidence in the data I'm going to train my model on.