Weekend calculation bug in weather data pipeline #10

Open
vengroff opened this issue Jul 18, 2024 · 1 comment
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@vengroff
Contributor

Summarizing my current understanding:

  • Raw data with fully qualified dates spanning many years exists at gs://the-cube/data/raw/nrel/end_use_load_profiles/2022/resstock_tmy3_release_1/weather/state=/_TMY3.csv
  • The extract_data.py script loads that data, and
    • computes weekday based on the original year of the raw data
    • drops the year from the date without doing any shifting to account for different years starting on different days of the week
    • writes the results to ml.surrogate_model.weather_features_hourly.
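To make the problem above concrete, here is a minimal sketch (not code from extract_data.py) that compares weekend flags for the same month and day in two different years. The helper names `is_weekend` and `stale_weekend_flags` are hypothetical, and the sketch assumes Saturday and Sunday are the weekend days.

```python
from datetime import date


def is_weekend(d: date) -> bool:
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return d.weekday() >= 5


def stale_weekend_flags(source_year: int, reference_year: int = 2006):
    """Return (month, day) pairs whose weekend flag differs between the years.

    This mimics what happens when the flag is computed in source_year and
    the year is then dropped, so the row is read as if it were reference_year.
    """
    mismatches = []
    for month in range(1, 13):
        for day in range(1, 32):
            try:
                src = date(source_year, month, day)
                ref = date(reference_year, month, day)
            except ValueError:
                # Not a real calendar date in one of the years (e.g. Feb 30).
                continue
            if is_weekend(src) != is_weekend(ref):
                mismatches.append((month, day))
    return mismatches


if __name__ == "__main__":
    # January 1 is a Monday in 2007 but a Sunday in 2006.
    print((1, 1) in stale_weekend_flags(2007))
```

Any source year that does not start on the same weekday as 2006 will produce mismatches like this once the year is dropped.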

extract_data.py needs to be enhanced to

  • Validate the density and completeness of coverage of the input data.
  • Do the shift first, based on the weekday offset of 01-01-YYYY vs. 01-01-2006 (essentially, move by a fixed whole number of weeks that is approximately, but not exactly, +/- 52 * (YYYY - 2006)).
  • Handle wrapping cases at the very start/end of the year.
  • Validate that day of week and weekend flags are exactly the same for both the original unshifted data and the shifted data.
  • Do not drop the year from the output. Instead, assert that it is always 2006 after the shift.
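The steps above could be sketched roughly as follows. This is one possible approach using only the standard library, not the actual extract_data.py implementation: the names `week_shift_days`, `shift_to_target`, and `coverage_report` are hypothetical, and wrapping by exactly 52 weeks at the year boundaries is an assumption about how the edge cases might be handled.

```python
from collections import Counter
from datetime import date, timedelta

TARGET_YEAR = 2006


def week_shift_days(source_year: int, target_year: int = TARGET_YEAR) -> int:
    """Whole-week shift, in days, that best aligns Jan 1 of the two years."""
    delta = (date(target_year, 1, 1) - date(source_year, 1, 1)).days
    # Round to the nearest multiple of 7 so weekdays are preserved.
    return 7 * round(delta / 7)


def shift_to_target(d: date, target_year: int = TARGET_YEAR) -> date:
    """Move d into target_year by whole weeks, keeping its day of week."""
    shifted = d + timedelta(days=week_shift_days(d.year, target_year))
    # Wrap dates that spill over the start or end of the target year.
    if shifted.year < target_year:
        shifted += timedelta(weeks=52)
    elif shifted.year > target_year:
        shifted -= timedelta(weeks=52)
    # Validation: year is fixed and weekday (hence weekend flag) is unchanged.
    assert shifted.year == target_year
    assert shifted.weekday() == d.weekday()
    return shifted


def coverage_report(source_dates, target_year: int = TARGET_YEAR):
    """Return (missing, duplicated) target dates after shifting."""
    counts = Counter(shift_to_target(d, target_year) for d in source_dates)
    first = date(target_year, 1, 1)
    n_days = (date(target_year + 1, 1, 1) - first).days
    all_days = [first + timedelta(days=i) for i in range(n_days)]
    missing = [d for d in all_days if counts[d] == 0]
    duplicated = sorted(d for d, n in counts.items() if n > 1)
    return missing, duplicated


if __name__ == "__main__":
    days_2007 = [date(2007, 1, 1) + timedelta(days=i) for i in range(365)]
    missing, duplicated = coverage_report(days_2007)
    print("missing:", missing)
    print("duplicated:", duplicated)
```

With a full year of 2007 dates, this sketch leaves 2006-01-01 uncovered and maps two source dates onto 2006-01-02, because a whole-week shift cannot line up two years that start on different weekdays exactly. That is the kind of gap or collision the coverage validation should surface, and how to resolve it is a policy decision not settled here.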

This style, where there are more validation steps than there are transformation steps, is what I generally strive for. It gives me a lot more confidence in the data I'm going to train my model on.

@vengroff vengroff added bug Something isn't working good first issue Good for newcomers labels Jul 18, 2024
@vengroff vengroff self-assigned this Jul 18, 2024
@vengroff
Contributor Author

@mikivee FYI
