Fix date issue by switching to converting UV data using location-based timezones #73
Conversation
… daylight savings appropriately
See DOI-USGS/dataRetrieval#634 (comment) for how I figured out the daylight savings issue fix.

I've verified that this works and we no longer have dates that extend beyond the dates we requested. So, each summarized daily value is now local to each site's timezone.

```r
library(dplyr)
readr::read_csv('2_process/out/gw_daily_quantiles.csv') %>%
  mutate(day_seq = lubridate::yday(Date)) %>%
  group_by(day_seq) %>%
  summarize(sites = length(unique(site_no))) %>%
  arrange(desc(day_seq))
# A tibble: 365 x 2
#    day_seq sites
#      <dbl> <int>
#  1     365  1823
#  2     364  1819
#  3     363  1817
#  4     362  1820
#  5     361  1818
#  6     360  1817
#  7     359  1822
#  8     358  1821
#  9     357  1825
# 10     356  1833
```
@ceenell requesting a review but I don't expect you to run this. I have re-run all this code locally and tested the new function I created via DOI-USGS/dataRetrieval#634 (comment). Just hoping to have your eyes scan this before it is officially merged.
```r
  group_by(site_no, Date) %>%
  summarize(GWL = mean(GWL_inst, na.rm = TRUE)) %>%
  write_feather(target_name)
}

# Convert POSIXct to Dates for a given timezone
# Only works with one tz_abbr at a time
POSIXct_to_Date_tz <- function(posix_dates, tz_abbr) {
```
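The body of this helper is not shown in the diff excerpt, but based on its comment it could plausibly look like the following sketch. The abbreviation-to-Olson lookup and the use of `lubridate::with_tz()` are assumptions for illustration, not the repo's actual implementation:

```r
library(lubridate)

# Sketch: convert POSIXct timestamps to Dates in one site-local timezone.
# The NWIS-abbreviation -> Olson mapping below is illustrative; Olson zones
# (e.g. "America/New_York") handle daylight savings automatically.
POSIXct_to_Date_tz <- function(posix_dates, tz_abbr) {
  olson <- c(EST = "America/New_York", CST = "America/Chicago",
             MST = "America/Denver",  PST = "America/Los_Angeles")
  as_date(with_tz(posix_dates, tzone = olson[[tz_abbr]]))
}

x <- as.POSIXct("2022-01-01 03:30:00", tz = "UTC")
POSIXct_to_Date_tz(x, "EST")  # "2021-12-31": still the prior local day
```

Note the example timestamp: 03:30 UTC on Jan 1 is 22:30 Eastern on Dec 31, so taking the date in the site's local zone keeps the reading on the correct day.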
These next two functions are also duplicated in 0_historic/src/fetch_nwis_historic
I'm not sure I understand the need for doing that. I know we're keeping the historic fetch separate from the current, but seems like duplicating the same steps increases complexity and room for error
You are absolutely right that these are duplicate functions. My goal with duplicating was to keep the historic fetching/processing totally separate. I agree that this introduces potential issues with updating one and not the other. This is the case with a couple of other things, so I will make a separate issue to address. I want to think through how to best do that.
See #74
```diff
@@ -30,14 +30,110 @@ fetch_gw_uv <- function(target_name, gw_sites, start_date, end_date, param_cd) {
   write_feather(target_name)
 }

-convert_uv_to_dv <- function(target_name, gw_uv_data_fn) {
+convert_uv_to_dv <- function(target_name, gw_uv_data_fn, site_tz_xwalk) {
```
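A minimal sketch of how the new `site_tz_xwalk` argument might be used, assuming it maps `site_no` to an NWIS-style `tz_cd`. The column names, toy data, and abbreviation-to-Olson lookup are illustrative assumptions, not the repo's actual schema:

```r
library(dplyr)
library(lubridate)

# Illustrative abbreviation -> Olson lookup (assumed, not the repo's table).
olson <- c(EST = "America/New_York", CST = "America/Chicago")

# Toy UV data: two sites, timestamps stored in UTC.
uv_data <- tibble(
  site_no  = c("01", "01", "02"),
  dateTime = as.POSIXct(c("2022-01-01 03:30:00", "2022-01-01 12:00:00",
                          "2022-01-01 03:30:00"), tz = "UTC"),
  GWL_inst = c(1.2, 1.4, 2.0)
)
site_tz_xwalk <- tibble(site_no = c("01", "02"), tz_cd = c("EST", "CST"))

# Convert one timezone at a time (the helper only handles a single
# tz_abbr), then summarize to site-local daily values.
dv <- uv_data %>%
  left_join(site_tz_xwalk, by = "site_no") %>%
  group_by(tz_cd) %>%
  mutate(Date = as_date(with_tz(dateTime, tzone = olson[[unique(tz_cd)]]))) %>%
  group_by(site_no, Date) %>%
  summarize(GWL = mean(GWL_inst, na.rm = TRUE), .groups = "drop")
```

In this toy data, the 03:30 UTC readings fall on Dec 31 in both local zones, so they no longer bleed into a Jan 1 daily value.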
Is this different from fetch_nwis_historic.r?
Nope ... will address this in #74
That's great this mystery is resolved, and just wow. How many sites was this an issue for? Is this something that also needs to be addressed with gage-conditions-gif?
Without running the pipeline I cannot confirm that this works, but nothing stands out to me as an error with the syntax.
I am wondering about the duplicated functions between historic and current data pulls. It is not immediately clear to me why it's needed.
OK go ahead and merge to keep things moving forward
As for how many sites this timezone problem was impacting: I believe, based on your findings here, it was almost 600 out of the ~1300 total sites.
Lastly, you asked if this is an issue for gage-conditions-gif. My guess is yes, but we haven't noticed because in
Finally committing/PR-ing the code related to my changes to summarize UV data based on timezone, so that we don't end up with data from the wrong day getting averaged together. The reason we noticed this in the first place was because we kept ending up with an extra frame outside of the timeframe we specified (see #47).
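The extra-frame symptom described above can be reproduced with a single made-up timestamp: an evening reading in a US timezone lands on the next calendar day when its date is taken in UTC.

```r
# Illustration with an assumed reading: 11 PM local time on Dec 31.
x <- as.POSIXct("2021-12-31 23:00:00", tz = "America/New_York")

as.Date(x, tz = "UTC")               # "2022-01-01": spills past the requested range
as.Date(x, tz = "America/New_York")  # "2021-12-31": the correct local day
```

Averaging on the UTC date mixes that reading into the wrong day's daily value and produces dates outside the requested window, which is exactly the extra frame seen in #47.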