Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix date issue by switching to converting UV data using location-based timezones #73

Merged
merged 9 commits into from
Sep 26, 2022

Conversation

lindsayplatt
Copy link
Contributor

Finally committing/PR-ing the code related to my changes to summarize UV data based on timezone, so that we don't end up with data from the wrong day getting averaged together. The reason we noticed this in the first place was because we kept ending up with an extra frame outside of the timeframe we specified (see #47).

@lindsayplatt
Copy link
Contributor Author

See DOI-USGS/dataRetrieval#634 (comment) for how I figured out the daylight savings issue fix.

@lindsayplatt
Copy link
Contributor Author

I've verified that this works and we no longer have dates that extend beyond the dates we requested. So, each summarized daily value is now local to each site's timezone.

library(dplyr)
readr::read_csv('2_process/out/gw_daily_quantiles.csv') %>%
  mutate(day_seq = lubridate::yday(Date)) %>%
  group_by(day_seq) %>%
  summarize(sites = length(unique(site_no))) %>%
  arrange(desc(day_seq)) 

# A tibble: 365 x 2
   day_seq sites
     <dbl> <int>
 1     365  1823
 2     364  1819
 3     363  1817
 4     362  1820
 5     361  1818
 6     360  1817
 7     359  1822
 8     358  1821
 9     357  1825
10     356  1833

@lindsayplatt
Copy link
Contributor Author

@ceenell requesting a review but I don't expect you to run this. I have re-run all this code locally and tested the new function I created via DOI-USGS/dataRetrieval#634 (comment). Just hoping to have your eyes scan this before it officially is merged.

group_by(site_no, Date) %>%
summarize(GWL = mean(GWL_inst, na.rm = TRUE)) %>%
write_feather(target_name)
}

# Convert POSIXct to Dates for a given timezone
# Only works with one tz_abbr at a time
POSIXct_to_Date_tz <- function(posix_dates, tz_abbr) {
Copy link
Member

@cnell-usgs cnell-usgs Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These next two function are also duplicated in 0_historic/src/fetch_nwis_historic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the need for doing that. I know we're keeping the historic fetch separate from the current, but seems like duplicating the same steps increases complexity and room for error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right that these are duplicate functions. My goal with duplicating was to keep the historic fetching/processing totally separate. I agree that this introduces potential issues with updating one and not the other. This is the case with a couple of other things, so I will make a separate issue to address. I want to think through how to best do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #74

@@ -30,14 +30,110 @@ fetch_gw_uv <- function(target_name, gw_sites, start_date, end_date, param_cd) {
write_feather(target_name)
}

convert_uv_to_dv <- function(target_name, gw_uv_data_fn) {
convert_uv_to_dv <- function(target_name, gw_uv_data_fn, site_tz_xwalk) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different from fetch_nwis_historic.r?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope ... will address this in #74

Copy link
Member

@cnell-usgs cnell-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great this mystery is is resolved and just wow. How many sites was this an issue for? Is this something that also needs to be addressed with gage-conditions-gif?

Without running the pipeline I cannot confirm that this works, but nothing stands out to me as an error with the syntax.

I am wondering about the duplicated functions between historic and current data pulls. It is not immediately clear to me why it's needed.

Copy link
Member

@cnell-usgs cnell-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK go ahead and merge to keep things moving forward

@lindsayplatt
Copy link
Contributor Author

As for how many sites this timezone problem was impacting, I believe based on your findings here, it was almost 600 out of the ~1300 total sites.

@lindsayplatt
Copy link
Contributor Author

Lastly, you asked if this is an issue for gage-conditions-gif. My guess is yes, but we haven't noticed because in gage-conditions-gif, we set the PNG frame dates based on the supplied viz dates, rather than using the data to generate the SVG visuals as we do in this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants