Figure out if we can rescue the negative depth observations #173

Open
jordansread opened this issue May 24, 2020 · 2 comments

@jordansread
jordansread commented May 24, 2020

412009 temperature records were removed because they had negative depth values.

I think this is all from coop data, since the < 0m depths are silently removed from WQP at an earlier stage. 400k+ seems like a lot to lose, and it would be good to know if these all come from one provider that we could reach out to for clarification on the fields and the meaning.

code reference here

@limnoliver
Contributor

I know one file that has a lot of negative depths is Water_Temp.feather (originally Water_Temp.accdb). See this issue -- I investigated, and the negative depths are distances from the bottom, but there is no site metadata to support working out depth from the surface.
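
To make the missing piece concrete, here is a rough sketch of the conversion that would become possible if a per-site maximum depth were known. The helper `height_to_depth()` and the `max_depth` argument are hypothetical, and the sign convention (a raw value of -1 meaning 1 m above the bottom) is an assumption rather than something confirmed by the data provider.

```r
# Hypothetical helper: convert "distance from bottom" readings to depth below surface.
# ASSUMPTION: `max_depth` is a known per-site maximum depth in meters. The thread
# notes that this metadata is NOT currently available for Water_Temp.
# ASSUMPTION: a raw value of -1 means 1 m above the lake bottom.
height_to_depth <- function(depth_raw, max_depth) {
  stopifnot(length(max_depth) == 1, max_depth > 0)
  # Negative values are read as height above the bottom; others pass through unchanged
  depth <- ifelse(depth_raw < 0, max_depth + depth_raw, depth_raw)
  # Anything that would land above the surface is set to NA instead of kept
  ifelse(depth < 0, NA_real_, depth)
}

# Illustrative values only, not real observations
height_to_depth(c(-1, -5, 2), max_depth = 10)
```

Without a trustworthy `max_depth` for each site, this conversion cannot be applied, which is the blocker described above.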

@lindsayplatt
Contributor

Following up on this to see where we are now (2 years later). There are 412,218 temperature observations with negative depths, spread over 166 site/source combinations (104 unique sites) and 67 sources. Water_Temp.feather is still the biggest culprit. It is likely not worth our time to go after most of them, but briefly looking into the Iowa DNR and MPCA data could recover ~200 additional observations.

library(scipiper)
library(tidyverse)

wqp_in <- sc_retrieve('7a_wqp_munge/out/temp_wqp_munged_linked.feather.ind')
wqp_dat <- feather::read_feather(wqp_in)

coop_in <- sc_retrieve('7a_temp_coop_munge/out/all_coop_dat_linked.feather.ind')
coop_dat <- feather::read_feather(coop_in)

f_all_dat <- dplyr::select(wqp_dat, date = Date, time, timezone, depth, temp = wtemp, site_id = id, source_id = MonitoringLocationIdentifier, source_site_id = MonitoringLocationIdentifier) %>%
  mutate(source = sprintf('wqp_%s', source_id)) %>%
  bind_rows(dplyr::select(coop_dat, date = DateTime, time,
                          timezone, depth, temp, site_id, source_id = state_id,
                          source_site_id = site, source)) %>%
  mutate(month = lubridate::month(date)) %>%
  filter(!(month %in% c(1, 2) & temp > 10)) %>%
  filter(!(month %in% c(7, 8) & depth < 0.5 & temp < 10)) %>%
  mutate(timezone = ifelse(is.na(time), NA, timezone))

# Assumption: `recent_f_dat` in the original snippet is the combined table built above as `f_all_dat`
recent_f_dat_neg <- f_all_dat %>% filter(depth < 0)
nrow(recent_f_dat_neg)
[1] 412218

recent_f_dat_neg %>% 
  group_by(site_id, source) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n))
# A tibble: 166 x 3
# Groups:   site_id [104]
   site_id         source                                                         n
   <chr>           <chr>                                                      <int>
 1 nhdhr_32671150  7a_temp_coop_munge/tmp/Water_Temp.rds                     205839
 2 nhdhr_58125241  7a_temp_coop_munge/tmp/Water_Temp.rds                     137455
 3 nhdhr_120020307 7a_temp_coop_munge/tmp/Water_Temp.rds                      46365
 4 nhdhr_32672122  7a_temp_coop_munge/tmp/Water_Temp.rds                      13820
 5 nhdhr_120018008 7a_temp_coop_munge/tmp/Water_Temp.rds                       8425
 6 nhdhr_152517574 wqp_GNLK01_WQX-INGS                                           12
 7 nhdhr_132544104 7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.~     11
 8 nhdhr_133551903 7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.~      8
 9 nhdhr_137044605 7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.~      8
10 nhdhr_60090166  wqp_LRBOI_WQX-TMan                                             7
# ... with 156 more rows

recent_f_dat_neg %>% 
  group_by(source) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n))
# A tibble: 67 x 2
   source                                                                         n
   <chr>                                                                      <int>
 1 7a_temp_coop_munge/tmp/Water_Temp.rds                                     411904
 2 7a_temp_coop_munge/tmp/Iowa_DNR_LimnoProfiles_2000_2020.rds                  129
 3 7a_temp_coop_munge/tmp/1945_2020_All_MNDNR_MPCA_Temp_DO_Profiles.rds          38
 4 7a_temp_coop_munge/tmp/MPCA_temp_data_all.rds                                 36
 5 wqp_GNLK01_WQX-INGS                                                           12
 6 wqp_LRBOI_WQX-TMan                                                             7
 7 7a_temp_coop_munge/tmp/DNRdatarequest_Secchi_DO_and_Temp_1083_2016_AllLa~      5
 8 wqp_LRBOI_WQX-TPine                                                            5
 9 wqp_SRSTEPA-SW-BD-2017-01                                                      5
10 wqp_SRSTEPA-SW-FD-2017-01                                                      4
# ... with 57 more rows
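
As a quick check on the "~200" estimate, the counts printed in the per-source summary can be added up directly. This is only arithmetic on the numbers shown above; the short names in `neg_counts` are made-up labels for the first four source files, not identifiers from the pipeline.

```r
# Counts copied from the per-source summary above (top four sources only).
# The names are shortened, hypothetical labels for the corresponding files.
neg_counts <- c(
  Water_Temp             = 411904,
  Iowa_DNR_LimnoProfiles = 129,
  MNDNR_MPCA_Profiles    = 38,
  MPCA_temp_data_all     = 36
)
total_neg <- 412218  # nrow(recent_f_dat_neg) reported above

# Share of all negative-depth records that come from Water_Temp
water_temp_share <- neg_counts[["Water_Temp"]] / total_neg

# Negative-depth records in the Iowa DNR and MPCA files combined
iowa_mpca_n <- sum(neg_counts[c("Iowa_DNR_LimnoProfiles",
                                "MNDNR_MPCA_Profiles",
                                "MPCA_temp_data_all")])

# Negative-depth records from every source other than Water_Temp
other_n <- total_neg - neg_counts[["Water_Temp"]]

print(round(100 * water_temp_share, 2))
print(iowa_mpca_n)
print(other_n)
```

The Iowa DNR and MPCA files together account for 203 of the 314 negative-depth records that do not come from Water_Temp.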
