
update PBMTL,PGTML,PB0(bads),resample obs #8

Merged · 8 commits · Jun 26, 2020

Conversation

jordansread

Fixes #5 and part of #4.

@jordansread (Author)

@aappling-usgs key addition is here


@aappling-usgs left a comment


Added some thoughts on potential fragilities, but I didn't spot anything that I think would be creating real problems right now.

},
depth_diff = abs(depth - new_depth)) %>%
# after approx(), trash any values at new_depth >= 0.5 m from the nearest observation
filter(depth_diff < 0.5) %>%
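The hunk above is only a fragment of the pipeline. A self-contained sketch of the same idea, using hypothetical toy data (`obs`, `new_depths`, and the nearest-observation `depth_diff` are illustrative names, not the project's actual variables):

```r
library(dplyr)

# hypothetical toy data: temperatures observed at irregular depths
obs <- tibble(depth = c(0.3, 1.1, 2.4), temp = c(20.1, 18.5, 15.2))
new_depths <- seq(0, 2.5, by = 0.5)  # regular resampling grid

approx(obs$depth, obs$temp, xout = new_depths) %>%
  as_tibble() %>%
  rename(new_depth = x, new_temp = y) %>%
  # distance from each interpolated depth to the nearest real observation
  mutate(depth_diff = sapply(new_depth, function(d) min(abs(obs$depth - d)))) %>%
  # drop extrapolation NAs and any interpolated value >= 0.5 m from an observation
  filter(!is.na(new_temp), depth_diff < 0.5)
```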


What do you think about replacing 0.5 with sample_res? It won't matter for this project, but it would help if we ever transfer this code to another project where sample_res is a higher value...
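The suggested change, assuming `sample_res` is the resampling resolution already defined upstream in the pipeline, would look like:

```r
library(dplyr)

# hypothetical resampling resolution, assumed defined upstream
sample_res <- 0.5

tibble(new_depth = c(0, 0.5, 1.0), depth_diff = c(0.1, 0.7, 0.3)) %>%
  filter(depth_diff < sample_res)  # threshold tracks the grid spacing
```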

filter(depth_diff < 0.5) %>%
# only keep one estimate for each new_depth
group_by(new_depth) %>%
filter(depth_diff == min(depth_diff)) %>%


Is there any chance of ending up with two observations with the same new_depth and depth_diff? Lines 190-191 would weed out identical original depths, but they wouldn't catch, say, two observations at 0.4 and 0.6 m that tie for the same new_depth. If there were such duplicates, you would probably notice when writing the observations out in the date x depth matrix format, at least if using pivot_wider... but even so, it might be a good idea to ensure a single unique value here. You could do that by adding a call to slice(1) right after this filter call and before the ungroup.

PS - here's why I think you'd notice when using pivot_wider: it puts a list in each cell if there are any duplicates and you don't specify values_fn:

> tibble(depth=c(1,2,3,3,3,4,4,5), temp=1:8) %>% pivot_wider(names_from=depth, values_from=temp)
# A tibble: 1 x 5
  `1`       `2`       `3`       `4`       `5`      
  <list>    <list>    <list>    <list>    <list>   
1 <int [1]> <int [1]> <int [3]> <int [2]> <int [1]>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 
> tibble(depth=c(1,2,3,3,3,4,4,5), temp=1:8) %>% pivot_wider(names_from=depth, values_from=temp, values_fn=mean)
# A tibble: 1 x 5
    `1`   `2`   `3`   `4`   `5`
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     4   6.5     8
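A sketch of the slice(1) suggestion above, using hypothetical observations at 0.4 and 0.6 m that tie for the same new_depth with equal depth_diff:

```r
library(dplyr)

obs <- tibble(new_depth = c(0.5, 0.5),
              depth     = c(0.4, 0.6),
              temp      = c(18.2, 18.0)) %>%
  mutate(depth_diff = abs(depth - new_depth))

obs %>%
  group_by(new_depth) %>%
  filter(depth_diff == min(depth_diff)) %>%  # a tie keeps both rows
  slice(1) %>%                               # forces a single row per new_depth
  ungroup()
```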

feather::read_feather(these_files$source_filepath[i]) %>% rename(depth = index) %>%
pivot_longer(-depth, names_to = 'date', values_to = 'temp') %>%
mutate(date = as.Date(date)) %>%
pivot_wider(names_from = depth, values_from = temp, names_prefix = 'temp_') %>%


Here's the pivot_wider I was imagining above. So I do think you would have caught it already if any duplicate site-date-new_depth combos were occurring, because write_csv for a tibble with list columns gives an error:

> tibble(depth=c(1,2,3,3,3,4,4,5), temp=1:8) %>% pivot_wider(names_from=depth, values_from=temp, values_fn=mean) %>% write_csv('test.csv')
## ^works
> tibble(depth=c(1,2,3,3,3,4,4,5), temp=1:8) %>% pivot_wider(names_from=depth, values_from=temp) %>% write_csv('test.csv')
Error in stream_delim_(df, path, ..., bom = bom, quote_escape = quote_escape) : 
  Don't know how to handle vector of type list.
In addition: Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

But note that even with the error, a file (just the column names) does get written...

@jordansread (Author)


Ouch! An empty file when we hit an error? That's not good.

Yes, I will add this other code. Your original code does this with a summarize, which I realized after the fact... I thought it was redundant with what was done above, but clearly it isn't (e.g., the case where observations at 0.4 and 0.6 m both exist).

# to see my work as columns, print out the result up to this point, e.g. by uncommenting
# tail(20) %>% print(n=20)
# now we clean up the columns
select(site_id, date, depth=new_depth, temp=new_temp, source) %>%
@jordansread (Author)


@aappling-usgs how about adding this here to deal with the duplication issue?

group_by(site_id, date, depth) %>%
    summarize(temp = first(temp), source = first(source)) %>%
    ungroup() %>% 
    saveRDS(file = outfile)


Good call - I forgot we were grouping by source above, so this deduplication does indeed need to happen outside of that do(). Does it make sense to prefer a specific source, e.g., the one with the most observations, instead of just picking the first one?

@jordansread (Author)


:) It may in the future, but I think this is such an edge case that I don't want to try to code it in for this experiment. Also, some high-observation sources are worse in quality than ones with fewer observations... so it may not return the "best" data after all (?). As long as I am tracking the source used all the way into the model_matched_obs data (which we are), I am OK making this simplification.


Sounds totally reasonable.

@jordansread jordansread merged commit f33438f into DOI-USGS:master Jun 26, 2020

Successfully merging this pull request may close these issues.

add PB-MTL evaluation