Evaluation of bad RMSE lakes against PB0 #171

Open
jordansread opened this issue May 22, 2020 · 7 comments

@jordansread

Now that we have the 6_evaluation stage set up, I was looking at some of the worst performers

image

Some of these have observations that don't make sense, such as 1° temperatures in October
image

Same kind of integer pattern in obs for another one
image

with values that don't make sense.

I thought maybe these would be a coop source where a column was flipped or something, but for the top worst sites, they all have wqp as the only source

real_bad_sites <- pb0_matched_to_observations %>%
  group_by(site_id) %>%
  summarize(rmse = sqrt(mean((pred-obs)^2, na.rm=TRUE)), n = length(depth)) %>%
  arrange(desc(rmse)) %>%
  head(10) %>%
  pull(site_id)
obs_for_eval <- scmake('obs_for_eval')

obs_for_eval %>% filter(site_id %in% real_bad_sites) %>% pull(source) %>% unique()
[1] "wqp"

This pattern seems to continue to at least the 20th worst site

image

I wonder if these are all from the same provider?

@jordansread
Author

For future reference in easy to copy/paste code form:

pb0_matched_to_observations %>% group_by(site_id) %>% summarize(rmse = sqrt(mean((pred-obs)^2, na.rm=TRUE)), n = length(depth)) %>% arrange(desc(rmse))
# A tibble: 2,377 x 3
   site_id          rmse     n
   <chr>           <dbl> <int>
 1 nhdhr_109986912  17.2    29
 2 nhdhr_109989488  15.3    14
 3 nhdhr_121207127  14.7    20
 4 nhdhr_121650602  13.2    69
 5 nhdhr_145608202  12.1    27
 6 nhdhr_109984628  11.8    21
 7 nhdhr_121650552  11.6    26
 8 nhdhr_121650633  11.4    83
 9 nhdhr_121207134  11.3    61
10 nhdhr_109987472  11.3    20
11 nhdhr_121627799  11.3    51
12 nhdhr_121628955  11.3    34
13 nhdhr_121650613  11.2    84
14 nhdhr_109990726  10.9    68
15 nhdhr_69545019   10.8    86
16 nhdhr_85083102   10.8    53
17 nhdhr_109989482  10.2    32
18 nhdhr_121650592  10.2    59
19 nhdhr_109986464  9.60    48
20 nhdhr_121625003  8.98   105
# … with 2,367 more rows

@jordansread
Author

The first 14 of those lakes all have monitoring locations that are prefixed with IL_EPA-

and they are mostly in the NE corner of IL:
image

Additionally, many (all?) seem to have ResultAnalyticalMethod/MethodIdentifier as "LAB"...which makes me wonder if these are the temperatures in the lab for some other extraction method vs actual field measurements...

@jordansread
Author

Out of all of these sites,

table(d$`ResultAnalyticalMethod/MethodIdentifier`)

FIELD   LAB 
 2714  1609 

median of FIELD is 21.205°, median of LAB is 3°...
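For reference, a minimal sketch of how a per-method median like this could be computed. It assumes the temperature values sit in a `ResultMeasureValue` column of `d` (an assumption; the thread does not show which column was used), and the toy values below are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the WQP pull `d`; values are invented for illustration only.
d <- tibble(
  `ResultAnalyticalMethod/MethodIdentifier` = c("FIELD", "FIELD", "FIELD", "LAB", "LAB", "LAB"),
  ResultMeasureValue = c(20.5, 22.0, 21.0, 3, 4, 2)  # assumed temperature column
)

# Median temperature and record count for each analytical method
method_summary <- d %>%
  group_by(method = `ResultAnalyticalMethod/MethodIdentifier`) %>%
  summarize(median_temp = median(ResultMeasureValue, na.rm = TRUE), n = n())

print(method_summary)
```

A large gap between the two medians, as reported above, would be consistent with LAB records not being in-lake temperatures, though it does not prove it.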

@jordansread
Author

I have tacked on the monitoring ID to the source field for wqp data in the daily obs temperature build, so instead of getting source = 'wqp' we get a lot of different wqp sources, such as wqp_LCOWIS_WQX-E16
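Roughly, that change could look like the sketch below. The table name `wqp_obs` is hypothetical, and using `MonitoringLocationIdentifier` as the ID column is an assumption about how the WQP data are stored in the pipeline; the actual build step may differ.

```r
library(dplyr)

# Toy stand-in for the wqp rows of the daily obs table (hypothetical name).
wqp_obs <- tibble(
  MonitoringLocationIdentifier = c("LCOWIS_WQX-E16", "IL_EPA-RML-1"),  # assumed ID column
  source = "wqp"
)

# Append the monitoring ID so each location becomes its own source label
wqp_obs <- wqp_obs %>%
  mutate(source = paste(source, MonitoringLocationIdentifier, sep = "_"))

print(wqp_obs$source)
```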

Now, I can group by source instead of site_id and take an RMSE to see if there are particular sources that are really bad vs pb0 (this is from the pgmtl-data-release pipeline btw):

mutate(pb0_matched_to_observations, pred_diff = pred-obs) %>%
     group_by(source) %>% summarize(rmse = sqrt(mean((pred_diff)^2, na.rm=TRUE)), n = length(source)) %>% arrange(desc(rmse)) %>% print(n=100)
# A tibble: 2,924 x 3
    source                                                                   rmse     n
    <chr>                                                                   <dbl> <int>
  1 wqp_LCOWIS_WQX-E16                                                      19.7      7
  2 wqp_LCOWIS_WQX-E-16                                                     15.5    154
  3 wqp_IL_EPA-RML-1                                                        15.4      7
  4 wqp_USGS-475150098210000                                                15.1      2
  5 wqp_LCOWIS_WQX-E-9                                                      14.7     36
  6 wqp_SDDENR_WQX-WHITELAWL03                                              13.9     26
  7 wqp_WIDNR_WQX-10031157                                                  11.6     74
  8 wqp_MNPCA-21-0057-00-206                                                11.4     14
  9 wqp_SDWRAP-SWLAZZZ3703A                                                 10.9      6
 10 wqp_MNPCA-21-0103-00-202                                                10.7     24
 11 wqp_SDDENR_WQX-WALLZZZWL08                                              10.7      8
 12 wqp_LCOWIS_WQX-E17                                                      10.6     12
 13 wqp_MNPCA-21-0106-01-204                                                10.5     24
 14 wqp_MNPCA-21-0106-02-201                                                10.4      8
 15 wqp_IL_EPA_WQX-WGZJ-2                                                   10.3      2
 16 wqp_WIDNR_WQX-10029926                                                   9.92   174
 17 wqp_MNPCA-21-0085-00-207                                                 9.76    24
 18 7a_temp_coop_munge/tmp/South_Center_DO_2018_09_11_All.rds                9.61   853
 19 7a_temp_coop_munge/tmp/Carlos_DO_2018_11_05_All.rds                      9.57   996
 20 wqp_MNPCA-21-0054-00-205                                                 9.53    23
 21 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds                   9.51  1043
 22 wqp_MNPCA-77-0150-02-205                                                 9.34    52
 23 wqp_MNPCA-69-0939-02-203                                                 9.23    18
 24 wqp_MNPCA-82-0001-00-206                                                 8.98     2
 25 wqp_WIDNR_WQX-10033610                                                   8.92     4
 26 wqp_IL_EPA_WQX-RGE-2                                                     8.91     5
 27 wqp_NARS_WQX-NLA06608-0859                                               8.83    20
 28 wqp_LCOWIS_WQX-E-17                                                      8.78    82
 29 wqp_IL_EPA_WQX-RGE-1                                                     8.65    92
 30 wqp_IL_EPA_WQX-RTI-3                                                     8.48     3
 31 wqp_WIDNR_WQX-443514                                                     8.36    12
 32 wqp_MNPCA-27-0139-00-201                                                 8.33   253
 33 wqp_MNPCA-21-0052-00-205                                                 8.32    24
 34 wqp_USGS-454616092082100                                                 8.25     6
 35 wqp_IL_EPA_WQX-RGL-1                                                     8.17   140
 36 wqp_MNPCA-70-0091-00-452                                                 8.05     1
 37 wqp_MNPCA-11-0246-00-201                                                 8.03     1
 38 wqp_IL_EPA_WQX-RPC-2                                                     8.02     7
 39 wqp_MNPCA-19-0071-00-202                                                 7.98     5
 40 wqp_MNPCA-69-0790-00-201                                                 7.87    43
 41 wqp_MNPCA-27-0133-10-101                                                 7.85   120
 42 wqp_NALMS-6703                                                           7.83     4
 43 wqp_WIDNR_WQX-403112                                                     7.78    75
 44 wqp_MNPCA-69-0694-00-117                                                 7.73     1
 45 wqp_IL_EPA_WQX-RHD-2                                                     7.71     3
 46 wqp_MNPCA-18-0372-00-101                                                 7.69    95
 47 wqp_NALMS-3283                                                           7.63     1
 48 wqp_USGS-480352099093800                                                 7.61    11
 49 wqp_USGS-425235088075302                                                 7.60     1
 50 wqp_WIDNR_WQX-403107                                                     7.59   485
 51 wqp_MNPCA-29-0142-00-201                                                 7.58    10
 52 wqp_IL_EPA_WQX-RTW-1                                                     7.58   134
 53 wqp_MNPCA-21-0080-00-204                                                 7.56    24
 54 wqp_USGS-482018092292001                                                 7.48    36
 55 wqp_MNPCA-73-0139-00-204                                                 7.45    57
 56 wqp_WIDNR_WQX-193050                                                     7.40    17
 57 wqp_IL_EPA-WGX-1                                                         7.38     7
 58 wqp_IL_EPA_WQX-WGZJ-1                                                    7.37    63
 59 wqp_MNPCA-27-0062-03-202                                                 7.36     1
 60 wqp_MNPCA-18-0044-00-201                                                 7.31     1
 61 wqp_MNPCA-69-0859-02-201                                                 7.28     5
 62 wqp_USGS-423755088341700                                                 7.26    40
 63 wqp_USGS-435721084561801                                                 7.25     5
 64 wqp_MNPCA-15-0068-00-207                                                 7.24    38
 65 wqp_IL_EPA_WQX-RPA-1                                                     7.17    99
 66 wqp_LCOWIS_WQX-W-4                                                       7.17   216
 67 wqp_MNPCA-82-0033-00-201                                                 7.15    50
 68 wqp_MNPCA-62-0005-00-201                                                 7.14     2
 69 wqp_WIDNR_WQX-403110                                                     7.13  1712
 70 wqp_MNPCA-82-0031-00-201                                                 7.12     5
 71 wqp_MNPCA-21-0123-00-218                                                 7.10    24
 72 wqp_MNPCA-27-0014-00-201                                                 7.09  2286
 73 wqp_MNPCA-77-0215-00-209                                                 7.07   101
 74 7a_temp_coop_munge/tmp/Tenmile_1997_Temperatures.rds                     7.06    28
 75 wqp_SDDENR_WQX-KINGSBUC03                                                7.02    16
 76 wqp_MNPCA-71-0159-00-203                                                 7.00     5
 77 wqp_USGS-454856094544602                                                 6.99    37
 78 wqp_MNPCA-77-0215-00-202                                                 6.98    80
 79 wqp_IL_EPA_WQX-RGE-3                                                     6.98     8
 80 wqp_21NDHDWQ-385455                                                      6.98     5
 81 wqp_MNPCA-82-0110-00-451                                                 6.92    22
 82 wqp_MNPCA-16-0253-00-202                                                 6.86     1
 83 wqp_USGS-444016085310201                                                 6.82     6
 84 wqp_MNPCA-19-0024-00-451                                                 6.80    11
 85 wqp_MNPCA-27-0129-00-201                                                 6.77     1
 86 wqp_IL_EPA_WQX-RGB-2                                                     6.77     9
 87 wqp_MNPCA-18-0358-00-201                                                 6.75     4
 88 wqp_MNPCA-69-0939-01-204                                                 6.73    89
 89 wqp_LCOWIS_WQX-RND-3                                                     6.71   168
 90 wqp_WIDNR_WQX-513088                                                     6.67   307
 91 wqp_WIDNR_WQX-013144                                                     6.66   103
 92 wqp_MNPCA-61-0023-00-204                                                 6.66    10
 93 wqp_WIDNR_WQX-10007592                                                   6.63    13
 94 wqp_USGS-425235088075300                                                 6.62    28
 95 wqp_MNPCA-27-0133-02-205                                                 6.56     2
 96 wqp_IL_EPA_WQX-RTW-2                                                     6.55     2
 97 wqp_USGS-435009088550100                                                 6.54     9
 98 wqp_LCOWIS_WQX-W7                                                        6.51    14
 99 7a_temp_coop_munge/tmp/grant_mnlakedata_historicalfiles_manualentry.rds  6.50    64
100 wqp_IL_EPA_WQX-VTJ-1                                                     6.46   129
# … with 2,824 more rows

and taking the first one off the top since it has a small number of obs:

pb0_matched_to_observations %>% filter(source == 'wqp_LCOWIS_WQX-E16')
# A tibble: 7 x 6
  site_id        date       depth   obs  pred source            
  <chr>          <date>     <dbl> <dbl> <dbl> <chr>             
1 nhdhr_74926427 2013-07-15  7.62  5.78  24.1 wqp_LCOWIS_WQX-E16
2 nhdhr_74926427 2013-07-15 10.7   4.39  24.0 wqp_LCOWIS_WQX-E16
3 nhdhr_74926427 2013-07-15 13.7   3.83  23.9 wqp_LCOWIS_WQX-E16
4 nhdhr_74926427 2013-07-15 16.8   3.83  23.8 wqp_LCOWIS_WQX-E16
5 nhdhr_74926427 2013-07-15 19.8   3.83  23.8 wqp_LCOWIS_WQX-E16
6 nhdhr_74926427 2013-07-15 22.9   3.83  23.7 wqp_LCOWIS_WQX-E16
7 nhdhr_74926427 2013-07-15 24.4   3.83  23.7 wqp_LCOWIS_WQX-E16

This is Lake Chippewa in Sawyer County, WI

read_csv('out_data/lake_metadata.csv') %>% filter(site_id == 'nhdhr_74926427')
# A tibble: 1 x 9
  site_id        lake_name     group_id                     meteo_filename                                    centroid_lon centroid_lat   SDF state county
  <chr>          <chr>         <chr>                        <chr>                                                    <dbl>        <dbl> <dbl> <chr> <chr> 
1 nhdhr_74926427 Lake Chippewa 06_N45.50-46.50_W84.50-92.00 nldas_meteo_N45.9375-45.9375_W91.1875-91.1875.csv        -91.2         45.9  16.2 WI    Sawyer

and it is a complex lake:
image

The second worst source is wqp_LCOWIS_WQX-E-16 which is probably the same monitoring ID. It is definitely in the same lake.
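One way to flag source labels that may be the same monitoring ID written two ways is sketched below. `normalize_id` is a hypothetical helper, and the rule it applies (drop a hyphen between a letter and a digit) is only a guess at how these IDs vary; matches are candidates to check by hand, not confirmed duplicates.

```r
library(dplyr)

# Hypothetical helper: drop a hyphen that sits between a letter and a digit,
# so "E-16" and "E16" collapse to the same comparison key.
normalize_id <- function(x) gsub("([A-Za-z])-([0-9])", "\\1\\2", x)

sources <- tibble(
  source = c("wqp_LCOWIS_WQX-E16", "wqp_LCOWIS_WQX-E-16", "wqp_LCOWIS_WQX-E-9")
)

# Keep only sources whose normalized key is shared with another source
candidates <- sources %>%
  mutate(key = normalize_id(source)) %>%
  group_by(key) %>%
  filter(n() > 1) %>%
  ungroup()

print(candidates)
```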

Modeled (red) and observed (black) are very different

image

The pb0 model thinks this is a well-mixed lake (at least down to 25 m) while the obs show a strongly stratified system that looks more like a small lake to me. Perhaps this is a bay.

Other sources seem clearly wrong, like 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds, which looks like the depths are flipped 👀

image

@limnoliver heads up on that one ☝️ but note we haven't done any kind of comprehensive look.

@jordansread
Author

Looks like at least 7a_temp_coop_munge/tmp/South_Center_DO_2018_09_11_All.rds, 7a_temp_coop_munge/tmp/Carlos_DO_2018_11_05_All.rds, and 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds have depths flipped

@limnoliver
Contributor

limnoliver commented May 26, 2020

Yikes! The explainer file for South_Center says:

Note that the depth of the sample is in negative.

And that was interpreted (by me) as simply needing to multiply by -1. And, turns out, I processed South Center, Carlos, and Greenwood with the same parser, and did the same thing, since all had negative depth vals. So, more likely, this is distance from bottom, where 0 is bottom, and ~-28m is surface? In that case, I'm guessing we will lose these data because we can't be certain on depth? OR, we assume the first measure is taken at 0m?
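If the second option were chosen, the conversion might look like the sketch below. It assumes the raw values run from 0 at the bottom to the most negative value at the surface, and that the shallowest reading in each profile really was taken at 0 m; neither is confirmed for these files. The column names and toy values are hypothetical.

```r
library(dplyr)

# Toy profile with invented values; raw_depth of 0 is the bottom and
# the most negative value is assumed to be the surface.
profile <- tibble(
  date = as.Date("2018-09-14"),
  raw_depth = c(-28, -20, -10, 0),
  temp = c(22.1, 18.4, 9.7, 6.2)
)

# Shift each profile so its shallowest reading sits at 0 m depth
fixed <- profile %>%
  group_by(date) %>%
  mutate(depth = raw_depth - min(raw_depth)) %>%
  ungroup()

print(fixed)
```

Any error in the "first measure at 0 m" assumption shifts the whole profile by the same offset, so this would still leave depth uncertain for profiles that did not start at the surface.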

@jordansread
Author

Perhaps looping in w/ Holly related to these files and #173 would be good. Doesn't help us for this immediate issue, but probably good to get on the radar.
