Investigate reaches near reservoirs with multiple sites #95

limnoliver · 2021-04-21T19:27:01Z

I noticed that seg_id_nat 1638 had a high RMSE for the RGCN model (RMSE = 6.13) after I updated with the most recent data. This reach is right below the Neversink. It looks like there are multiple monitoring sites on this reach, and the older sites from EcoSheds were farther downstream than the new USGS site, which is capturing colder dynamics from the reservoir.

We may want to reconsider what data we're keeping, particularly at these reservoir sites where different places along the reach can have really different temperature signals. I don't think this is the reason this site is doing so badly, though. I think it's doing poorly because the model is clearly not picking up the reservoir influence, and I think the EcoSheds data was added after this model was trained, so the model didn't get a chance to see any of the NYCDEC data:

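library(dplyr)
library(ggplot2)

# Load the paired predictions and observations
compare <- feather::read_feather('3_predictions/out/compare_predictions_obs.feather')
# Keep reach 1638 rows that have an RGCN prediction, 2005-12-31 through 2008-01-01
compare <- compare %>%
  filter(seg_id_nat %in% '1638', !is.na(rgcn2_full_temp_c)) %>%
  filter(date >= as.Date('2005-12-31') & date <= as.Date('2008-01-01'))
# Line = RGCN predictions; points = observed mean temperature
ggplot(compare, aes(x = date, y = rgcn2_full_temp_c)) +
  geom_line() +
  geom_point(aes(y = mean_temp_c))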


aappling-usgs · 2021-04-22T14:14:57Z

I see your point that the red points still differ a lot from the model predictions, but the blue points still can't be helping the RMSE, right?

The predictions go down to near zero in winter, and the red points seem to be concentrated in the summer - is part of the impressive difference in your first plot due to the fact that the blue points are more year-round?
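One way to check the seasonal-coverage question is to compute RMSE per data source twice, once over all dates and once over summer only. The sketch below is a minimal illustration on toy data, not the project's evaluation code: the `source`, `pred`, and `obs` columns, the source labels, and the June to August definition of summer are all assumptions, and the thread does not say which source maps to which plot color.

```r
# Root-mean-square error, ignoring missing pairs
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# Toy data; column names, source labels, and values are hypothetical
toy <- data.frame(
  date   = as.Date(c("2006-01-15", "2006-07-15", "2006-07-20", "2006-08-01")),
  source = c("source_a", "source_a", "source_b", "source_b"),
  pred   = c(1, 18, 18, 19),
  obs    = c(2, 17, 10, 11),
  stringsAsFactors = FALSE
)

# Flag summer observations (assumed here to be June through August)
toy$summer <- as.integer(format(toy$date, "%m")) %in% 6:8

# RMSE by source over all dates
rmse_all <- sapply(split(toy, toy$source), function(d) rmse(d$pred, d$obs))

# RMSE by source over summer dates only, so both sources cover the same season
summer_only <- toy[toy$summer, ]
rmse_summer <- sapply(split(summer_only, summer_only$source),
                      function(d) rmse(d$pred, d$obs))
```

If the gap between sources shrinks a lot in the summer-only comparison, seasonal coverage explains part of the difference. If it does not, the sites really are seeing different temperatures.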

jzwart · 2021-04-22T14:30:01Z

Hmm, that's tough. Maybe we could add a separate distance criterion for which observation sites to keep if the segment is directly below a reservoir, like only keeping sites that are within 1000 m of the top of the segment. But then again, we're trying to predict the entire segment's mean temperature, so we can't really throw away the downstream sites either.
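A rough sketch of what such a distance criterion could look like, assuming a site table with a distance-from-segment-top column and a below-reservoir flag. The column names (`dist_from_seg_top_m`, `below_reservoir`) and the toy values are hypothetical; only the 1000 m threshold comes from the suggestion above.

```r
# Hypothetical site table; column names and values are illustrative only
sites <- data.frame(
  site_id             = c("A", "B", "C", "D"),
  seg_id_nat          = c(1638, 1638, 2000, 2000),
  dist_from_seg_top_m = c(250, 4800, 300, 5200),
  below_reservoir     = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors    = FALSE
)

max_dist_m <- 1000  # threshold floated in the comment above

# Keep every site on ordinary segments; on segments directly below a
# reservoir, keep only the sites near the top of the segment
keep <- !sites$below_reservoir | sites$dist_from_seg_top_m <= max_dist_m
sites_kept <- sites[keep, ]
```

In this toy table, site B is dropped because it sits 4800 m down a below-reservoir segment, while site D is kept because its segment is not flagged.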

I wonder if this is a scenario where satellite temperatures could be useful since they have more spatial coverage - it might help represent the segment's mean temperature rather than training / testing on sites from either end of the segment. Or maybe we could somehow tell the model about where in the segment the data are coming from or add observation error?

aappling-usgs · 2021-04-22T14:52:47Z

I think PRMS seeks to predict temperatures at the downstream point of each reach, so our observed temperatures should actually prefer the downstream end when there are choices (or just accept the noise and average them all anyway).
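The two options here (prefer the downstream end, or average everything) could be sketched as follows. This is only an illustration on toy data: `dist_to_seg_outlet_m` and the other column names are assumptions, not fields known to exist in the pipeline.

```r
# Toy observations from two sites on the same segment; names and values are hypothetical
obs <- data.frame(
  seg_id_nat           = c(1638, 1638, 1638, 1638),
  date                 = as.Date(c("2006-07-01", "2006-07-01",
                                   "2006-07-02", "2006-07-02")),
  site_id              = c("up", "down", "up", "down"),
  dist_to_seg_outlet_m = c(5000, 200, 5000, 200),
  temp_c               = c(8, 14, 9, 15),
  stringsAsFactors     = FALSE
)

# Option 1: for each segment-date, keep the observation closest to the outlet
ord <- obs[order(obs$seg_id_nat, obs$date, obs$dist_to_seg_outlet_m), ]
downstream <- ord[!duplicated(ord[c("seg_id_nat", "date")]), ]

# Option 2: accept the noise and average all sites for each segment-date
averaged <- aggregate(temp_c ~ seg_id_nat + date, data = obs, FUN = mean)
```

Option 1 matches the downstream-point target described above. Option 2 keeps more data but blends the reservoir-influenced and downstream signals.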

We might want to keep a separate copy of nearest-to-reservoir observations for validation of reservoir model predictions.

limnoliver changed the title ~~Investigate reaches with multiple sites~~ Investigate reaches near reservoirs with multiple sites Apr 21, 2021
