This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

Edit search radius used for site-to-reach matching #116

Merged
merged 2 commits into from
Mar 18, 2022

Conversation

lekoenig
Collaborator

Addresses #112.

When matching SC sites to PRMS segments we're currently using a search radius of 0.1 degrees (~10 km), although this is probably too broad and results in headwater tributaries getting improperly snapped to the larger mainstem rivers represented in PRMS (see #112 for further details).
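For a rough sense of scale, the sketch below converts 0.1 degrees to meters at about 40°N, which is roughly the latitude of the sites previewed in this PR. It assumes a spherical Earth with a mean radius of 6,371 km, and `deg_to_m()` is a made-up helper for illustration, not part of the pipeline.

```r
# Approximate ground distance spanned by an angular offset, assuming a
# spherical Earth (mean radius 6,371 km). Illustrative helper only.
deg_to_m <- function(deg, lat_deg, earth_radius_m = 6371000) {
  m_per_deg_lat <- pi * earth_radius_m / 180
  c(north_south_m = deg * m_per_deg_lat,
    east_west_m   = deg * m_per_deg_lat * cos(lat_deg * pi / 180))
}

radius_m <- deg_to_m(0.1, lat_deg = 40)
round(radius_m)
```

Under these assumptions, 0.1 degrees works out to about 11.1 km north-south and about 8.5 km east-west, so a radius given in degrees is not the same ground distance in every direction.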

I've edited get_site_flowlines() so that the search_radius is given in meters rather than degrees, and that the offset returned by nhdplusTools::get_flowline_index() is also in meters. The following changes are also included:

  • Rename the variable attribute offset to bird_dist_to_subseg_m because the latter is more informative and matches the terminology used in other projects (delaware-model-prep).
  • Instead of returning a data frame containing all sites (where bird_dist_to_subseg_m is NA for unmatched locations), the function now returns a sparse data set that only contains sites that are matched to a segment as determined by the search_radius or are specified by the user as sites to retain.
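The two changes listed above can be sketched with a toy data frame. This is only an illustration under assumed inputs: the site IDs, the distances, and the `retain_sites` vector are all hypothetical, and the actual logic in get_site_flowlines() may differ.

```r
# Toy stand-in for the matcher output; values are invented for illustration.
matched <- data.frame(
  site_id  = c("site_A", "site_B", "site_C", "site_D"),
  subsegid = c("863_1", "863_1", "888_1", NA),
  offset   = c(2.2, 750.0, 612.5, NA),  # distance to matched segment, in meters
  stringsAsFactors = FALSE
)

search_radius <- 500          # meters
retain_sites  <- c("site_C")  # hypothetical user-specified sites to keep

# Rename the distance column to the more descriptive name
names(matched)[names(matched) == "offset"] <- "bird_dist_to_subseg_m"

# Keep sites matched within the radius, plus explicitly retained sites
# that were matched to a segment at all; unmatched sites (NA) are dropped
is_matched <- !is.na(matched$bird_dist_to_subseg_m)
keep <- is_matched &
  (matched$bird_dist_to_subseg_m <= search_radius |
     matched$site_id %in% retain_sites)
sites_w_segs <- matched[keep, ]
```

Here `site_B` is dropped because it is farther than 500 m from its segment, `site_D` is dropped because it has no match, and `site_C` survives only because it is on the retain list.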

If we use 500 m here (while adding an argument to retain the two NWIS sites on the Delaware River that are >500 m away from their respective segments), p2_sites_w_segs goes from 3,523 rows to 1,845 rows and we lose 9,048 NWIS observation-days (~5% of our total NWIS obs-days for SC).

Here's a preview of p2_sites_w_segs:

> dim(p2_sites_w_segs)
[1] 1845   13
> head(p2_sites_w_segs) %>% as.data.frame()
                     site_id                                  site_name count_days_nwis count_days_discrete count_days_total       lon
1 11NPSWRD_WQX-HOFU_BOYER_01        FRENCH CREEK AT PA ROUTE 345 BRIDGE              NA                   1                1 -75.76927
2 11NPSWRD_WQX-HOFU_BOYER_04   FRENCH CREEK AT COUNTY PARK ROAD (T-452)              NA                   1                1 -75.72541
3 11NPSWRD_WQX-HOFU_BOYER_06  FRENCH CREEK AT SHEEDER MILL ROAD (T-491)              NA                   1                1 -75.62782
4 11NPSWRD_WQX-HOFU_BOYER_07    BIRCH RUN AT HOLLOW AND FRENCH CK ROADS              NA                   1                1 -75.62062
5 11NPSWRD_WQX-HOFU_BOYER_08   FRENCH CREEK AT HARES HILL ROAD (SR1045)              NA                   1                1 -75.56773
6 11NPSWRD_WQX-HOFU_BOYER_10 FRENCH CREEK AT RR TRESTLE IN PHOENIXVILLE              NA                   1                1 -75.51104
       lat datum       org_id   data_src_combined subsegid bird_dist_to_subseg_m segidnat
1 40.20343 NAD83 11NPSWRD_WQX Harmonized_WQP_data    863_1              2.249200     2297
2 40.16987 NAD83 11NPSWRD_WQX Harmonized_WQP_data    863_1             13.987362     2297
3 40.15085 NAD83 11NPSWRD_WQX Harmonized_WQP_data    863_1              7.507442     2297
4 40.14754 NAD83 11NPSWRD_WQX Harmonized_WQP_data    863_1             37.533292     2297
5 40.14096 NAD83 11NPSWRD_WQX Harmonized_WQP_data    888_1             15.870168     2322
6 40.13601 NAD83 11NPSWRD_WQX Harmonized_WQP_data    888_1              1.110031     2322
> 

@lekoenig lekoenig requested a review from jds485 March 18, 2022 14:45
Member

@jds485 jds485 left a comment

Looks good, but I haven't run this yet because I have a question for you (see comment).

  flowline_indices <- nhdplusTools::get_flowline_index(flines = reaches_nhd_fields,
                                                       points = sites_sf,
                                                       max_matches = max_matches,
-                                                      search_radius = search_radius) %>%
+                                                      search_radius = search_radius*2,
Member

@jds485 jds485 Mar 18, 2022

Why *2 now? Is this to allow NWIS sites to be matched and later retained? If that's why, I think it would be better to add a separate search radius column for the NWIS sites to retain. This would allow for still retaining NWIS sites when the search_radius is reduced to something smaller than would match the NWIS sites
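One way to read this suggestion is as a per-site radius column, sketched below. The column name `search_radius_m`, the override vector, and all values are hypothetical and not taken from the pipeline.

```r
# Toy matched sites with distances to their segments (meters); values invented.
sites <- data.frame(
  site_id = c("site_A", "site_B", "nwis_1"),
  bird_dist_to_subseg_m = c(120, 640, 900),
  stringsAsFactors = FALSE
)

default_radius_m <- 250
# Hypothetical per-site overrides for sites that should always be retained
radius_overrides <- c(nwis_1 = 1000)

# Start every site at the default radius, then widen it for the overrides
sites$search_radius_m <- default_radius_m
idx <- match(names(radius_overrides), sites$site_id)
sites$search_radius_m[idx] <- unname(radius_overrides)

# Each site is compared against its own radius
sites_kept <- sites[sites$bird_dist_to_subseg_m <= sites$search_radius_m, ]
```

With this layout the default radius can be reduced without losing the overridden sites, because their threshold travels with them.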

Collaborator Author

Yeah, good question. I made this change to initially use a wider search radius when we apply nhdplusTools::get_flowline_index(). That function uses a radius search implemented in RANN::nn2() and I noticed when looking at DO data that a couple of sites were not being picked up with a 500 m search radius (but were picked up with a 1000 m radius) even though the distance between each of those sites and its flowline was < 500 m, as confirmed by the offset returned and separately using {sf}. More details are here.

So this doesn't have anything to do with the two NWIS sites but rather, to ensure we're picking up all the sites we would expect initially, which we then filter to the requested search_radius a couple lines down. I'm open to other suggestions, too, so happy to hear your thoughts!
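The widen-then-filter pattern described here can be sketched as follows. `find_candidates()` is a hypothetical stand-in for the spatial matcher (the role played by nhdplusTools::get_flowline_index() in the PR); it simply filters a lookup table and does not reproduce the missed-match behavior mentioned above.

```r
# Hypothetical stand-in for the spatial matcher: returns candidate matches
# whose distance is within `radius_m`.
find_candidates <- function(dist_table, radius_m) {
  dist_table[dist_table$offset_m <= radius_m, ]
}

# Query with a widened radius first, then trim to the requested radius
# using the distances the matcher reports.
widen_then_filter <- function(dist_table, search_radius_m, widen_factor = 2) {
  candidates <- find_candidates(dist_table, search_radius_m * widen_factor)
  candidates[candidates$offset_m <= search_radius_m, ]
}

dist_table <- data.frame(id = 1:4, offset_m = c(15, 480, 730, 1200))
result <- widen_then_filter(dist_table, search_radius_m = 500)
```

The final filter is what enforces the requested radius; the widened query only makes the initial candidate set more generous, so the result never includes a site farther than `search_radius_m`.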

Member

Strange that those 2 sites do not get picked up! I'm not familiar with the nn2 function, and the source is in C, so I'm less inclined to look into whether it's a matter of using a different argument in get_flowline_index.

Collaborator Author

Yeah, I wasn't worried about those two sites in particular, but the behavior was puzzling to me. The distances returned seem to be correct, so for now I decided to just adjust the argument in get_flowline_index to make sure we're capturing all the sites we'd expect. I tried to explain this reasoning in the comments because I know the different argument to search_radius is a little odd.

Member

@jds485 jds485 left a comment

You can merge when you're ready

@lekoenig lekoenig merged commit d0691b7 into USGS-R:main Mar 18, 2022
@lekoenig lekoenig deleted the 112-match-segs branch March 18, 2022 20:36
@lekoenig lekoenig linked an issue Mar 18, 2022 that may be closed by this pull request
2 tasks
Development

Successfully merging this pull request may close these issues.

Reassess sites-to-segs search radius
3 participants