
Crosswalk SD fisheries managed lakes against LAGOS-US lakes #221

Open

hcorson-dosch-usgs opened this issue Oct 14, 2021 · 15 comments

@hcorson-dosch-usgs
Crosswalk this shapefile of SD fisheries managed lakes against the LAGOS-US spatial data (see #220) to see how many of those SD fisheries managed lakes are missing from LAGOS because they are classified as intermittent

@lindsayplatt

I did a quick analysis outside of the pipeline by manually downloading & unzipping the SD Managed Fisheries shapefile from Gretchen and the LAGOS-US LOCUS GDB. Then, I ran this code to evaluate:

library(sf)
library(tidyverse)

# Load the LAGOS-US lakes first so their CRS can be reused for the SD data
lagos_us_gdb <- "gis_locus_v1.0_gdb/gis_locus_v1.0.gdb"
st_layers(lagos_us_gdb) # Available layers
lagos_us <- st_read(lagos_us_gdb) # Load lake shapes by default (since it is the first layer)

sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fisheries_transf <- st_transform(sd_fisheries, st_crs(lagos_us))
sd_fisheries_pts <- st_centroid(sd_fisheries_transf)

lagos_sd <- st_crop(lagos_us, st_bbox(sd_fisheries_transf))
lagos_sd_pts <- st_centroid(lagos_sd)

# Now check overlap
fisheries_in_lagos <- st_join(sd_fisheries_pts, lagos_sd, join = st_within)

# 259 SD fisheries lakes within the LAGOS-US lakes
sd_fisheries_in_lagos <- fisheries_in_lagos %>% 
  filter(!is.na(lagoslakeid)) 
nrow(sd_fisheries_in_lagos)

# 495 lakes that didn't match a LAGOS-US lake
fisheries_in_lagos %>% 
  filter(is.na(lagoslakeid)) %>% 
  nrow()

# 6 of the LAGOS-US lakes had multiple SD fisheries lakes within them
fisheries_in_lagos %>% 
  filter(!is.na(lagoslakeid)) %>% 
  group_by(lagoslakeid) %>% 
  summarize(n_sd_fisheries = n()) %>% 
  filter(n_sd_fisheries > 1)

# Visualize overlap
plot(st_geometry(lagos_sd))
plot(st_geometry(sd_fisheries_pts), add=TRUE, col = "red", pch = 16)
plot(st_geometry(sd_fisheries_in_lagos), add=TRUE, col = "blue", pch = 16)

[image: map of the cropped LAGOS-US lake polygons with SD fisheries centroids in red and LAGOS-matched centroids in blue]

Out of the 754 SD Managed Fisheries lakes, I found that 259 had centroids that fell within a LAGOS-US lake polygon. Six of the LAGOS-US lakes contained more than one SD Managed Fisheries centroid.

To do next: experiment with the actual join used for the point-in-polygon match. I used st_within and didn't apply any kind of buffer. There is probably more we can do there.
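One option to loosen the match, sketched here on synthetic geometries since I haven't tried it on the real data: swap `st_within` for `st_is_within_distance`, which also catches centroids that fall just outside a polygon (the 0.1-unit tolerance below is arbitrary, not a recommendation).

```r
library(sf)

# Toy data: one unit-square polygon and two points, one inside and
# one just outside its boundary.
poly <- st_sf(id = 1, geometry = st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
))
pts <- st_sf(pt = c("inside", "nearby"), geometry = st_sfc(
  st_point(c(0.5, 0.5)),  # falls within the polygon
  st_point(c(1.05, 0.5))  # 0.05 units outside; misses a strict st_within
))

# Strict point-in-polygon join vs. a distance-tolerant join
strict <- st_join(pts, poly, join = st_within)
loose  <- st_join(pts, poly, join = st_is_within_distance, dist = 0.1)

sum(!is.na(strict$id)) # 1 match
sum(!is.na(loose$id))  # 2 matches
```

On the real data the tolerance would be in the CRS's linear units (meters for most projected CRSs), so it doubles as a crude buffer without modifying the polygons.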

@lindsayplatt commented Jan 6, 2022

Redid this analysis using the code @jread-usgs pointed me to from crosswalk_poly_intersect_poly(). Copied the code from that function to use here since I am not pipelining anything yet.

First, load the data (see first code chunk for links to these data sets)

library(sf)
library(tidyverse)

lagos_us_gdb <- "gis_locus_v1.0_gdb/gis_locus_v1.0.gdb"
lagos_us <- st_read(lagos_us_gdb) # Load lake shapes by default (since it is the first layer)

sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fisheries_transf <- st_transform(sd_fisheries, st_crs(lagos_us))

lagos_sd <- st_crop(lagos_us, st_bbox(sd_fisheries_transf))

Then, perform the polygon intersection:

# Based on code in `crosswalk_poly_intersect_poly()`

poly1_ID_name <- "SD_ID"
poly1_data <- sd_fisheries_transf %>% 
  # Treat each SD fisheries lake as unique based on the Name + StateID columns
  # There are 40 non-unique lakes if you just go by Name
  unite(site_id, Name, StateID, na.rm = TRUE) %>% 
  mutate(site_id = sprintf("SD_%s", site_id)) %>% 
  select(site_id)
poly2_data <- lagos_sd %>% 
  mutate(site_id = lagoslakeid) %>% 
  select(site_id) %>% rename(geometry = Shape)

poly2_data <- st_zm(poly2_data)

# Aggregate polys by site ID (this step likely unnecessary for NHD but keeping for generality)
poly1_agg <- poly1_data %>%
  group_by(site_id) %>%
  summarise()

poly2_agg <- poly2_data %>%
  group_by(site_id) %>%
  summarise()

# Measure areas
poly1_agg$poly1_Area <- as.numeric(st_area(poly1_agg))
poly2_agg$site_id_Area <- as.numeric(st_area(poly2_agg))

poly1_agg <- sf::st_buffer(poly1_agg, dist = 0)
poly2_agg <- sf::st_buffer(poly2_agg, dist = 0)

colnames(poly1_agg)[colnames(poly1_agg)=="site_id"] <- poly1_ID_name

intersect.sf <-  sf::st_intersection(poly1_agg, poly2_agg)
intersect.sf$Intersect_Area <- as.numeric(st_area(intersect.sf))

### Drop intersections that are <1% of poly1 area or poly2 area
intersect.sf <- intersect.sf %>%
  filter(Intersect_Area > .01*poly1_Area) %>%
  filter(Intersect_Area > .01*site_id_Area)

intersect_out <- intersect.sf %>%
  st_drop_geometry()

colnames(intersect_out)[colnames(intersect_out)=="poly1_Area"] <- paste0(poly1_ID_name, "_Area")

Finally, evaluate the intersection by visualizing and looking at counts

plot(st_geometry(lagos_sd))
plot(st_geometry(sd_fisheries_transf), add=TRUE, col = "red", pch = 16)
plot(st_geometry(intersect.sf), add=TRUE, col = "blue", pch = 16)

# How many matched lagos? How many didn't?
n_sd <- length(unique(poly1_data$site_id)) # 751 SD Lakes
n_sd_in_lagos <- length(which(unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 331 unique SD fisheries map to at least one LAGOS lake
n_sd_not_in_lagos <- length(which(!unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 420 SD fisheries do not map to a LAGOS lake

round(n_sd_in_lagos/n_sd*100) # 44%
round(n_sd_not_in_lagos/n_sd*100) #56%

# There are 49 SD lakes that overlap more than 1 LAGOS lake
intersect_out %>% 
  group_by(SD_ID) %>% 
  summarize(count_lagos = length(unique(site_id))) %>% 
  filter(count_lagos > 1)

# There are 14 LAGOS lakes that overlap more than 1 SD lake
intersect_out %>% 
  group_by(site_id) %>% 
  summarize(count_sd = length(unique(SD_ID))) %>% 
  filter(count_sd > 1)

[image: map of the cropped LAGOS-US lake polygons with SD fisheries polygons in red and intersecting polygons in blue]

There are 420 unique SD Fisheries lakes that do not overlap with any LAGOS-US lakes.

@lindsayplatt
Note that poly1_data has 754 observations but poly1_agg has only 751, which means 3 lakes in the SD Fisheries data shared the same Name & StateID with another lake.
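For reference, the collapsed rows could be spotted with a quick count (toy site_id values here; the real check would run on poly1_data before the group_by/summarise aggregation):

```r
library(dplyr)

# Toy stand-in for poly1_data$site_id: 5 rows collapse to 3 unique IDs
poly1_toy <- tibble::tibble(
  site_id = c("SD_a_1", "SD_a_1", "SD_b_2", "SD_b_2", "SD_c_3")
)

# Any site_id appearing more than once is collapsed by the aggregation
dups <- poly1_toy %>%
  count(site_id) %>%
  filter(n > 1)
dups # 2 duplicated IDs in this toy example
```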

@jordansread

@lindsayplatt nice work.

From your centroid analysis, 259 of the lakes fell within LAGOS polygons and with this updated poly-poly match, that number went up to 331. Still less than half 😱

I think our original question was about how LAGOS drops certain NHD high-res lakes that we're not dropping. LAGOS drops them because of field values in NHD suggesting the lakes are intermittent or aren't real lakes in other ways (see "LAGOS because they are classified as intermittent" from above). We noticed that a lot of these dropped lakes appear in the Dakotas, so it was important to know what wouldn't be in LAGOS but is still an important lake to the fisheries managers. Since LAGOS is a subset of NHDHR (you can link to it with their crosswalk, since it contains PERM_ID), perhaps it is necessary to do the SD to NHDHR crosswalk first, summarize which lakes aren't there ("X% of SD fisheries lakes didn't resolve to our NHDHR crosswalk"), and then see which of the SD lakes matched to NHDHR are lost when we connect to LAGOS ids (which could be a table join instead of a spatial analysis).
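A toy sketch of what that two-step, join-based check could look like (the tibbles and the SD_ID/PERM_ID values below are made up for illustration; only the PERM_ID and lagoslakeid column names come from the comment above):

```r
library(dplyr)

# Hypothetical SD -> NHDHR crosswalk result: SD_C never resolved to NHDHR
sd_to_nhdhr <- tibble::tibble(
  SD_ID   = c("SD_A", "SD_B", "SD_C"),
  PERM_ID = c("nhd1", "nhd2", NA)
)

# Hypothetical LAGOS link table: nhd2 was dropped by LAGOS
lagos_link <- tibble::tibble(
  lagoslakeid = c(101, 102),
  PERM_ID     = c("nhd1", "nhd9")
)

# Step 1: what % of SD fisheries lakes didn't resolve to NHDHR?
pct_unmatched <- mean(is.na(sd_to_nhdhr$PERM_ID)) * 100

# Step 2: of the NHDHR-matched SD lakes, which are lost in LAGOS?
lost_in_lagos <- sd_to_nhdhr %>%
  filter(!is.na(PERM_ID)) %>%
  left_join(lagos_link, by = "PERM_ID") %>%
  filter(is.na(lagoslakeid))
lost_in_lagos # SD_B in this toy example
```

Step 2 is a pure table join on PERM_ID, so no geometry operations are needed once the SD-to-NHDHR crosswalk exists.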

@lindsayplatt

OK, I did the polygon crosswalking using SD fisheries & NHD polygons.

library(sf)
library(tidyverse)

nhd_lakes_sf <- readRDS('1_crosswalk_fetch/out/canonical_lakes_sf.rds')
nhd_lakes_sf <- st_zm(nhd_lakes_sf)

sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fisheries_transf <- st_transform(sd_fisheries, st_crs(nhd_lakes_sf))

sd_nhd_lakes_sf <- nhd_lakes_sf %>% 
  st_make_valid() %>% 
  st_crop(st_bbox(sd_fisheries_transf))

poly1_ID_name <- "SD_ID"
poly1_data <- sd_fisheries_transf %>% 
  # Treat each SD fisheries lake as unique based on the Name + StateID columns
  # There are 40 non-unique lakes if you just go by Name
  unite(site_id, Name, StateID, na.rm = TRUE) %>% 
  mutate(site_id = sprintf("SD_%s", site_id)) %>% 
  select(site_id) %>% 
  st_make_valid()
poly2_data <- sd_nhd_lakes_sf %>% 
  select(site_id) %>% 
  rename(geometry = Shape)

# THEN FOLLOWED THE SAME CODE STARTING WITH `poly1_agg <- `

Then, I evaluated the outputs.

# How many matched an nhd lake? How many didn't?
n_sd <- length(unique(poly1_data$site_id)) # 751 SD Lakes
n_sd_in_nhd <- length(which(unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 309 unique SD fisheries map to at least one NHD lake
n_sd_not_in_nhd <- length(which(!unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 442 SD fisheries do not map to an NHD lake

round(n_sd_in_nhd/n_sd*100) # 41%
round(n_sd_not_in_nhd/n_sd*100) #59%

# There are 71 SD lakes that overlap more than 1 NHD lake
intersect_out %>% 
  group_by(SD_ID) %>% 
  summarize(count_nhd = length(unique(site_id))) %>% 
  filter(count_nhd > 1)

# There are 43 NHD lakes that overlap more than 1 SD lake
intersect_out %>% 
  group_by(site_id) %>% 
  summarize(count_sd = length(unique(SD_ID))) %>% 
  filter(count_sd > 1)

So, there are 442 SD Fisheries lakes that don't match the NHD at all. This seems suspicious, since we would expect more lakes to match NHD than to match LAGOS. Is canonical_lakes_sf the appropriate lakes file to use from our pipeline @jread-usgs ?

@jordansread

I went through this separately (and differently, by adding a crosswalk to a branch on my local fork) and verified the numbers are the same.

I think I've either said or implied that we've got a similar size cut-off as LAGOS (1ha), but we actually don't. We're filtering out the smallest lakes below 4 ha (see "min_size"). This is almost certainly the reason we have fewer matched lakes.

@lindsayplatt

Thanks for going the extra step to verify this. So there are potentially 442 SD fisheries that we would want to add into our pipeline from another source, right?

@jordansread

We don't currently have a method to add lakes from another source that don't resolve to our nhd IDs, since we're using the NHDHR ID as the comprehensive data "key" for this project. For relatively low(ish) cost, we could choose to bring in select NHDHR lakes that are below the 4ha cutoff. Reducing our cutoff to 1ha (like LAGOS has) is also possible, but makes every crosswalk calculation way more expensive ⏲️

@lindsayplatt

Ah right, OK. Is it interesting that so many of these SD Fisheries lakes are < 4 ha? Trying to get a sense of that size: 4 ha is about 7.5 football fields, so not quite as small as I thought 😄
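A quick back-of-the-envelope check of that conversion (assuming an American football field of 120 yd × 53.33 yd including end zones, with 1 yd = 0.9144 m and 1 ha = 10,000 m²):

```r
# Area of one football field in hectares
field_m2 <- (120 * 0.9144) * (53.33 * 0.9144) # ~5,351 m^2
field_ha <- field_m2 / 1e4                    # ~0.535 ha

# How many fields fit in the 4 ha cutoff?
fields_in_4ha <- 4 / field_ha # ~7.5
```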

@jordansread

I think it is safe to assume we're missing their small lakes, but without a direct comparison, we don't know for sure that size is the reason they aren't matched.

@lindsayplatt

Not sure if this is useful, but doing a quick summary of the SD fisheries dataset (which has an Acres column already, though this could differ from what NHD has):

sd_fisheries %>% 
    st_drop_geometry() %>% 
    mutate(hectares = Acres/2.47105) %>% 
    summarize(n_total = n(), 
              n_above1ha = sum(hectares >= 1), 
              n_below1ha = sum(hectares < 1),
              n_above4ha = sum(hectares >= 4), 
              n_below4ha = sum(hectares < 4)) %>% t()

Result:

n_total     754
n_above1ha  384
n_below1ha  370
n_above4ha  323
n_below4ha  431

431 lakes below 4 ha is pretty similar to our missing 442, but doesn't quite account for the difference.

@jordansread

wow, the number below 1ha is 👀 . Those are pretty small to be managed for fish!

@padilla410

Hey y'all, jumping into this conversation to summarize what is going on above:

  • the SD shapefile has 754 lakes, but more than half (~60%) of those lakes are expected to get dropped from the final data sets because many are smaller than the 4 ha threshold we set.
  • the number of lakes dropped is relatively consistent based on which matching approach we use (e.g., "SD lakes --> LAGOS data set --> NHDHR" or "SD lakes --> NHDHR")

Based on all of this, I am planning to go forward with developing the crosswalk using the standard "polygon-to-polygon" matching method outlined in the doc (i.e., "SD lakes --> LAGOS --> NHDHR"). I expect to see around 330 lakes in that final data set.

@padilla410 commented Apr 7, 2022

<<comment deleted from this thread and moved here>>

@lindsayplatt

@padilla410 once again, you have really gone the extra mile and dug deeply into these complex data issues 🙏 My understanding of this issue remaining open was that I was waiting to hear from Holly Kundel if any of those <4 ha lakes in SD were actually important to keep. The outcome from her would be whether or not a workaround to keep some of them was needed. Seems like you may have uncovered a different issue related to crosswalking our SD data to get them into the pipeline.
