-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crosswalk SD fisheries managed lakes against LAGOS-US lakes #221
Comments
I did a quick analysis outside of the pipeline by manually downloading & unzipping the SD Managed Fisheries shapefile from Gretchen and the LAGOS-US LOCUS GDB. Then, I ran this code to evaluate: library(sf)
library(tidyverse)
sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fishers_transf <- st_transform(sd_fisheries, st_crs(lagos_us))
sd_fisheries_pts <- st_centroid(sd_fishers_transf)
lagos_us_gdb <- "gis_locus_v1.0_gdb/gis_locus_v1.0.gdb"
st_layers(lagos_us_gdb) # Available layers
lagos_us <- st_read(lagos_us_gdb) # Load lake shapes by default (since it is the first layer)
lagos_sd <- st_crop(lagos_us, st_bbox(sd_fishers_transf))
lagos_sd_pts <- st_centroid(lagos_sd)
# Now check overlap
fisheries_in_lagos <- st_join(sd_fisheries_pts, lagos_sd, join = st_within)
# 259 SD fisheries lakes within the LAGOS-US lakes
sd_fisheries_in_lagos <- fisheries_in_lagos %>%
filter(!is.na(lagoslakeid))
nrow(sd_fisheries_in_lagos)
# 495 lakes that didn't match a LAGOS-US lake
fisheries_in_lagos %>%
filter(is.na(lagoslakeid)) %>%
nrow()
# 6 of the LAGOS-US lakes had multiple SD fisheries lakes within them
fisheries_in_lagos %>%
filter(!is.na(lagoslakeid)) %>%
group_by(lagoslakeid) %>%
summarize(n_sd_fisheries = n()) %>%
filter(n_sd_fisheries > 1)
# Visualize overlap
plot(st_geometry(lagos_sd))
plot(st_geometry(sd_fisheries_pts), add=TRUE, col = "red", pch = 16)
plot(st_geometry(sd_fisheries_in_lagos), add=TRUE, col = "blue", pch = 16) Out of the 754 SD Managed Fisheries, I found that 259 of them had centroids that fell within the LAGOS-US dataset. Six of the LAGOS-US lakes had more than one of the SD Managed Fisheries. To do next: play with the actually joining for the point-in-polygon. I used |
Redid this analysis using the code @jread-usgs pointed me to from First, load the data (see first code chunk for links to these data sets)library(sf)
library(tidyverse)
lagos_us_gdb <- "gis_locus_v1.0_gdb/gis_locus_v1.0.gdb"
lagos_us <- st_read(lagos_us_gdb) # Load lake shapes by default (since it is the first layer)
sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fisheries_transf <- st_transform(sd_fisheries, st_crs(lagos_us))
lagos_sd <- st_crop(lagos_us, st_bbox(sd_fisheries_transf)) Then, perform the polygon intersection:# Based on code in `crosswalk_poly_intersect_poly()`
poly1_ID_name <- "SD_ID"
poly1_data <- sd_fisheries_transf %>%
# Treat each SD fisheries lake unique based on the Name + StateID columns
# There are 40 non-unique lakes if you just go by Name
unite(site_id, Name, StateID, na.rm = TRUE) %>%
mutate(site_id = sprintf("SD_%s", site_id)) %>%
select(site_id)
poly2_data <- lagos_sd %>%
mutate(site_id = lagoslakeid) %>%
select(site_id) %>% rename(geometry = Shape)
poly2_data <- st_zm(poly2_data)
# Aggregate polys by site ID (this step likely unnecessary for NHD but keeping for generality)
poly1_agg <- poly1_data %>%
group_by(site_id) %>%
summarise()
poly2_agg <- poly2_data %>%
group_by(site_id) %>%
summarise()
# Measure areas
poly1_agg$poly1_Area <- as.numeric(st_area(poly1_agg))
poly2_agg$site_id_Area <- as.numeric(st_area(poly2_agg))
poly1_agg <- sf::st_buffer(poly1_agg, dist = 0)
poly2_agg <- sf::st_buffer(poly2_agg, dist = 0)
colnames(poly1_agg)[colnames(poly1_agg)=="site_id"] <- poly1_ID_name
intersect.sf <- sf::st_intersection(poly1_agg, poly2_agg)
intersect.sf$Intersect_Area <- as.numeric(st_area(intersect.sf))
### Drop intersections that are <1% of poly1 area or poly2 area
intersect.sf <- intersect.sf %>%
filter(Intersect_Area > .01*poly1_Area) %>%
filter(Intersect_Area > .01*site_id_Area)
intersect_out <- intersect.sf %>%
st_drop_geometry()
colnames(intersect_out)[colnames(intersect_out)=="poly1_Area"] <- paste0(poly1_ID_name, "_Area")
Finally, evaluate the intersection by visualizing and looking at countsplot(st_geometry(lagos_sd))
plot(st_geometry(sd_fisheries_transf), add=TRUE, col = "red", pch = 16)
plot(st_geometry(intersect.sf), add=TRUE, col = "blue", pch = 16)
# How many matched lagos? How many didn't?
n_sd <- length(unique(poly1_data$site_id)) # 751 SD Lakes
n_sd_in_lagos <- length(which(unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 331 unique SD fisheries map to at least one LAGOS lake
n_sd_not_in_lagos <- length(which(!unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 420 SD fisheries to not map to a LAGOS lake
round(n_sd_in_lagos/n_sd*100) # 44%
round(n_sd_not_in_lagos/n_sd*100) #56%
# There are 49 SD lakes that overlap more than 1 LAGOS lake
intersect_out %>%
group_by(SD_ID) %>%
summarize(count_lagos = length(unique(site_id))) %>%
filter(count_lagos > 1)
# There are 14 LAGOS lakes that overlap more than 1 SD lake
intersect_out %>%
group_by(site_id) %>%
summarize(count_sd = length(unique(SD_ID))) %>%
filter(count_sd > 1) There are 420 unique SD Fisheries lakes that do not overlap with any LAGOS-US lakes. |
Note that |
@lindsayplatt nice work. From your centroid analysis, 259 of the lakes fell within LAGOS polygons and with this updated poly-poly match, that number went up to 331. Still less than half 😱 I think our original question was related to how LAGOS was dropping certain NHD high-res lakes that we're not dropping. LAGOS is dropping them because of some field values in NHD that suggest the lakes are intermittent or aren't real lakes in other ways (see "LAGOS because they are classified as intermittent" from above). We noticed a lot of these issues of dropped lakes appear in the Dakotas, so it was important to know what wouldn't be in LAGOS but is still an important lake to the fisheries managers. Since LAGOS is a subset of NHDHR (you can link to it with their crosswalk since it contains PERM_ID), perhaps it is necessary to do the ND to NHDHR crosswalk first, summarize what lakes aren't there ("X% of SD fisheries lakes didn't resolve to our NHDHR crosswalk") and then see which of the SD lakes that are matched to NHDHR are lost when we connect to LAGOS ids (which could be a table join instead of a spatial analysis). |
OK, I did the polygon crosswalking using SD fisheries & NHD polygons. library(sf)
library(tidyverse)
nhd_lakes_sf <- readRDS('1_crosswalk_fetch/out/canonical_lakes_sf.rds')
nhd_lakes_sf <- st_zm(nhd_lakes_sf)
sd_fisheries <- st_read("South Dakota/ManagedFisheries.shp")
sd_fisheries_transf <- st_transform(sd_fisheries, st_crs(nhd_lakes_sf))
sd_nhd_lakes_sf <- nhd_lakes_sf %>%
st_make_valid() %>%
st_crop(st_bbox(sd_fisheries_transf))
poly1_ID_name <- "SD_ID"
poly1_data <- sd_fisheries_transf %>%
# Treat each SD fisheries lake unique based on the Name + StateID columns
# There are 40 non-unique lakes if you just go by Name
unite(site_id, Name, StateID, na.rm = TRUE) %>%
mutate(site_id = sprintf("SD_%s", site_id)) %>%
select(site_id) %>%
st_make_valid()
poly2_data <- sd_nhd_lakes_sf %>%
select(site_id) %>%
rename(geometry = Shape)
# THEN FOLLOWED THE SAME CODE STARTING WITH `poly1_agg <- ` Then, I evaluated the outputs. # How many matched an nhd lake? How many didn't?
n_sd <- length(unique(poly1_data$site_id)) # 751 SD Lakes
n_sd_in_nhd <- length(which(unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 309 unique SD fisheries map to at least one NHD lake
n_sd_not_in_nhd <- length(which(!unique(poly1_data$site_id) %in% unique(intersect_out$SD_ID))) # 442 SD fisheries to not map to a NHD lake
round(n_sd_in_nhd/n_sd*100) # 41%
round(n_sd_not_in_nhd/n_sd*100) #59%
# There are 71 SD lakes that overlap more than 1 NHD lake
intersect_out %>%
group_by(SD_ID) %>%
summarize(count_nhd = length(unique(site_id))) %>%
filter(count_nhd > 1)
# There are 43 NHD lakes that overlap more than 1 SD lake
intersect_out %>%
group_by(site_id) %>%
summarize(count_sd = length(unique(SD_ID))) %>%
filter(count_sd > 1) So, there are 442 SD Fisheries that are not in the NHD at all. This seems suspicious as we would expect there to be more that match NHD than match LAGOS. Is |
I went through this separately (and differently, but adding a crosswalk to a branch on my local fork) and verified the numbers are the same. I think I've either said or implied that we've got a similar size cut-off as LAGOS (1ha), but we actually don't. We're filtering out the smallest lakes below 4 ha (see "min_size"). This is almost certainly the reason we have fewer matched lakes. |
Thanks for going the extra step to verify this. So there are potentially 442 SD fisheries that we would want to add into our pipeline from another source, right? |
We don't currently have a method to add lakes from another source that don't resolve to our nhd IDs, since we're using the NHDHR ID as the comprehensive data "key" for this project. For relatively low(ish) cost, we could choose to bring in select NHDHR lakes that are below the 4ha cutoff. Reducing our cutoff to 1ha (like LAGOS has) is also possible, but makes every crosswalk calculation way more expensive ⏲️ |
Ah right, OK. Is it interesting that so many of these SD Fisheries are < 4 ha? Trying to understand that size and 4 ha ~ 7.5 football fields, so not quite as small as I thought 😄 |
I think it is safe to assume we're missing their small lakes, but without a direct comparison, we don't know for sure that it is size that is the reason they are matched. |
Not sure if this is useful, but doing a quick summary of the SD fisheries dataset (which has an sd_fisheries %>%
st_drop_geometry() %>%
mutate(hectares = Acres/2.47105) %>%
summarize(n_total = n(),
n_above1ha = sum(hectares >= 1),
n_below1ha = sum(hectares < 1),
n_above4ha = sum(hectares >= 4),
n_below4ha = sum(hectares < 4)) %>% t() Result: n_total 754
n_above1ha 384
n_below1ha 370
n_above4ha 323
n_below4ha 431 431 lakes below 4 ha is pretty similar to our missing 442, but doesn't quite account for the difference. |
wow, the number below 1ha is 👀 . Those are pretty small to be managed for fish! |
Hey y'all, jumping into this conversation to summarize what is going on above:
Based on all of this, I am planning to go forward with developing the crosswalk using the standard "polygon-to-polygon" matching method outlined in the doc (i.e., the "SD lakes --> LAGOS --> NHDHR"). I will expect to see around ~330 lakes in that final data set. |
<<comment deleted from this thread and moved here>> |
@padilla410 once again, you have really gone the extra mile and dug deeply into these complex data issues 🙏 My understanding of this issue remaining open was that I was waiting to hear from Holly Kundel if any of those <4 ha lakes in SD were actually important to keep. The outcome from her would be whether or not a workaround to keep some of them was needed. Seems like you may have uncovered a different issue related to crosswalking our SD data to get them into the pipeline. |
Crosswalk this shapefile of SD fisheries managed lakes against the LAGOS-US spatial data (see #220) to see how many of those SD fisheries managed lakes are missing from LAGOS because they are classified as intermittent
The text was updated successfully, but these errors were encountered: