-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Dissolve Boundaries #2422
Comments
It took me a few minutes to understand what exactly this function would do in comparison to library(sf)
nc = read_sf(system.file("shape/nc.shp", package="sf"))
set.seed(241)
test = nc[sample(nrow(nc), 18),] |> st_geometry()
a = test |> st_union() |> st_as_sf()
a$id = 1:nrow(a)
b = a |> st_cast("POLYGON") |> st_as_sf()
#> Warning in st_cast.sf(a, "POLYGON"): repeating attributes for all
#> sub-geometries for which they may not be constant
b$id = 1:nrow(b)
c = test |> dissolve_boundaries() |> st_as_sf()
#> Loading required namespace: spdep
c$id = 1:nrow(c)
b
#> Simple feature collection with 11 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -83.73952 ymin: 34.30505 xmax: -75.77316 ymax: 36.55716
#> Geodetic CRS: NAD27
#> First 10 features:
#> id x
#> 1 1 POLYGON ((-83.1615 35.05922...
#> 1.1 2 POLYGON ((-76.00897 36.3196...
#> 1.2 3 POLYGON ((-75.97629 36.5179...
#> 1.3 4 POLYGON ((-75.78317 36.2251...
#> 1.4 5 POLYGON ((-82.74389 35.4180...
#> 1.5 6 POLYGON ((-78.95108 36.2338...
#> 1.6 7 POLYGON ((-76.70538 35.4119...
#> 1.7 8 POLYGON ((-76.6949 35.35043...
#> 1.8 9 POLYGON ((-81.659 36.11759,...
#> 1.9 10 POLYGON ((-80.45065 35.7648...
nrow(b)
#> [1] 11
c
#> Simple feature collection with 8 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: -83.73952 ymin: 34.30505 xmax: -75.77316 ymax: 36.55716
#> Geodetic CRS: NAD27
#> x id
#> 1 MULTIPOLYGON (((-75.97629 3... 1
#> 2 POLYGON ((-77.98668 34.3399... 2
#> 3 POLYGON ((-78.95108 36.2338... 3
#> 4 MULTIPOLYGON (((-76.70538 3... 4
#> 5 POLYGON ((-82.24016 35.4681... 5
#> 6 POLYGON ((-81.659 36.11759,... 6
#> 7 POLYGON ((-80.50825 36.0708... 7
#> 8 POLYGON ((-83.1615 35.05922... 8
nrow(c)
#> [1] 8
cols = paletteer::paletteer_d("tvthemes::gravityFalls")
par(mfrow = c(2,2), mar = c(0,0,1,0))
plot(test, main = "Original geoms")
plot(a, main = "st_union()",
key.pos = NULL, reset = FALSE, pal = cols)
plot(b, main = "st_union |> st_cast('POLYGON')",
key.pos = NULL, reset = FALSE, pal = cols)
plot(c, main = "dissolve_boundaries()",
key.pos = NULL, reset = FALSE, pal = cols) It would be cool to have this in sf! Would this fit better as a possible |
Thanks @loreabad6 that clarifies! My guess is that the only difference between the bottom two plots is a single queen neighbour, and I'm not sure that is what you want with "dissolve". |
Forgive me if this is not the right thread to discuss this, but I have always wondered why {sf} didn't have an existing
Here it is easy since I have single aggregation for all fields (i.e. |
Could I ask why the |
I do not want to hijack the initial issue, but the aggregate function is much less intuitive as is, which is not consistent with other # Select only the numeric columns
numeric_cols <- sapply(wfrc_taz, is.numeric)
# Perform the aggregation only on numeric columns
wfrc_taz_agg <- aggregate(
wfrc_taz[, numeric_cols], # Subset to numeric columns
by = list(CO_NAME = wfrc_taz$CO_NAME), # Aggregate by CO_NAME
FUN = sum
) However, the function provided by @rCarto seems to be the exact one I was looking for. Thank you so much. The only wish/question I have is if it would work with logical filters such as in the code below. Also, maybe the r <- st_aggregate(nc, "dummy", c(is.character(), is.integer(), is.numeric()), c(first, sum, mean)) In that case, we would expand the |
A pipe-friendly st_dissolve function would be really helpful! I took the sample provided by @JosiahParry and turned it into a more involved pair of helper functions that handles grouped sf data input (updated this reprex to better handle the .by grouping argument). Safely use spdep::poly2nb and spdep::n.comp.nb to get an index of neighboring poly_2_nb_id <- function(x,
snap = NULL,
queen = TRUE,
quiet = TRUE,
...) {
rlang::check_installed("spdep")
fn <- invisible
if (quiet) {
fn <- suppressWarnings
}
rlang::try_fetch(
fn({
nb <- spdep::poly2nb(x, snap = snap, queen = queen, ...)
comp_nb <- spdep::n.comp.nb(nb)
comp_nb[["comp.id"]]
}),
error = \(cnd) {
rep_len(0, length(x))
}
)
} Dissolve geometry preserving existing or supplied grouping variables st_dissolve_by <- function(x,
...,
.by = NULL,
.keep = "nest",
do_union = TRUE,
.data_key = "data",
.dissolve_key = "group.comp.id") {
stopifnot(
!rlang::has_name(x, .dissolve_key),
is.data.frame(x)
)
# Handle tidyselect style .by arguments
by <- rlang::enquo(.by)
if (!dplyr::is_grouped_df(x) && !rlang::quo_is_null(by)) {
x <- dplyr::group_by(x, dplyr::across(!!by))
.by <- NULL
}
x_group_vars <- NULL
if (dplyr::is_grouped_df(x)) {
x_group_vars <- dplyr::group_vars(x)
.by <- x_group_vars
x <- dplyr::ungroup(x)
}
sf_column_nm <- attr(x, "sf_column")
# Create dissolve key with poly_2_nb_id
x <- x |>
dplyr::mutate(
"{.dissolve_key}" := paste0(
dplyr::cur_group_id(), ".",
poly_2_nb_id(.data[[sf_column_nm]], ...)
),
.by = .by
)
# Use st_combine or st_union (if `do_union = TRUE`)
sf_summarise_fn <- sf::st_combine
if (do_union) {
sf_summarise_fn <- sf::st_union
}
x_dissolve <- x |>
dplyr::summarise(
# Keep unique values for grouping variables (if supplied)
dplyr::across(
tidyselect::any_of(x_group_vars),
unique
),
# Combine geometry with sf summary function
dplyr::across(
tidyselect::all_of(sf_column_nm),
sf_summarise_fn
),
.by = tidyselect::all_of(.dissolve_key)
)
if (.keep != "nest") {
return(x_dissolve)
}
x_init <- x |>
tidyr::nest(
.by = dplyr::all_of(.dissolve_key),
.key = .data_key
)
x_dissolve |>
dplyr::left_join(
x_init,
by = .dissolve_key
) |>
dplyr::relocate(
dplyr::all_of(sf_column_nm),
.after = tidyselect::everything()
)
}
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
nc_dissolve <- st_dissolve_by(nc)
#> Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
#> ℹ Please use `all_of()` or `any_of()` instead.
#> # Was:
#> data %>% select(.by)
#>
#> # Now:
#> data %>% select(all_of(.by))
#>
#> See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
plot(nc_dissolve[, "group.comp.id"]) nc[["group"]] <- sample(LETTERS[1:5], size = nrow(nc), replace = TRUE)
nc_group_dissolve <- st_dissolve_by(nc, .by = group)
plot(nc_group_dissolve[, "group.comp.id"]) Created on 2024-10-08 with reprex v2.1.1 |
There is no need to call |
Thanks @rsbivand! I've updated the code taking advantage of this fact. Should be a considerable speed up. |
As this is being discussed as a feature request for |
That's a good question and a good point. I am unsure if it would be as "simple." As I understand it, we would need to write custom geos C code to replicate the contiguity algorithm—create an RTree, find candidates, check them, etc. To identify contiguous neighbors. Then we'd have to implement a graph traversal algorithm to identify the subgraphs—which...feels like a lot! |
@JosiahParry No GEOS code in GEOS could be used from 2010 to now with In all cases, each candidate pair is checked only once, so above or below the principal diagonal but not both, saving half the time. Finally, the C looping is done in https://github.com/r-spatial/spdep/blob/dbd3074b8822ec669fea628061bce88fe77c5bf2/src/polypoly.c#L216-L351, which checks for 1 boundary point within snap for queen, 2 within snap for rook, stopping when found. I did test against moving to predicates in To do this in Then to get the graph components, we'd need to convert the There is no reason to make any packages needed depends or imports, suggests is sufficient, as anyone needing to do this in general terms should always use the If "dissolving" subgraphs is desired, that subset of users could readily use |
IIRC, |
A common GIS task is to dissolve boundaries based on shared boundaries of polygons. Doing this with R always bends my head a little bit.
I think having a utility function in {sf} to do this would be very handy.
Here is a minimal example of how that function might work. Using
spdep
is probably the best for this task because it already has functions for identifying contiguous polygons as well as the connected subgraphs.This is inspired by @CGMossa's PhD work
The text was updated successfully, but these errors were encountered: