
K shortest routes #224

Open
jamiedtor opened this issue Apr 18, 2024 · 10 comments

Comments

@jamiedtor

Is there a way to programmatically capture not just the least cost or shortest route but also alternatives, such as the second or third shortest routes?

@mpadge
Member

mpadge commented Apr 18, 2024

Not at present, no. Current code could be adapted to implement it, but there's been no demand until now. If it was an important or interesting use case, I might be able to find time to give it a go?

@jamiedtor
Author

Thanks, mpadge! Since I forgot to mention it before, I really appreciate this package; it's been incredibly useful. In terms of k-shortest routes or alternatives, we're working on identifying likely primary pedestrian routes in newly densified in-fill areas to help prioritize active travel infrastructure. We've been relying on the shortest routes between origins and destinations, but given the variability in pedestrian route selection, we think it might be more realistic to include weighted alternatives. We've seen it done with Andres Sevtsuk's fantastic Urban Network Analysis toolkit for Rhino, Python, and Arc. (Essentially, some percentage of the O-D demand for a given pair is allocated to alternative routes based on assumptions about willingness to detour; i.e., the shortest route might get 70 percent of the demand while two alternative routes get 15 percent each.) But we'd really like to keep our entire workflow in R. It's definitely beyond my ability to write the code, but if it does seem like an interesting enough use case that you take it on, I'd be happy to contribute to writing up a vignette or tutorial for others.

@mpadge
Member

mpadge commented Apr 22, 2024

Thanks @jamiedtor , that's definitely a use case that piques my interest. My motivation for all this work is advancing active travel in all ways possible. That sounds to me like what you're looking for is along the lines of the original motivation for this package, which started life as https://osm-router.github.io/osmprob/. The idea was to implement probabilistic routing, using algorithms presented here and here. That turned out to only be possible in very restricted scenarios, and was dependent on direct matrix inversions, which severely limited the scale of networks that could be analysed.

dodgr grew out of that, and eventually discarded notions of probabilistic routing so that the algorithms could be applied to arbitrarily large networks. But the notion of generating a probability density field throughout a network remains central to my ongoing motivations here, so definitely interested in pursuing this. I won't say anything more specific for the moment, but guess it would help to have an idea of your envisioned timetable? We can easily move to private communication if you prefer - feel free to email me anytime to chat further.

One further central question is the rough scale of your envisioned queries, primarily via an estimate of how many typical destination points you'd want for each point of origin. I mostly use everywhere-to-everywhere implementations, which do not scale at all for typical k-paths queries. But even if you just wanted somewhere-to-everywhere, where "somewhere" is relatively restricted, that would make things feasible.

@jamiedtor
Author

Thanks, Mark! Great to hear it might be interesting. In terms of the scale we're looking at, it's mostly areas of say 3 to 4 square kilometres. In terms of origins and destinations, probably the max we'd ever be considering would be a matrix of about 500 potential destinations for each origin. More often than not, closer to 200 potential destinations per origin. And for something like identifying priority routes to higher-order transit stations, we're actually looking at a much smaller basket of destinations ... something like three or four possible destinations from origins represented by building centroids within a radius of about 1000 m. (200 to 300).

@mpadge
Member

mpadge commented Apr 26, 2024

@jamiedtor I've figured out how I could implement an efficient probabilistic allocation algorithm. That would produce densities along each edge of a network, accounting for the probabilities of every potential path between each A->B pair. But it seems that's not quite what you want. If you want explicit alternative routes, then even standard k-shortest paths won't give you that, because all algorithms for deriving them are based only on the smallest possible path modifications, so the variation between paths will be minimal by construction.

An analogous problem for which I have previously adapted dodgr is a kind of "preferential routing." An example is routing to maximise the length of travel through or alongside natural spaces. That generates very realistic alternative routes reflecting personal preferences for that kind of travel. I would also argue that it reflects the kind of consideration needed to generate realistic scenarios of alternative routes, as you seem to want here. For any particular problem or area of application, I think the best approach is first to identify scenarios reflecting a diversity of likely, realistic route-choice models. Adapting dodgr for sets of those is then relatively straightforward.

In your case, routing preferences might be something like comparing default shortest routes with routes which maximise under constraint the proportion of the journey travelling alongside or near things like:

  • local activity centres (shops, entertainment, whatever)
  • points of interest
  • natural spaces

Routes can also be extracted which minimise exposure to undesirable aspects, such as traffic noise or pollution (generally very similar but not identical). What do you think?
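As a rough illustration of what such preferential re-weighting could look like in dodgr, here is a hypothetical sketch. The `near_green` flag and the 20% weight reduction are both invented for illustration; in practice the flag would come from a spatial join of edges against, say, green-space polygons.

```r
# Hypothetical sketch of preferential routing via edge re-weighting,
# using the bundled dodgr::hampi data.
library (dodgr)
graph <- weight_streetnet (dodgr::hampi, wt_profile = "foot")
graph <- graph [graph$component == 1, ] # restrict to largest component

set.seed (1)
graph$near_green <- runif (nrow (graph)) < 0.1 # placeholder flag only

# Reduce the routing weight of preferred edges by 20%, drawing paths
# towards them without changing actual distances:
graph_pref <- graph
graph_pref$d_weighted [graph_pref$near_green] <-
    graph_pref$d_weighted [graph_pref$near_green] * 0.8

from <- graph$from_id [1]
to <- graph$to_id [1000]
p_default <- dodgr_paths (graph, from, to) [[1]] [[1]]
p_pref <- dodgr_paths (graph_pref, from, to) [[1]] [[1]]
```

Comparing `p_default` with `p_pref` then shows where the preference pulls the route away from the strictly shortest path.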

@FlxPo

FlxPo commented May 24, 2024

Could we just add some (small) noise to edge travel times and call dodgr_paths multiple times, then compute the distribution of path selection? This would be like sampling from a discrete choice model, where we aggregate some known utility value based on travel time (plus possibly other factors) and the utility value of unobserved factors (sampled from a known distribution).

I'm not sure about the theoretical soundness and value of this approach. But I think it would correctly reflect the fact that two almost equivalent paths should each be selected about 50% of the time by travelers, for example.

@mpadge
Member

mpadge commented May 27, 2024

@FlxPo That's an interesting approach that could indeed be worth pursuing. I've just got the latest dodgr version on CRAN, and will prioritize addressing this issue for the next version. Watch this space ...

@mpadge
Member

mpadge commented Jun 5, 2024

@FlxPo Here's an in-principle demo of your idea. (Note that this uses m4ra::m4ra_hampi which is in SC-format, rather than the sf version here. Using sf can generate paths shorter than original because of effects described in #214.)

library (dodgr)
graph0 <- graph <- weight_streetnet (m4ra::m4ra_hampi, wt_profile = "foot")
#> Loading required namespace: geodist
#> Loading required namespace: dplyr

set.seed (2)
from <- sample (graph$.vx0, 1L)
to <- sample (graph$.vx1, 1L)
d0 <- dodgr_dists (graph, from, to) [1, 1]
p0 <- dodgr_paths (graph, from, to) [[1]] [[1]]
edge_wts <- graph0$d_weighted / graph0$d

npaths <- 20L
v <- dodgr_vertices (graph)

paths <- lapply (seq_len (npaths), function (n) {
    p1 <- p0
    while (identical (p0, p1)) {
        graph$d <- graph0$d + rnorm (nrow (graph), mean = 0, sd = sd (graph0$d) / 2)
        graph$d_weighted <- graph$d * edge_wts
        p1 <- dodgr_paths (graph, from, to) [[1]] [[1]]
    }

    # Calculate distance of p1:
    p1 <- v [match (p1, v$id), ]
    g1 <- data.frame (.vx0 = p1$id [-nrow (p1)], .vx1 = p1$id [-1]) |>
        dplyr::left_join (graph0, by = dplyr::join_by (.vx0, .vx1))
    p1$d <- c (0, g1$d)

    return (p1)
})

p0 <- v [match (p0, v$id), ]
p0$d <- NA
cols <- heat.colors (npaths + 1)
pall <- rbind (p0, do.call (rbind, paths))
plot (p0 [, c ("x", "y")], type = "l", lwd = 2, xlim = range (pall$x), ylim = range (pall$y))
for (i in seq_along (paths)) {
    lines (paths [[i]] [, c ("x", "y")], col = cols [i])
}
lines (p0 [, c ("x", "y")], type = "l", lwd = 2)

dists <- c (d0, vapply (paths, function (p) sum (p$d), numeric (1)))
hist (dists)
lines (c (d0, d0), c (0, 10), col = "red", lwd = 2)

Created on 2024-06-05 with reprex v2.1.0

The Hampi data sets offer very few alternative paths, so many runs collapse onto identical results, but that still demonstrates the viability of this approach, and I think it's a very worthwhile one compared with more conventional alternatives. Yen's algorithm sets individual edge weights to infinity to force the search onto alternative paths, while other (generally better) variants derived from Eppstein apply directly only to distance calculations, and require very extensive algorithmic modification for other use cases such as extracting actual paths.

All derivations of Yen's algorithm reduce to identifying edges to be removed, then recalculating distances, paths, or whatever else. The noise-based approach here will never guarantee finding the true K-shortest paths, but I argued above why any set of truly shortest paths is often not what people would or should want for realistic (street) routing problems. It will, however, find a viable set of close alternative paths. I'll have a go at implementing something soon.
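For comparison, the edge-removal step underlying Yen-type variants can be sketched in a few lines of dodgr code. This is an illustration only, not dodgr's internal implementation; the choice of which edge to penalize and the penalty value are arbitrary here.

```r
# Sketch of Yen-style edge removal: penalize one edge of the current
# shortest path, then re-route to force an alternative.
library (dodgr)
graph <- weight_streetnet (dodgr::hampi, wt_profile = "foot")
graph <- graph [graph$component == 1, ] # restrict to largest component

from <- graph$from_id [1]
to <- graph$to_id [1000]
p0 <- dodgr_paths (graph, from, to) [[1]] [[1]]

# Match consecutive vertex pairs of p0 back to graph edges:
edge_ids <- paste (graph$from_id, graph$to_id)
path_ids <- paste (p0 [-length (p0)], p0 [-1])
edges_p0 <- which (edge_ids %in% path_ids)

# "Remove" the first of those edges with a prohibitively high weight:
g1 <- graph
g1$d_weighted [edges_p0 [1]] <- 1e10
p1 <- dodgr_paths (g1, from, to) [[1]] [[1]]
```

Iterating that removal over every edge of `p0` and keeping the best resulting detour is the core loop that all Yen derivatives share; as noted above, the paths it yields differ from `p0` only minimally.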

@FlxPo

FlxPo commented Jun 5, 2024

I see two ways to improve the plausibility of the paths generated with this approach:

  • Have one part of the noise computed from edge attributes and a random preference coefficient; people might, for example, avoid all highways or all small roads.
  • Have one part of the noise be spatially correlated, so that connected edges receive approximately the same time penalty or speed-up. This could reflect congestion concentrated at a few nodes, rather than completely random variation.

I don't have time to test this at the moment, so this is more of a note for future work. Thanks for turning this idea into code!
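One simple way the second idea could be approximated is to draw one noise value per vertex and assign each edge the mean of its endpoint values, so that connected edges receive similar perturbations. This vertex-smoothing is a crude stand-in for a proper spatial covariance model, and the `sd (graph0$d) / 5` scale is taken from the demo above rather than being a recommended value.

```r
# Sketch of spatially correlated noise via vertex-level smoothing:
library (dodgr)
graph0 <- graph <- weight_streetnet (dodgr::hampi, wt_profile = "foot")
v <- dodgr_vertices (graph)

set.seed (1)
v_noise <- rnorm (nrow (v), mean = 0, sd = sd (graph0$d) / 5)
names (v_noise) <- v$id

# Each edge gets the mean of its two endpoint noise values, so
# adjacent edges are perturbed similarly:
edge_noise <- (v_noise [graph$from_id] + v_noise [graph$to_id]) / 2
edge_wts <- graph0$d_weighted / graph0$d
graph$d <- pmax (graph0$d + edge_noise, 0.1) # keep distances positive
graph$d_weighted <- graph$d * edge_wts
```

The perturbed `graph` can then be passed to `dodgr_paths` inside the same resampling loop as the uncorrelated version above.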

@mpadge
Member

mpadge commented Jun 5, 2024

Here is a more realistic example on a much larger street network (from Essen, Germany):

library (dodgr)
f <- "/<local>/<path>/<to>/essen-sc.Rds"
net <- readRDS (f)
graph0 <- graph <- weight_streetnet (net, wt_profile = "foot")

message ("graph has ", format (nrow (graph), big.mark = ","), " edges.")
#> graph has 730,678 edges.

set.seed (1)
from <- sample (graph$.vx0, 1L)
to <- sample (graph$.vx1, 1L)
d0 <- dodgr_dists (graph, from, to) [1, 1]
p0 <- dodgr_paths (graph, from, to) [[1]] [[1]]
edge_wts <- graph0$d_weighted / graph0$d

npaths <- 20L
# Parameter to determine how much edge distances are to be varied:
dist_change <- sd (graph$d) / 5
v <- dodgr_vertices (graph)

system.time ({
paths <- pbapply::pblapply (seq_len (npaths), function (n) {
    p1 <- p0
    while (identical (p0, p1)) {
        graph$d <- graph0$d + rnorm (nrow (graph), mean = 0, sd = dist_change)
        graph$d_weighted <- graph$d * edge_wts
        p1 <- dodgr_paths (graph, from, to) [[1]] [[1]]
    }

    # Calculate distance of p1:
    p1 <- v [match (p1, v$id), ]
    g1 <- data.frame (.vx0 = p1$id [-nrow (p1)], .vx1 = p1$id [-1]) |>
        dplyr::left_join (graph0, by = dplyr::join_by (.vx0, .vx1))
    p1$d <- c (0, g1$d)

    return (p1)
})
})
#>    user  system elapsed 
#>  38.527   0.017  38.606

p0 <- v [match (p0, v$id), ]
p0$d <- NA
cols <- heat.colors (npaths + 1)
pall <- rbind (p0, do.call (rbind, paths))
plot (p0 [, c ("x", "y")], type = "l", lwd = 2, xlim = range (pall$x), ylim = range (pall$y))
for (i in seq_along (paths)) {
    lines (paths [[i]] [, c ("x", "y")], col = cols [i])
}
lines (p0 [, c ("x", "y")], type = "l", lwd = 2)

Created on 2024-06-05 with reprex v2.1.0

That takes around 1.5 seconds for each additional path in a graph with 750,000 edges, which is okay, but not great. And those paths look really divergent on that plot, but here's some code to take those results and make an interactive plot:

net <- data.frame (.vx0 = p0$id [-nrow (p0)], .vx1 = p0$id [-1]) |>
    dplyr::left_join (graph0, by = dplyr::join_by (.vx0, .vx1))
nets <- lapply (paths, function (p) {
    data.frame (.vx0 = p$id [-nrow (p)], .vx1 = p$id [-1]) |>
        dplyr::left_join (graph0, by = dplyr::join_by (.vx0, .vx1))
})

library (mapdeck)
m <- mapdeck (style = mapdeck_style ("light")) |>
    add_line (net, 
              origin = c (".vx0_x", ".vx0_y"),
              destination = c (".vx1_x", ".vx1_y"),
              stroke_colour = "#111111",
              stroke_width = 2,
              layer_id = "zero"
    )
cols <- colourvalues::colour_values (seq_along (nets), palette = "viridis")
for (i in seq_along (nets)) {
    m <- add_line (m,
                  nets [[i]], 
                  origin = c (".vx0_x", ".vx0_y"),
                  destination = c (".vx1_x", ".vx1_y"),
                  stroke_colour = cols [i],
                  stroke_width = 2,
                  update_view = FALSE,
                  layer_id = as.character (runif (1))
        )
}
m

And the big spread out bit at the top actually looks like this:

image

Where the gap in the middle is because the network was weighted for pedestrian routing, and those two split collections of paths pass either side of a large motorway. So the routing is indeed doing what it should. And @FlxPo this will ultimately depend on a scaling parameter for the "noise", which we'd just need to decide how best to implement. But even in current form (which is additive), modified distances will still reflect general weighting preferences, as demonstrated by this result.
