Find stops that appear to be misplaced #14

mattwigway · 2014-10-30T17:08:57Z

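We want to find stops that are a long way from their nearest neighboring stop. A good way to do this would be to compute the G function (the distribution of nearest-neighbor distances) over all the stops and then check whether a given stop's nearest-neighbor distance lies far out in the tail of that distribution.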

This seems like it would be slow, but computing the G function for 14,000 randomly distributed stops (about how many there are in Chicago, the largest GTFS feed I know of) takes an imperceptible amount of time on one core of my machine.

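library(spatstat)

# generate some random points in the unit square
x <- runif(14000)
y <- runif(14000)

# build a point pattern; with no window given, ppp() uses the unit square
the.pp <- ppp(x, y)

# estimate the nearest-neighbor distance distribution function G
est <- Gest(the.pp)

plot(est)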
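
To turn the nearest-neighbor idea into an actual flag, something like the following might work. This is only a sketch in base R, not a tested validator: the helper name `flag_isolated_stops`, the median-plus-MAD cutoff, and the multiplier `k` are all assumptions made for illustration. It uses `dist()`, which needs O(n^2) memory, so for a full-size feed a dedicated nearest-neighbor routine such as spatstat's `nndist()` would likely be a better fit.

```r
# Hypothetical helper: flag stops whose nearest-neighbor distance is extreme.
flag_isolated_stops <- function(x, y, k = 10) {
  d <- as.matrix(dist(cbind(x, y)))    # all pairwise distances (O(n^2) memory)
  diag(d) <- Inf                       # ignore each stop's distance to itself
  nn <- apply(d, 1, min)               # nearest-neighbor distance per stop
  cutoff <- median(nn) + k * mad(nn)   # robust threshold; k = 10 is an assumption
  unname(which(nn > cutoff))           # indices of stops beyond the cutoff
}

set.seed(1)
x <- c(runif(200), 5)   # 200 plausible stops plus one far-away stop (index 201)
y <- c(runif(200), 5)
flagged <- flag_isolated_stops(x, y)
print(flagged)
```

The cutoff is deliberately robust (median and MAD rather than mean and standard deviation) so that the misplaced stops themselves do not inflate the threshold.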

mattwigway · 2014-10-30T18:07:19Z

Also, we don't have to do a full coordinate transform; we can just multiply all of the longitudes by the cosine of the agency's latitude, i.e. (radius of the circle of latitude at the agency)/(radius of earth at equator), and distances along the two axes will be comparable enough.
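
A common form of this kind of scaling shrinks longitude by the cosine of a reference latitude, so that one unit in x covers roughly the same ground distance as one unit in y. A minimal sketch, assuming coordinates in decimal degrees and a feed small enough that a single reference latitude is reasonable; the function name `scale_lonlat` and the default `ref_lat` are assumptions for illustration:

```r
# Hypothetical helper: rescale longitude so both axes have comparable units.
scale_lonlat <- function(lon, lat, ref_lat = mean(lat)) {
  # A degree of longitude spans cos(latitude) times the ground distance
  # of a degree of latitude, so shrink longitude by that factor.
  data.frame(x = lon * cos(ref_lat * pi / 180), y = lat)
}

# At 60 degrees north, one degree of longitude is about half a degree of latitude.
scaled <- scale_lonlat(lon = 1, lat = 60)
print(scaled)
```

The result is still in degrees, not meters, which is fine here because the point-pattern statistics only need the two axes to be on the same scale.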

mattwigway · 2016-05-02T14:19:34Z

The K-function might be better for finding groups of misplaced stops.

laidig · 2016-05-02T16:05:57Z

How do you define misplaced?

likely to be out of order,
not near the shape,
or not near the others on the trip?

mattwigway · 2016-05-02T16:12:10Z

I was thinking more of stops that are not near the rest of the stops in the entire feed, e.g. reversed lat/lon or at null island (0 lat 0 lon, due to math errors).
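
Those two specific failure modes can also be checked directly. The sketch below is an assumption about how such a check could look, not something from the validator: `flag_suspect_coords`, `tol`, and `min_offset` are hypothetical names and values. It treats a stop as possibly swapped when it is far from the feed's median position but would land close to it if lat and lon were exchanged.

```r
# Hypothetical helper: flag null-island stops and likely lat/lon swaps.
flag_suspect_coords <- function(lat, lon, tol = 1e-6, min_offset = 1) {
  null_island <- abs(lat) < tol & abs(lon) < tol
  med_lat <- median(lat)
  med_lon <- median(lon)
  # Offset (in degrees) from the feed median, as given and with axes exchanged
  d_asis <- abs(lat - med_lat) + abs(lon - med_lon)
  d_swap <- abs(lon - med_lat) + abs(lat - med_lon)
  # Far from the feed as given, but closer once swapped; min_offset is an assumption
  maybe_swapped <- d_asis > min_offset & d_swap < d_asis
  data.frame(null_island = null_island, maybe_swapped = maybe_swapped)
}

# Three Chicago-like stops, one at null island, one with lat/lon reversed
lat <- c(41.88, 41.90, 41.85, 0, -87.63)
lon <- c(-87.63, -87.65, -87.62, 0, 41.88)
res <- flag_suspect_coords(lat, lon)
print(res)
```

Using the median keeps the reference position stable as long as most stops in the feed are correct.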

laidig · 2016-05-03T01:35:53Z

Using the number of stops in a cluster, or iterative k-means, are good formulations.
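
As a rough illustration of the cluster-size formulation, here is a sketch using base R's `kmeans()`. The choice of two clusters, the 5% size threshold, and the synthetic coordinates are all assumptions; a real feed would need these tuned, and probably several values of `centers` tried in turn.

```r
set.seed(42)
# 200 plausible stops in a unit square plus two stops far away (indices 201, 202)
x <- c(runif(200), 50, 50.01)
y <- c(runif(200), 50, 50.01)

# Cluster the stops; nstart restarts guard against a poor random initialization
fit <- kmeans(cbind(x, y), centers = 2, nstart = 10)

# Treat clusters holding under 5% of all stops as suspicious (threshold is an assumption)
sizes <- table(fit$cluster)
small <- as.integer(names(sizes)[sizes < 0.05 * length(x)])
flagged <- which(fit$cluster %in% small)
print(flagged)
```

A tiny cluster far from the rest is a candidate group of misplaced stops, though a legitimately remote terminus would be flagged the same way and would need manual review.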

If machine learning approaches seem a bit confusing, I have a suggestion: If you know the overall precision you want to target, use geohashes as a heuristic. If you spot something that does not have the same prefix, calculate the adjacent hashes and compare with those.

https://en.wikipedia.org/wiki/Geohash

This library has a method for calculating adjacent hashes.
https://github.com/davidmoten/geo
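
The prefix idea can be sketched without a library. The encoder below follows the standard geohash scheme (interleaved longitude/latitude bisection, base-32 alphabet), but the function name `geohash_encode`, the 3-character precision, and the "most common prefix" rule are assumptions for illustration. It also skips the adjacent-hash comparison described above, so stops just across a cell boundary would be flagged and need that second check.

```r
# Hypothetical helper: encode one lat/lon pair as a geohash of given length.
geohash_encode <- function(lat, lon, precision = 5) {
  base32 <- strsplit("0123456789bcdefghjkmnpqrstuvwxyz", "")[[1]]
  lat_rng <- c(-90, 90)
  lon_rng <- c(-180, 180)
  hash <- character(0)
  bits <- 0
  nbits <- 0
  even <- TRUE                      # even-numbered bits refine longitude
  while (length(hash) < precision) {
    if (even) {
      mid <- mean(lon_rng)
      if (lon >= mid) { bits <- bits * 2 + 1; lon_rng[1] <- mid }
      else            { bits <- bits * 2;     lon_rng[2] <- mid }
    } else {
      mid <- mean(lat_rng)
      if (lat >= mid) { bits <- bits * 2 + 1; lat_rng[1] <- mid }
      else            { bits <- bits * 2;     lat_rng[2] <- mid }
    }
    even <- !even
    nbits <- nbits + 1
    if (nbits == 5) {               # every 5 bits become one base-32 character
      hash <- c(hash, base32[bits + 1])
      bits <- 0
      nbits <- 0
    }
  }
  paste(hash, collapse = "")
}

# Three Chicago-like stops and one stop at null island
lat <- c(41.88, 41.90, 41.85, 0)
lon <- c(-87.63, -87.65, -87.62, 0)
prefixes <- mapply(geohash_encode, lat, lon, MoreArgs = list(precision = 3))

# Flag stops that do not share the feed's most common prefix
dominant <- names(which.max(table(prefixes)))
outliers <- unname(which(prefixes != dominant))
print(outliers)
```

Shorter prefixes mean larger cells, so the precision should be picked to match the geographic extent you expect the feed to cover.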

If you're interested in a test set, I have a copy of the Amtrak feed from 2013. I can't think of a GTFS feed that covers a bigger region.

mattwigway self-assigned this Oct 30, 2014
