map matching 'quality' test suite #89

kodonnell · 2016-12-13T20:37:31Z

As discussed here it'd be good to have a way of testing how 'good' our map matching is by comparing it with known results. (I.e. our map-matching on a GPX track gives a certain route, and how close does that align to the actual known route taken?)

Considerations/comments (to direct a PR):

where do we get data of known routes? @stefanholder pointed out one example - do we want others? Do we want users to be able to test their own data sets?
we should see how map matching performs not only with the given GPX track, but also as we continually down-sample it.
how do we compare with the known route? Compare total distances of each? Or the proportion of the distance which they share common edges on?
how will we add this functionality? I think it is very similar to the Measurement suite (Performance measurement suite #73) in how it'll be used (and we could, I guess, put it in there if we wanted). Or we could build it into unit tests somehow (e.g. require that the match is sufficiently high.)
do we want to store historical trends in accuracy (as opposed to e.g. just comparing with the last commit) - and if so, how?

karussell · 2016-12-13T21:04:51Z

where do we get data of known routes?

There are many taxi tracks from NYC http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and there are many on OSM itself: http://wiki.openstreetmap.org/wiki/Planet.gpx or GPSies

karussell · 2016-12-31T09:36:18Z

The performance test suite could be also used for this. E.g. counting the errors and differences of the random routes or using different randomness etc

stefanholder · 2017-01-10T15:23:34Z

how do we compare with the known route? Compare total distances of each? Or the proportion of the distance which they share common edges on?

We should check to which degree the matched route goes through the same sequence of edge/nodes as the expected route. To this end we could compute the Levenshtein distance as a similarity measure between the expected and the actual node/edge sequence.

stefanholder · 2017-01-24T17:04:33Z

To this end we could compute the Levenshtein distance

Newson & Krumm suggest a different metric in Chapter 5, which is simpler but probably still sufficient:
map matching error = (d_minus + d_plus) / d0

d_minus: total length of road segments that are in the ground truth route but not in the matched route
d_plus: total length of road segments that are in the matched route but not in the ground truth route
d0: length of the ground truth route

With this definition it is possible to have an error greater than 1. But I don't think that this is a real problem.

Another useful variant would be to use the number of road segments for d_minus, d_plus and d0 instead of their length.

kodonnell · 2017-01-24T18:16:11Z

Agreed that, assuming 'normal' behaviour, it is probably fine. If it's not 'normal' - e.g. non-sequential - then this could be a very poor metric. A Levenshtein approach may handle these cases better.

That said, we may need to write something similar to, but not, Levenshtein. For example, transpositions are allowed (with the same cost as adding/removing). This makes sense in e.g. spelling detection, but probably not in routes - edges have the additional constraint of connectivity (which words largely don't - I think). So we'd either need to not consider transitions, or treat them differently. Similar applies for changing edges. So, it's possibly that our Levenshtein could end up only really considering additions/subtractions (in the 'normal' case) which is equivalent to the @karussell's suggestion.

As an aside - if we do go with Levenshtein, we should weight be edge length, not just counts.

Another thought - what about the area between the curves? We'd probably have time as a dimension (so area includes differences in time), so that even being at the point isn't necessarily good (unless you're also there at the same time). (Actually, the latter is something we'd need to consider in the above approaches too.) In terms of calculating the area, we can probably come up with a suitable proxy similar to numerical integration - e.g. if both route and ground truth had 100 nodes, we'd just sum up the difference (including time) of each node pair.

kodonnell mentioned this issue Dec 13, 2016

prevent inner-link U-turns #83

Closed

kodonnell mentioned this issue Dec 13, 2016

suboptimal (?) transition probability metric #86

Open

karussell mentioned this issue Dec 31, 2016

Penalize inner-link U-turns (builds on #83) #88

Merged

kodonnell mentioned this issue Jan 15, 2017

give user control of filtering before map-matching #92

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

map matching 'quality' test suite #89

map matching 'quality' test suite #89

kodonnell commented Dec 13, 2016

karussell commented Dec 13, 2016 •

edited

Loading

karussell commented Dec 31, 2016

stefanholder commented Jan 10, 2017 •

edited

Loading

stefanholder commented Jan 24, 2017

kodonnell commented Jan 24, 2017

map matching 'quality' test suite #89

map matching 'quality' test suite #89

Comments

kodonnell commented Dec 13, 2016

karussell commented Dec 13, 2016 • edited Loading

karussell commented Dec 31, 2016

stefanholder commented Jan 10, 2017 • edited Loading

stefanholder commented Jan 24, 2017

kodonnell commented Jan 24, 2017

karussell commented Dec 13, 2016 •

edited

Loading

stefanholder commented Jan 10, 2017 •

edited

Loading