Skip to content
This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

map matching 'quality' test suite #89

Open
kodonnell opened this issue Dec 13, 2016 · 5 comments
Open

map matching 'quality' test suite #89

kodonnell opened this issue Dec 13, 2016 · 5 comments

Comments

@kodonnell
Copy link
Contributor

As discussed here it'd be good to have a way of testing how 'good' our map matching is by comparing it with known results. (I.e. our map-matching on a GPX track gives a certain route, and how close does that align to the actual known route taken?)

Considerations/comments (to direct a PR):

  • where do we get data of known routes? @stefanholder pointed out one example - do we want others? Do we want users to be able to test their own data sets?
  • we should see how map matching performs not only with the given GPX track, but also as we continually down-sample it.
  • how do we compare with the known route? Compare total distances of each? Or the proportion of the distance which they share common edges on?
  • how will we add this functionality? I think it is very similar to the Measurement suite (Performance measurement suite #73) in how it'll be used (and we could, I guess, put it in there if we wanted). Or we could build it into unit tests somehow (e.g. require that the match is sufficiently high.)
  • do we want to store historical trends in accuracy (as opposed to e.g. just comparing with the last commit) - and if so, how?
@karussell
Copy link
Member

karussell commented Dec 13, 2016

where do we get data of known routes?

There are many taxi tracks from NYC http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml and there are many on OSM itself: http://wiki.openstreetmap.org/wiki/Planet.gpx or GPSies

@karussell
Copy link
Member

The performance test suite could be also used for this. E.g. counting the errors and differences of the random routes or using different randomness etc

@stefanholder
Copy link
Contributor

stefanholder commented Jan 10, 2017

how do we compare with the known route? Compare total distances of each? Or the proportion of the distance which they share common edges on?

We should check to which degree the matched route goes through the same sequence of edge/nodes as the expected route. To this end we could compute the Levenshtein distance as a similarity measure between the expected and the actual node/edge sequence.

@stefanholder
Copy link
Contributor

To this end we could compute the Levenshtein distance

Newson & Krumm suggest a different metric in Chapter 5, which is simpler but probably still sufficient:
map matching error = (d_minus + d_plus) / d0

  • d_minus: total length of road segments that are in the ground truth route but not in the matched route
  • d_plus: total length of road segments that are in the matched route but not in the ground truth route
  • d0: length of the ground truth route

With this definition it is possible to have an error greater than 1. But I don't think that this is a real problem.

Another useful variant would be to use the number of road segments for d_minus, d_plus and d0 instead of their length.

@kodonnell
Copy link
Contributor Author

Agreed that, assuming 'normal' behaviour, it is probably fine. If it's not 'normal' - e.g. non-sequential - then this could be a very poor metric. A Levenshtein approach may handle these cases better.

That said, we may need to write something similar to, but not, Levenshtein. For example, transpositions are allowed (with the same cost as adding/removing). This makes sense in e.g. spelling detection, but probably not in routes - edges have the additional constraint of connectivity (which words largely don't - I think). So we'd either need to not consider transitions, or treat them differently. Similar applies for changing edges. So, it's possibly that our Levenshtein could end up only really considering additions/subtractions (in the 'normal' case) which is equivalent to the @karussell's suggestion.

As an aside - if we do go with Levenshtein, we should weight be edge length, not just counts.

Another thought - what about the area between the curves? We'd probably have time as a dimension (so area includes differences in time), so that even being at the point isn't necessarily good (unless you're also there at the same time). (Actually, the latter is something we'd need to consider in the above approaches too.) In terms of calculating the area, we can probably come up with a suitable proxy similar to numerical integration - e.g. if both route and ground truth had 100 nodes, we'd just sum up the difference (including time) of each node pair.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants