Skip to content
This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

route cache for performance? #81

Closed
kodonnell opened this issue Nov 23, 2016 · 16 comments
Closed

route cache for performance? #81

kodonnell opened this issue Nov 23, 2016 · 16 comments

Comments

@kodonnell
Copy link
Contributor

I'm wondering if we should cache routing requests in computing the transition probabilities.

The first reason is obvious - if it's e.g. a long GPX track covering the same location multiple times, then some of the candidate routes may be repeated, etc.

The second reason is maybe a little more involved ...

  • for candidate-to-candidate routing we only use candidate's closest node
  • however, we dedupe candidates on edge not node
  • I believe this is 'correct' in the sense that two candidates are different even if they have the same closest node. @karussell - am I right in thinking this? Anyway, assuming it is correct, it means that for nearby candidates that share the same closest node but different edge, we are repeating the exact same route calculation. Herein lies the second benefit of a route cache.

It should be pretty easy to implement, though we'd probably want #73 implemented first so we can test performance etc.

@stefanholder
Copy link
Contributor

Does LocationIndexMatch.findNClosest return the perpendicular of the passed GPS position on all real edges within gpxAccuracyInMetern? Moreover, if the perpendicular is at the start/end of a real edge does it then return a real node? (Anyway, it would be nice to document this method.)

however, we dedupe candidates on edge not node

All real node candidates should be deduped on node ID to avoid redundant computation. Not sure if we also need to dedupe virtual nodes (but we should if there are duplicates before adding direction). So we could just dedupe all candidates using QueryResult.getClosestNode() before adding direction.

Otherwise, I think it is very unlikely that the same transition occurs multiple times during a map matching so a route cache should not be needed.

@kodonnell
Copy link
Contributor Author

Does LocationIndexMatch.findNClosest return ...

No idea. I'm using my own implementation, as findNClosest doesn't work for me.

All real node candidates should be deduped on node ID to avoid redundant computation

I thought that too, but see my comment above - we do need both edges as candidates (even if they share the same node). That's why we'd need the route cache, so that when these duplicate (by node ID) edges are routed, we only route once, to avoid redundant computations.

Otherwise, I think it is very unlikely that the same transition occurs multiple times ...

It depends on the size and nature of the route. E.g. for a taxi based at a single stand (e.g. the airport), they'll travel the same roads around the stand many times - and depending on the nature of the roads and the frequency of their GPS measurements, the same transitions may occur often. Similar for e.g. buses. My use case certainly duplicates them, but it's unlikely to be relevant to most. Either way, the route cache should help performance in the general case - and greatly help it if there are many duplicate transitions. So, no only positives, no harm?**

** OK, maybe more memory. Though eventually I'd like to not cache the actual paths (likewise in the timesteps), just the distance/time, etc.

@stefanholder
Copy link
Contributor

stefanholder commented Nov 25, 2016

we do need both edges as candidates (even if they share the same node)

I don't think so, because we only use the closest node for the candidate (as you wrote before). This is because the routing between subsequent candidates only needs a start and an end node. The closest edge of the QueryResult is never used.

the same transitions may occur often

But only if we get the exact same virtual nodes by another LocationIndexMatch.findNClosest call. Not sure how likely this is. We would need a tolerance to return the same virtual nodes for similar GPS positions.

@kodonnell
Copy link
Contributor Author

The closest edge of the QueryResult is never used.

I think it is - when we do the lookup for the virtual node. E.g.

   v1    v2
------N------- 

Currently, we get two candidates, one with a virtual node somewhere on v1 and one with a virtual node on v2. If we dedupe on closest node (N) then we'll only get one virtual node (on v1 or v2 depending which is closer to the GPX point), and hence we'll eliminate one (valid) candidate node. It's worth noting that these virtual nodes are directly used in the computation of the transition probabilities. However, that also means that our route cache wouldn't help if e.g. we were using the nodes as the key ...

But only if we get the exact same virtual nodes ...

Fair point. Again, I'm thinking of my use case (cell data) where the GPS location are quantized as such, and hence are likely to be repeated. In the generic case, I think you're right.

@stefanholder
Copy link
Contributor

It's worth noting that these virtual nodes are directly used in the computation of the transition probabilities.

Can you please show me where this happens? When calling algo.calcPath, only the virtual nodes are used as input to compute the path. Different calls of this method with different candidates but the same virtual nodes should still give same results.

Or are you referring to PR #83? There we potentially need to dedupe the result of findGPXEntriesInGraph on virtual nodes before adding direction.

@kodonnell
Copy link
Contributor Author

I think we agree - queryGraph.lookup can replace the closest node with a virtual one, which may then be an input to algo.calcPath, and this will always calculate the same path for the same (virtual or not) nodes. However, my point is that you'll be missing a candidate edge. Back to my picture, say we have the following

  v1     v2                       u1     u2
------N-------                  ------M------- 

with N/M virtual nodes (GPX points) and v1/v2/u1/u2 virtual edges (well pairs of them). If we dedupe on closest node, we might end up with just these edges:

  v1                        u2
------N                  M------- 

when there are actually three other possible arrangements. Whether or not it's OK to ignore these three, I'm not sure ... but if feels like we shouldn't (especially if we're going to be doing directional stuff e.g. #83). I also think I tried it (it's a one-liner replacement) and the tests failed - not 100% sure, but I'll check tomorrow if you haven't confirmed.

@stefanholder
Copy link
Contributor

queryGraph.lookup can replace the closest node with a virtual one

My understanding is that a GPS point is not replaced by a virtual node but that the closest (virtual) point of a GPS point is the perpendicular of the GPS point on the real edge. So different GPS points might have the same closest virtual node and this virtual node is then used in both cases for routing. @karussell, can you please clarify?

@karussell
Copy link
Member

karussell commented Nov 28, 2016

I'm not sure if the QueryGraph already does the optimization that two GPS points create only one virtual node if it is the same, I would have to try this.

My understanding is that a GPS point is not replaced by a virtual node but that the closest (virtual) point of a GPS point is the perpendicular of the GPS point on the real edge.

A GPS point is nothing we feed the algorithm, so we cannot replace nodes with it. Instead a GPS point is looked up with the QueryGraph which highly likely creates a virtual node (and 4 virtual edges) and sets the 'closest node' property of QueryResult.

          X
          |
A <------ V <---- B
  ------>   ---->

So X is the measurement via lat,lon ('GPS point') and creates a snapped point on the edge A-B (lat,lon), this snapped point has an associated virtual node V and is inserted with 4 new virtual edges. A and B are real nodes.

The GPS point highly likely creates no virtual nodes and edges if the measurement is not 'perpendicular' relative to the edge like for 'endstanding' edges e.g.

         X
        /
A------B

@stefanholder
Copy link
Contributor

Thanks @karussell! Does LocationIndexMatch.findNClosest return the snapped position of the GPS position to each real edge within gpxAccuracyInMetern?

@karussell
Copy link
Member

Yes, this is true for mostly all cases. But it can also return results outside of gpxAccuracyInMetern e.g. if otherwise the set would be empty. So it returns any closest match but in most cases there will be matches within the limit

@stefanholder
Copy link
Contributor

queryGraph.lookup can replace the closest node with a virtual one, which may then be an input to algo.calcPath, and this will always calculate the same path for the same (virtual or not) nodes.

Correct. Sorry, I first misread your sentence.

However, my point is that you'll be missing a candidate edge.

If we do the following, I don't think we're missing candidate edges:

  1. Get all query results from findNClosest
  2. Call QueryGraph.lookup with all query results
  3. Obtain the list of closest nodes from all query results
  4. Dedupe this list on node id
  5. Fall each virtual node in the list, create a candidate for each direction.

@kodonnell
Copy link
Contributor Author

Maybe we should wait until we see what form #83 takes?

@stefanholder
Copy link
Contributor

Maybe we should wait until we see what form #83 takes?

Yes, this makes sense.

@stefanholder
Copy link
Contributor

@karussell: I have a PR ready for deduping all query results by node id after calling QueryGraph.lookup. Should I create this PR now or wait until this repository is merged into the graphhopper repository?

@karussell
Copy link
Member

Nice - looking forward to this!

I have a PR ready for deduping all query results by node id after calling QueryGraph.lookup

Will this reuse existing virtual nodes, or in which sense did you mean deduping?

Should I create this PR now or wait until this repository is merged into the graphhopper repository?

Please do not wait for my action. Currently too many things on the table.

@stefanholder
Copy link
Contributor

Will this reuse existing virtual nodes, or in which sense did you mean deduping?

This will only dedupe multiple occurrences of the same tower node.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants