
Investigate performance slow down #97

Closed · karussell opened this issue Jun 13, 2017 · 4 comments

@karussell (Member)

There seems to be a speed problem since u-turn avoidance was introduced.

@stefanholder (Contributor)

As discussed here, u-turn avoidance should cause at most a 4x slowdown (at most double the number of candidates means at most 4 times the number of transitions). But since some candidates are tower nodes and query results are deduplicated (#91), the slowdown should be less than 4x. In fact, @kodonnell measured only a 2.5x slowdown after u-turn avoidance was implemented (see first link), and that was even before #91.
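To spell out the arithmetic behind the 4x bound (a minimal sketch; the candidate count is an arbitrary example, not a measured value):

```java
// With n candidates per GPS point, the HMM considers n * n transitions
// between each pair of consecutive points. Doubling the candidates for
// u-turn avoidance gives (2n) * (2n) = 4 * n * n, hence "at most 4x".
int n = 10;                    // candidates per point, arbitrary example
int before = n * n;            // 100 transitions per point pair
int after = (2 * n) * (2 * n); // 400 transitions per point pair, i.e. 4x
```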

@michaz (Member) commented Apr 24, 2018

You gave the answer in #80. When I run Measurement and set the initial collection size to 150_000, which is what happens on large graphs, map matching takes forever, because it allocates and deallocates 4 * 600k a zillion times.
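For illustration, the failure mode looks roughly like this (a minimal sketch, not the actual GraphHopper code; class and method names are made up, with HPPC's IntObjectHashMap standing in for the internal node map):

```java
import com.carrotsearch.hppc.IntObjectHashMap;

class AllocationChurnSketch {
    // Map matching runs one routing query per candidate pair, and each
    // query pre-sizes its node map for 150_000 entries although a short
    // query touches only a few hundred nodes, so a multi-MB backing
    // array is allocated and discarded on every iteration.
    static void matchAll(int candidatePairs) {
        for (int i = 0; i < candidatePairs; i++) {
            IntObjectHashMap<Object> shortestPathTree = new IntObjectHashMap<>(150_000);
            // ... run the query against shortestPathTree ...
            // the map becomes garbage immediately afterwards => GC churn
        }
    }
}
```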

@karussell (Member, Author) commented Apr 24, 2018

Oh, ugly. Thanks a lot @michaz - this got lost among the issues :(

Should we make the initial collection size configurable in GH core, or should we use a better heuristic, like a small initial size for a small beeline distance? Also, Dijkstra and A* should have different initializations, and the initialization sizes for unidirectional vs. bidirectional look out of sync (2K for Dijkstra vs. 150K for bidirectional Dijkstra).

Probably the best option would be to introduce a new variable in AlgorithmOptions and make it configurable, something like "expected minimum nodes".
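A hypothetical sketch of such a hint (the key name "expected_visited_nodes" and the plumbing are made up for illustration; only the idea of a configurable expected size comes from the comment above):

```java
import com.carrotsearch.hppc.IntObjectHashMap;
import java.util.Map;

class ExpectedNodesHint {
    // Fall back to a small default when no hint is given, so small
    // queries no longer pay for a 150_000-entry allocation.
    static IntObjectHashMap<Object> createNodeMap(Map<String, Integer> hints) {
        int expected = hints.getOrDefault("expected_visited_nodes", 1_000);
        return new IntObjectHashMap<>(expected);
    }
}
```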

@michaz (Member) commented Apr 24, 2018

I think in library code one should not add a special assumption, an overridden method, or even a configuration parameter, anywhere, for a 10% speed-up. We would probably overfit it to the particular cases we are looking at, and we would make the code worse by increasing its surface for errors and by diverting attention from what it actually does.

Except maybe for the authors of the JVM: when they are successful, half the world gets 10% faster, so the investment in maintenance, testing, and code ugliness pays off.

Look at the formula for the initial collection size (I had to set a breakpoint to understand where the value actually comes from, because it goes back and forth so often between super and subclasses):

Math.min(Math.max(200, graph.getNodes() / 10), 150_000)

It is almost evil: the typical graph size is "the world" or "a country", except in toy scenarios and test cases. So the result of the formula is almost always 150_000, except in toy scenarios and test cases, which means the problem created by 150_000 is never triggered by your toy scenarios and test cases and stays undetected. And the formula looks clever, so nobody suspects anything, whereas if you had just written 150_000, it would have been obvious that this is a huge overhead and the performance test would have picked it up.
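Evaluating the formula for a few graph sizes makes the saturation concrete (node counts are rough illustrative figures, not exact GraphHopper numbers):

```java
class InitialSizeSketch {
    // the formula quoted above, verbatim
    static int initialSize(int nodes) {
        return Math.min(Math.max(200, nodes / 10), 150_000);
    }

    public static void main(String[] args) {
        System.out.println(initialSize(1_000));       // 200     (unit-test graph)
        System.out.println(initialSize(500_000));     // 50_000  (small extract)
        System.out.println(initialSize(10_000_000));  // 150_000 (a country: clamped)
        System.out.println(initialSize(200_000_000)); // 150_000 ("the world": clamped)
    }
}
```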

We'll probably meet in the middle, but you were asking: I would remove all of it.

Or, if necessary, introduce a "hint" (as you say), but the default should always be the default behavior of the collection classes.

Also, let's remove the subclassing of the collection classes (GHIntObjectHashMap). It makes additional assumptions whose origin is unclear. And the name of the parameter changes three times on its long way (size, capacity, expectedElements), which are literally three different things.
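For what it's worth, the three names really do denote different things; here is a minimal illustration with HPPC, the library underneath GHIntObjectHashMap (the example values are arbitrary):

```java
import com.carrotsearch.hppc.IntObjectHashMap;

class NamingSketch {
    public static void main(String[] args) {
        // expectedElements: how many entries we plan to insert; HPPC
        // derives the internal buffer capacity from it and the load factor
        IntObjectHashMap<String> map = new IntObjectHashMap<>(1_000);
        map.put(42, "answer");
        // size: how many entries the map actually holds right now
        System.out.println(map.size()); // prints 1
    }
}
```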

@michaz michaz closed this as completed in fa5d568 May 27, 2018
@karussell karussell added this to the 0.11 milestone Jun 14, 2018