precompute sin/cos to improve performance #341

VRichardJP · 2021-12-14T07:37:41Z

As discussed in #340

corot

Looks interesting, but when I simply measure the time spent in the computeCmdVel method, I get worse results than with the current melodic-devel:

With PR on same route:

Admittedly, it is not a very sophisticated test, but I get consistent results.
Can you try comparing the time spent in the computeCmdVel method?

include/teb_local_planner/pose_se2.h

VRichardJP · 2021-12-15T01:47:44Z

@corot I don't include cmd_vel_ in my benchmark, so I overlooked it. I guess the issue here is that when a PoseSE2 is copied, sin/cos get recalculated over and over instead of being copied. For example, with the current code, something like:

pose1.theta() = pose2.theta();

is equivalent to:

pose1.theta()._theta = pose2.theta()._theta;
pose1.theta().update_sincos();

Maybe adding a simple copy constructor would fix the performance in your benchmark:

Theta(const Theta& other) {
    _rad = other._rad;
    _sin = other._sin;
    _cos = other._cos;
}

That being said, I would rather not keep the implicit Theta->double and double->Theta conversions, since constructing a Theta comes with an upfront cost. I am considering 2 ways to tackle this:

1. Keep the implicit conversions but make the sin/cos evaluation lazy. This is really easy to implement and has little impact on the rest of the code, but it comes with a small runtime cost. For example:

const double& sin() {
    if (!_init) {
        update_sincos();
        _init = true;
    }
    return _sin;
}

2. Make the conversions explicit both ways and modify the rest of the teb code accordingly. There is no runtime cost, but it adds complexity to the code. There may also be cases where the sin/cos values are not needed and could be left undefined, but that could become a source of bugs in the future.
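To make the lazy variant described above more concrete, here is a minimal self-contained sketch. It is not the code from this PR: the class name LazyTheta, its members, and the evaluation counter are hypothetical and only illustrate the idea that assignment invalidates the cache, while copies carry an already computed sin/cos along.

```cpp
#include <cmath>

// Hypothetical stand-in for a Theta-like class (not the PR's implementation).
class LazyTheta {
 public:
  LazyTheta(double rad = 0.0) : rad_(rad) {}  // implicit double -> LazyTheta

  // Assigning a new angle only invalidates the cache; nothing is computed yet.
  LazyTheta& operator=(double rad) {
    rad_ = rad;
    valid_ = false;
    return *this;
  }

  operator double() const { return rad_; }  // implicit LazyTheta -> double

  double sin() const { ensure(); return sin_; }
  double cos() const { ensure(); return cos_; }

  // Counts how often sin/cos were really evaluated (demonstration only).
  static int& evaluations() {
    static int n = 0;
    return n;
  }

 private:
  // Computes sin/cos on first use after construction or assignment.
  void ensure() const {
    if (!valid_) {
      sin_ = std::sin(rad_);
      cos_ = std::cos(rad_);
      valid_ = true;
      ++evaluations();
    }
  }

  double rad_;
  mutable double sin_ = 0.0;
  mutable double cos_ = 1.0;
  mutable bool valid_ = false;
};
```

With this sketch, copying a LazyTheta whose cache is already filled (via the implicitly generated copy constructor or copy assignment) does not trigger another evaluation, while assigning a plain double does so on the next sin()/cos() call. The price is the extra branch in ensure() and one more bool to copy.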

What do you think?

VRichardJP · 2021-12-15T04:54:55Z

I have tested both approaches 1 and 2 in my setup. The difference in performance is negligible on my machine: implementation 1 is ~1% slower than implementation 2, while being way simpler and less prone to bugs.

EDIT: I observe with google perftools that with the lazy sincos version, the total number of sincos calls is reduced by 2/3 compared to the original branch. This bears out my initial observation that the sin/cos computation is largely duplicated.

corot

Much better now, but I still get times slightly above melodic-devel. Maybe the sin/cos implementation already does some caching? 🤔 (I compile in release, so I assume the std implementation is very efficient)

With PR on same route:

Did you measure the time on computeCmdVel?
Or how do you evaluate the performance gain?

VRichardJP · 2021-12-16T00:26:51Z

Hi, I'm a bit confused to see it does not improve things on your side =/

To benchmark the changes I am using a private fork of melodic-devel, but besides a few extra options or weights, the functionality/performance should be the same. I am measuring performance using test_optim_node: I start the node, load a scenario (a custom set of obstacles and start/stop poses), run N loops of CB_mainCycle, then stop. I have more than 200 scenarios. Although the time I measure includes some ROS setup and initialization, with my PR I get a clean +10% performance gain on the whole benchmark, which means the gain on the plan() function itself is even bigger.

For example, I have generated a call graph with google-perftools for one of the scenarios without my changes (built in RelWithDebInfo mode with SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--no-as-needed,-lprofiler,--as-needed"), run with CPUPROFILE_FREQUENCY=1000 CPUPROFILE=/tmp/teb.profile roslaunch <...>, and visualized with google-pprof --web install/teb_local_planner/lib/teb_local_planner/test_optim_node /tmp/teb.profile):

You can see at the bottom of the picture that test_optim_node spends 34% of its time in _sincos.

This is the same graph but with the PR:

The time spent in _sincos has dropped to 12% because eval_sincos catches many duplicates. For some reason, you can see that TwoCirclesRobotFootprint::calculateDistance still manages to call _sincos directly and does not go through eval_sincos. I am wondering why this happens; maybe it is a side effect of compiler optimization.

From your results, I understand that my PR has no effect on your machine. I see 2 possible causes:

1. _sincos calls are replaced by a table lookup after optimization. In that case, my PR is absolutely useless and just adds a useless if and extra data to copy. What CPU/compiler are you using? On my side, I have an Intel i7-9700 @ 3.00GHz and use the gcc 7.5.0 compiler.
2. teb parameters and/or the scenario: my test world is quite big (generated paths contain around 300 points), and my vehicle is car-like (non-holonomic) with a two_circles footprint. From my understanding of teb, the world/vehicle/tuning used only influences the number of sin/cos calls, which are duplicated in all cases. So it should not matter.

corot · 2021-12-22T10:12:01Z

> Hi, I'm a bit confused to see it does not improve things on your side =/

Me too! I tried to rule out some possibilities:

I'm using the circular footprint model. I tried a similar test with the polygon footprint model but got similar results: no improvement.
I tried compiling in DEBUG, but again got similar results: no improvement.

What's that test_optim_node? I can give it a try.
Could you also test just recording execution time for computeCmdVel calls? I use this simple code:

time.diff.zip
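The content of the attached diff is not reproduced in this thread. As a rough idea of what such a measurement could look like, here is a minimal sketch assuming std::chrono::steady_clock. CallTimer and its members are hypothetical names, not part of teb_local_planner and not necessarily what the attachment does.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>

// Hypothetical helper: accumulates wall-clock time over repeated calls.
struct CallTimer {
  std::chrono::steady_clock::duration total{};  // summed duration of all calls
  std::size_t calls = 0;                        // number of measured calls

  // Runs f once and adds its duration to the running total.
  template <typename F>
  void measure(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    total += std::chrono::steady_clock::now() - start;
    ++calls;
  }

  // Mean duration per call in milliseconds (0 if nothing was measured).
  double mean_ms() const {
    if (calls == 0) return 0.0;
    return std::chrono::duration<double, std::milli>(total).count() /
           static_cast<double>(calls);
  }
};
```

The call under test (here, whatever invokes computeCmdVel) would be wrapped in a lambda passed to measure(), and mean_ms() compared between melodic-devel and the PR branch on the same route. steady_clock is used because it is monotonic, so the result is not affected by system clock adjustments.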

VRichardJP · 2021-12-23T00:32:28Z

The circular footprint is faster than two_circles in the distance-to-obstacle calculation. With the latter, it is necessary to compute some sin/cos to evaluate the position of each circle's center point. That said, there are still many other places that heavily use sin/cos and should theoretically benefit from this PR.
test_optim_node node: https://github.com/rst-tu-dortmund/teb_local_planner/blob/melodic-devel/src/test_optim_node.cpp
and its launch file: https://github.com/rst-tu-dortmund/teb_local_planner/blob/melodic-devel/launch/test_optim_node.launch

This is the callback I use for benchmarking purpose:

teb_local_planner/src/test_optim_node.cpp

Lines 165 to 170 in ec5759d

    
    // Planning loop
    void CB_mainCycle(const ros::TimerEvent& e)
    {
      planner->plan(PoseSE2(-4,0,0), PoseSE2(4,0,0)); // hardcoded start and goal for testing purposes
    }

RainerKuemmerle · 2021-12-28T17:56:57Z

Some edges compute their Jacobian numerically. This might also explain why you observe a lot of calls to sin/cos in general. For those edges, caching does not help when evaluating with respect to theta, but it does for x/y. For edges with an analytic Jacobian, the overhead of caching might have a negative impact. Benchmarking computeError/linearizeOplus per edge might lead to interesting insights.
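The x/y versus theta distinction can be illustrated with a toy central-difference gradient. This is only a simplified sketch, not g2o's numeric Jacobian and not the PR's code: CachedPose, toy_error, and numeric_gradient are hypothetical names, and the assumption is that the cache is invalidated only when theta itself is modified.

```cpp
#include <array>
#include <cmath>

// Hypothetical pose whose sin/cos cache is invalidated only when theta changes.
struct CachedPose {
  double x = 0.0, y = 0.0;

  double theta() const { return theta_; }
  void set_theta(double t) { theta_ = t; valid_ = false; }

  double cos_theta() const { refresh(); return cos_; }
  double sin_theta() const { refresh(); return sin_; }

  mutable int sincos_evals = 0;  // instrumentation for this example only

 private:
  void refresh() const {
    if (valid_) return;
    sin_ = std::sin(theta_);
    cos_ = std::cos(theta_);
    valid_ = true;
    ++sincos_evals;
  }
  double theta_ = 0.0;
  mutable double sin_ = 0.0, cos_ = 1.0;
  mutable bool valid_ = false;
};

// Toy error term: sum of the coordinates of a point at distance d ahead of the pose.
inline double toy_error(const CachedPose& p, double d) {
  return (p.x + d * p.cos_theta()) + (p.y + d * p.sin_theta());
}

// Central-difference gradient with respect to (x, y, theta).
inline std::array<double, 3> numeric_gradient(CachedPose& p, double d,
                                              double h = 1e-6) {
  std::array<double, 3> g{};
  const double x0 = p.x, y0 = p.y, t0 = p.theta();

  // Perturbing x or y leaves theta untouched, so the cached sin/cos are reused.
  p.x = x0 + h; const double ex_plus = toy_error(p, d);
  p.x = x0 - h; const double ex_minus = toy_error(p, d);
  p.x = x0;
  g[0] = (ex_plus - ex_minus) / (2 * h);

  p.y = y0 + h; const double ey_plus = toy_error(p, d);
  p.y = y0 - h; const double ey_minus = toy_error(p, d);
  p.y = y0;
  g[1] = (ey_plus - ey_minus) / (2 * h);

  // Perturbing theta invalidates the cache, so each evaluation recomputes.
  p.set_theta(t0 + h); const double et_plus = toy_error(p, d);
  p.set_theta(t0 - h); const double et_minus = toy_error(p, d);
  p.set_theta(t0);
  g[2] = (et_plus - et_minus) / (2 * h);
  return g;
}
```

In this sketch, one gradient needs six error evaluations but only three sin/cos computations when starting from an invalid cache: one shared by the four x/y evaluations and one for each theta perturbation. An uncached version would compute sin/cos in all six evaluations.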

precompute sin/cos to improve performance

6b3e96d

corot requested changes Dec 14, 2021

VRichardJP and others added 2 commits December 15, 2021 14:17

lazily compute sincos

09d371b

use precalculated cos/sin value

8c4f8d1

VRichardJP requested a review from corot December 15, 2021 07:38

corot reviewed Dec 15, 2021

VRichardJP requested a review from corot December 19, 2021 08:01
