-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
precompute sin/cos to improve performance #341
base: melodic-devel
Are you sure you want to change the base?
precompute sin/cos to improve performance #341
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks interesting, but just checking the time expended on computeCmdVel method, I get worse results than with current melodic-devel:
0.393857
0.652801
0.827784
0.435621
With PR on same route:
0.593492
1.004650
1.046142
0.564472
Admittedly, is not a very sophisticated test, but I get consistent results.
Can you try to compare the time expended on computeCmdVel method?
@corot I don't include cmd_vel_ in my benchmark so I overlooked it. I guess the issue here is that when PoseSE2 is copied sin/cos gets calculated over and over instead of being copied. For example with the current code, something like: pose1.theta() = pose2.theta(); is equivalent to: pose1.theta()._theta = pose2.theta()._theta;
pose1.theta().update_sincos(); Maybe adding a simple copy constructor would fix the performance in your benchmark: Theta(const Theta& other) {
_rad = other._rad;
_sin = other._sin;
_cos = other._cos;
} That being said, I don't really like to keep implicit Theta->double and double->Theta conversion as Theta comes with an upfront cost. I am considering 2 ways to tackle this:
const double& sin() {
if (!_init) {
update_sincos();
_init = true;
}
return _sin;
}
What do you think? |
I have tested both approach 1 and 2 in my setup. The difference in performance is negligible on my machine: the implementation 1 is ~1% slower compared to the implementation 2, while being way simpler and less prone to bugs. EDIT: I observe with google perftools that with the lazy sincos version, the total number of sincos call is reduced by 2/3 compared to original branch. This bears out my initial observation that sin/cos computation is largely duplicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better now, but I still get times slightly above melodic-devel. Maybe sin / cos implementation does already some caching? 🤔 (I compile in release, so I assume the implementation of std is very efficient)
0.449948
0.750811
0.723817
0.433760
With PR on same route:
0.477178
0.915454
0.731955
0.506457
Did you measure the time on computeCmdVel?
Or how do you evaluate the performance gain?
me too! I tried to rule out some possibilities:
What's that |
This is the callback I use for benchmarking purpose: teb_local_planner/src/test_optim_node.cpp Lines 165 to 170 in ec5759d
|
Some edges compute their Jacobian numerically. This might also explain why you observe a lot of calls to sin/cos in general. For those edges, caching does not help when evaluating wrt theta but for x/y. For edges with analytic Jacobian, the overhead of caching might lead to negative impact. Maybe benchmark per Edge on computeError/linearizeOplus leads to interestings insights. |
As discussed in #340