

Improve the performance of computing sin and cos functions #234

Open · wants to merge 4 commits into melodic-devel from feature/improve-sin-cos-performance
Conversation

jcmonteiro (Contributor)

Once again, this PR affects quite a lot of files, but the change is actually not that big; besides, it is optional. 🤓

TL;DR

For me, this PR saves 40% CPU without any noticeable impact on convergence. The maximum absolute error between the analytic sin and cos functions and the approximations is 0.003.

Why

In the application I work on, the major performance bottleneck we faced when using TEB was computing three simple functions: sin, cos, and pow. Initially that came as a surprise to me, but it does make sense, since these functions are used all over the place to compute the graph edges.

Summary

There are basically two ways of speeding up the computation of trigonometric functions: look-up tables and mathematical approximations. I did not benchmark the alternatives, but I am confident the approximation I used is more than enough to eliminate the performance bottleneck caused by the trigonometric functions.

Comparison

As an example, this is the typical flame graph I get without the trigonometric approximations (sorry, but I had to blur the image where proprietary code is called).
[Image: flame graph without the trigonometric approximations (teb_before)]

And this one is with approximations for sin and cos.
[Image: flame graph with the sin and cos approximations (teb_sincos)]

So, for me, the improvement was roughly 40% less CPU usage.

Parameters

A new parameter, use_sin_cos_approximation, is added under a new tab called "Performance". It toggles the approximations and defaults to false.

Formula

It is a pretty simple approximation: I take a second-order approximation on the interval from 0 to 45 degrees and then use trigonometric identities to replicate the result in the other quadrants. For comparison, these are the analytic functions and their approximations.
[Image: analytic sin/cos vs. their approximations]

And these are the absolute errors.
[Image: absolute approximation errors]

@jcmonteiro (Contributor, Author)

The description was already too long, so I decided to add this as a comment. If you find this PR useful, I have one other change where I limit the cost exponent to be an integer in the [1, 4] range. Doing so, I get an extra 10% performance improvement, for a total of 50%.

This is the flamegraph with the changes to sin, cos, and pow.
[Image: flame graph with the changes to sin, cos, and pow (teb_all)]

@amakarow (Contributor)

This is awesome, thanks a lot. We will test this soon.

@jcmonteiro (Contributor, Author)

jcmonteiro commented Oct 23, 2020

I just noticed that there are some other places that use sin and cos that were outside my use case. I'll update the PR to include them later.

@jcmonteiro jcmonteiro force-pushed the feature/improve-sin-cos-performance branch from 6cd9cc7 to abbdbee Compare November 9, 2020 18:37
@jcmonteiro (Contributor, Author)

jcmonteiro commented Nov 9, 2020

Sorry for taking so long to get back to this PR. I have exhaustively tested the changes and decided to enable the sin and cos approximations only for computing the distance to footprint models, hence the force-push to the branch. As it turns out, essentially all of the performance bottleneck is in these computations, and enabling the approximations in the edges hindered convergence in some edge cases (no pun intended).

@RainerKuemmerle (Contributor)

Just commenting here as a user of TEB.
IMHO it would be nice to keep the same API as the sin/cos provided by math.h/cmath. This would also limit the changes in other parts of the code and would be easier to maintain than a function writing the result into a reference.
Moreover, can one limit code duplication between sin/cos by exploiting the phase shift of pi/2 between sine and cosine?
How would the approximation suggested here perform compared with others, e.g. https://github.com/kennyalive/fast-sine-cosine/blob/master/src/main.cpp (a random one found via Google)?

@jcmonteiro (Contributor, Author)

IMHO it would be nice to keep the same API as the sin/cos provided by math.h/cmath. This would also limit the changes in other parts of the code and would be easier to maintain than a function writing the result into a reference.

Yes and no. There is a benefit in computing sin and cos at the same time, so I take them by reference. I then used the same interface for computing only the sine, just for consistency.

Moreover, can one limit code duplication between sin/cos by exploiting the phase shift of pi/2 between sine and cosine?

What do you have in mind?

How would the approximation suggested here perform compared with others, e.g. https://github.com/kennyalive/fast-sine-cosine/blob/master/src/main.cpp (a random one found via Google)?

I have not benchmarked others, since the change cleared the performance bottleneck altogether.

@RainerKuemmerle (Contributor)

IMHO it would be nice to keep the same API as the sin/cos provided by math.h/cmath. This would also limit the changes in other parts of the code and would be easier to maintain than a function writing the result into a reference.

Yes and no. There is a benefit in computing sin and cos at the same time, so I take them by reference. I then used the same interface for computing only the sine, just for consistency.

To me, this API is not comfortable and is very error-prone.

Moreover, can one limit code duplication between sin/cos by exploiting the phase shift of pi/2 between sine and cosine?

What do you have in mind?

[Image: the phase-shift identity cos(x) = sin(x + π/2)]
But maybe this does not apply to your approach. It is just that, if you have them as functions returning the sine/cosine, one can save some lines of code.

How would the approximation suggested here perform compared with others, e.g. https://github.com/kennyalive/fast-sine-cosine/blob/master/src/main.cpp (a random one found via Google)?

I have not benchmarked others, since the change cleared the performance bottleneck altogether.

I am just reasoning about which approximation to use. For instance, the implementation linked above seems very compact, also with respect to the number of operations performed. But I am not sure about its performance.

@jcmonteiro (Contributor, Author)

I am just reasoning about which approximation to use. For instance, the implementation linked above seems very compact, also with respect to the number of operations performed. But I am not sure about its performance.

It is really compact; thanks for the reference, btw. I have tested the precision and it is better: the maximum error is 0.001 with the approximation you suggested, compared to 0.003 with the one I implemented. Besides, its error is zero at zero, which I assume is desirable for numerical conditioning. I'll compare the performance just for the sake of it, but it should be equally good (if not better).

@jcmonteiro (Contributor, Author)

I have changed the approximation method to the one suggested by @RainerKuemmerle. Although the maximum absolute approximation error is smaller, I noticed a more pronounced convergence deterioration when using the approximation in the edges. Since I have disabled that and the PR uses the approximations only for distance calculations, I think it doesn't really matter.

Besides, the API is the same as the one in the standard library, as pointed out by @RainerKuemmerle.

@amakarow (Contributor)

amakarow commented Jan 4, 2021

Have you experienced any overhead when using your wrapper for std::sin and std::cos compared to the plain implementation?

@jcmonteiro (Contributor, Author)

The overhead is unnoticeable, given that the other methods called while solving the optimization are much more expensive. Profiling shows no difference either.

@corot corot self-requested a review November 26, 2021 11:59
@jcmonteiro (Contributor, Author)

I forgot about this one for a while. Would this PR still be interesting? Should I consider modifications to improve it (besides resolving the conflicts)?

@corot (Collaborator)

corot commented Nov 29, 2021

I forgot about this one for a while. Would this PR still be interesting? Should I consider modifications to improve it (besides resolving the conflicts)?

If it delivers the performance improvement you describe, then yes, it definitely is interesting.
Can you first resolve the conflicts, so I can give it a try and assess the performance gain?

@jcmonteiro jcmonteiro force-pushed the feature/improve-sin-cos-performance branch from 9dd56fe to 6dd3639 Compare November 29, 2021 10:26
@@ -376,6 +383,9 @@ class TebConfig
recovery.oscillation_recovery_min_duration = 10;
recovery.oscillation_filter_duration = 10;

// Recovery
Review comment (Collaborator):

Suggested change:
- // Recovery
+ // Performance

Comment on lines +420 to +426
grp_performance.add(
"use_sin_cos_approximation",
bool_t,
0,
"Use sin and cos approximations to improve performance. The maximum absolute error for these approximations is 1e-3.",
False
)
Review comment (Collaborator):
keep same format as for other params

@corot (Collaborator)

corot commented Nov 30, 2021

I didn't notice any speedup in the computeVelocityCommands method. These are the times for 100 consecutive calls without and with fast sin/cos on the same route:

std:
0.425189
0.716500
0.751879
0.404824

fast:
0.426448
0.613000
0.681991
0.406975

Nor did I notice any reduction in CPU usage, but I measured that with top, which is obviously not a sound method.

Where does the speedup get reflected?
Also, there are still many usages of std::sin/cos; I wonder what the speedup could be if we replaced all of them with the fast versions. What about making SIN / COS macros that use either the std or the fast version depending on a compilation flag and using them all along?

PS: sorry, I broke the PR again with the latest merge 😞

@jcmonteiro (Contributor, Author)

I didn't notice any speedup in the computeVelocityCommands method. These are the times for 100 consecutive calls without and with fast sin/cos on the same route:

std: 0.425189 0.716500 0.751879 0.404824

fast: 0.426448 0.613000 0.681991 0.406975

Nor did I notice any reduction in CPU usage, but I measured that with top, which is obviously not a sound method.

If I remember correctly, computeVelocityCommands was called thousands of times per second in my application because we handled every single cell in the costmap as an obstacle (we did not use costmap converters).

Where does the speedup get reflected? Also, there are still many usages of std::sin/cos; I wonder what the speedup could be if we replaced all of them with the fast versions. What about making SIN / COS macros that use either the std or the fast version depending on a compilation flag and using them all along?

I have left the changes out of the edge calculations because adding them there caused the optimization to oscillate. The macro is also an idea, but it would

  1. prevent changing the setting at runtime, and
  2. make it impossible to use the feature for those not compiling from source.

PS: sorry, I broke the PR again with the latest merge 😞

That's alright =). I will see if I can make the changes ASAP. Full disclaimer: I am no longer compiling or running the code, since my company uses ROS 2 now.
