Integrate the Pyroscope agent in the Cassandra/DSE builds to enable continuous profiling #462
Comments
I don't think this gives users anything interesting. What on earth would users do with thread profiling of Cassandra? Given how Cassandra is architected, it doesn't even reveal much useful information. A Cassandra developer might get something useful out of it, but nobody else would.
My experience with diagnosing Cassandra performance issues contradicts this. It is VERY useful.
Seconded, I've also used flame charts to diagnose performance problems. My only reservation with this is that I think we'd want to have a good understanding of any performance impacts caused by running tracing continuously. It might be more interesting to sample traces periodically. NB: if we had a service mesh we could be examining network traces too, which would possibly be even more useful...
Yeah, the impact of continuous profiling needs to be evaluated. I guess we can tune the profiling intervals to avoid profiling all the time.
The service mesh is something we should explore to see what benefits we could get out of it (easy TLS orchestration being one) and what drawbacks it would impose on us (higher latencies being one).
I've done quite a bit of performance work over the last decade, especially with Cassandra, and have found profiling visualized with flame graphs to be by far the most direct and useful method of revealing performance bottlenecks. I did a considerable amount of performance tuning on the Netflix Cassandra fleet and during my time consulting at The Last Pickle, and I continue to rely on profiling and flame graphs to this day. I wrote a blog post about using async-profiler with Cassandra and discussed it when I gave the keynote at P99 CONF. Considering Pyroscope uses async-profiler under the hood and async-profiler is a very lightweight sampling profiler, my expectation is that the overhead will be minimal, even if run continuously. It generally has < 1% CPU overhead, as it samples the Java stack traces at regular intervals and doesn't require waiting for a safepoint.
Flamegraphs are often the best (if not the only) way to properly identify what's causing performance issues in Cassandra.
Grafana Pyroscope is a continuous profiling database which allows displaying flamegraphs in Grafana and would be a great addition to our toolbelt.
We should add the Pyroscope Java agent to our builds, disabled by default (see the PYROSCOPE_AGENT_ENABLED env variable) and fully configurable through env variables.
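To make the "disabled by default, toggled by env variable" idea concrete, here is a minimal sketch of the gating logic. It is only an illustration, not the actual build change: the function name `pyroscope_jvm_opts`, the set of accepted truthy values, and the jar path `/opt/pyroscope/pyroscope.jar` are all assumptions, and the real integration would more likely live in the image's startup script. Only `PYROSCOPE_AGENT_ENABLED` comes from this issue; `-javaagent:` is the standard JVM flag for attaching an agent.

```python
import os

# Assumption: which string values count as "enabled" is not specified in the issue.
TRUTHY = {"1", "true", "yes", "on"}


def pyroscope_jvm_opts(env=os.environ, agent_jar="/opt/pyroscope/pyroscope.jar"):
    """Return extra JVM options for the Pyroscope agent.

    The agent stays off unless PYROSCOPE_AGENT_ENABLED is set to a truthy
    value. The jar path is a hypothetical default used for illustration.
    """
    enabled = env.get("PYROSCOPE_AGENT_ENABLED", "false").strip().lower() in TRUTHY
    if not enabled:
        # Disabled by default: add nothing to the JVM command line.
        return []
    # Standard JVM flag for attaching a Java agent at startup.
    return [f"-javaagent:{agent_jar}"]


if __name__ == "__main__":
    # Show the options that would be appended for the current environment.
    print(" ".join(pyroscope_jvm_opts()))
```

With this shape, the remaining agent settings would be read by the agent itself from its own env variables, so the startup logic only has to decide whether to attach it.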
┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MAPI-7