[APM] Internal Server Error returns on apm/traces/aggregated_critical request #178892
Comments
Pinging @elastic/apm-ui (Team:APM)
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
I wonder if this is unique to serverless or if the same challenge exists in ESS? Can we determine this? As a Tech Preview feature, we can prioritize this investigation lower than GA features like Service Map.
@chrisdistasio for context, the feature is built on a scripted_metric aggregation, which we more or less expect to break down in some cases. The issues are similar to those of the service map: we need to look at individual trace events and cannot use aggregated metrics, so its performance characteristics become unpredictable. We can build in some safeguards to get it to GA, though. Happy to help out if needed.
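For readers unfamiliar with the aggregation type mentioned above, here is a minimal sketch of what a scripted_metric request body looks like. It is not the query APM actually sends: the aggregation name, the time range, and the Painless script bodies are illustrative assumptions. The point is only that `map_script` runs once per matching document, so the cost grows with the number of trace events in the selected range.

```typescript
// Hypothetical sketch: NOT the real APM critical-path query.
// The four script keys are the standard scripted_metric phases;
// the script bodies here just count documents.
interface ScriptedMetricAgg {
  scripted_metric: {
    init_script: string;
    map_script: string;
    combine_script: string;
    reduce_script: string;
  };
}

function buildEventCountAgg(): ScriptedMetricAgg {
  return {
    scripted_metric: {
      // Runs once per shard before any document is collected.
      init_script: 'state.count = 0',
      // Runs once per matching document: this is where cost scales.
      map_script: 'state.count += 1',
      // Runs once per shard after collection; its result goes to the coordinating node.
      combine_script: 'return state.count',
      // Runs once on the coordinating node over all shard results.
      reduce_script:
        'long total = 0; for (c in states) { total += c } return total',
    },
  };
}

// Assumed request body: a 15-minute window, no hits, only the aggregation.
const searchBody = {
  size: 0,
  query: { range: { '@timestamp': { gte: 'now-15m' } } },
  aggs: { event_count: buildEventCountAgg() },
};

console.log(JSON.stringify(searchBody, null, 2));
```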
@chrisdistasio the issue is not unique to serverless; I just reproduced it on a stateful deployment too.
Hey @dgieselaar, what guardrails do you have in mind for this?
This might be related to #181790
Reiterating the points made above, Traces Explorer is in Tech Preview and this affects both serverless and stateful - as such, this is a lower priority.
@paulb-elastic FWIW, the concerns around service maps were not "this breaks for our users", but "this can take down a cluster and page the ES team and it isn't their responsibility". I think the same applies here, but the risk is lower due to the fact that it is not enabled by default. I assume the ES team still wants a fix for this as well, though.
@chrisdistasio I think @dgieselaar's comment above answers your earlier question about whether there is currently a mechanism in Elasticsearch to guard itself against being taken down when handling our requests, and it indicates that we need to build for that on our end.
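As a rough illustration of what "building for that on our end" could look like, the sketch below shows two possible guards. Nothing in this thread says which guardrails would actually be chosen, so the function names and all limits are hypothetical. The first guard skips the expensive query when a cheap pre-count of matching documents exceeds a threshold. The second adds the search API's `timeout` and `terminate_after` body parameters, which bound how long and how many documents per shard a search collects, at the cost of possibly partial results.

```typescript
// Hypothetical limits: real values would need tuning against real clusters.
const MAX_DOCS_FOR_CRITICAL_PATH = 500000;

interface GuardDecision {
  run: boolean;
  reason?: string;
}

// Guard 1: refuse to run the expensive aggregation when a cheap pre-count
// shows too many matching documents.
function shouldRunExpensiveQuery(
  matchingDocs: number,
  maxDocs: number = MAX_DOCS_FOR_CRITICAL_PATH
): GuardDecision {
  if (matchingDocs > maxDocs) {
    return {
      run: false,
      reason: `Too many documents (${matchingDocs} > ${maxDocs}); narrow the time range or filters.`,
    };
  }
  return { run: true };
}

// Guard 2: attach a search timeout and a per-shard document cap to a request body.
// Results may be partial when either limit is hit.
function withSearchLimits<T extends object>(
  body: T,
  timeout: string = '30s',
  terminateAfter: number = 100000
) {
  return { ...body, timeout, terminate_after: terminateAfter };
}

// Usage with the document count reported in this issue (~780k).
const decision = shouldRunExpensiveQuery(780000);
console.log(decision.run ? 'running query' : `skipped: ${decision.reason}`);
console.log(withSearchLimits({ size: 0 }));
```

With a guard like the first one, the UI could show an actionable message instead of an Internal Server Error, though whether that trade-off is acceptable is a product decision.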
I can't really think of an alternative other than rewriting the query not to use the scripted_metric aggregation. Perhaps the fact that it is still in technical preview also gives us more flexibility to rewrite this feature without scripted_metric aggs, provided that there are performance gains in doing so.
I'm constantly experiencing another issue where, for the same time range, sometimes the data is returned by the server, while other times it keeps loading forever or returns empty.
As per the discussion with @chrisdistasio, this won't be tackled right now and has been moved to the backlog.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Version:
Serverless project v 8.14.0
Stateful deployment v 8.14.0-SNAPSHOT
Description:
The POST /internal/apm/traces/aggregated_critical_path request returns Internal Server Error.
Preconditions:
I reproduced the issue with ~780k documents from 761 services in the APM data view within a 15-minute interval.
Steps to reproduce:
Expected behavior:
Data presentation should be rendered.
Screenshots:
chrome_4gR3XPH3Az.mp4
Response: