This repository has been archived by the owner on Oct 3, 2023. It is now read-only.
Once a bundle is full, a new bundle is created. An unbounded number of bundles can therefore be held in memory (possible if the state is disconnected for a long time).
Hence, if the collector is not reachable for a long enough time, the bundler causes the application's memory to explode, especially if traces are being sampled at a large enough rate. This further leads to the application being down and not being able to serve production traffic.
What version of the Exporter are you using?
version = "v0.5.0"
What version of OpenCensus are you using?
version = "v0.19.0"
What version of Go are you using?
go1.11.5 darwin/amd64
What did you do?
Consider the below scenario:
An application A is instrumenting its traces using the ocagent-exporter and sending those traces via an oc-collector to a backend.
The handler function, uploadTraces, will be called on the bundle to offload the bundled traces to the collector (see https://github.com/googleapis/google-api-go-client/blob/c75846e6b94d2eded794529e4016d3d19ae6eeb1/support/bundler/bundler.go#L114). The handler function is called once the bundle has BundleCountThreshold items (see https://github.com/googleapis/google-api-go-client/blob/c75846e6b94d2eded794529e4016d3d19ae6eeb1/support/bundler/bundler.go#L57-L61). This is set to 300, i.e. the size of the bundle, in ocagent (opencensus-go-exporter-ocagent/ocagent.go, line 118 in bbad334).
When the collector is not reachable, uploadTraces cannot offload the bundle and the bundle stays in memory (opencensus-go-exporter-ocagent/ocagent.go, lines 442 to 444 in bbad334).
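To make the growth pattern concrete, here is a toy model of the behaviour described in this scenario. It is a simplified sketch, not the google-api-go-client bundler or the exporter code: the toyBundler type, its fields, and the add and buffered methods are all hypothetical names invented for illustration. The only value taken from the report is the threshold of 300 items per bundle.

```go
package main

import "fmt"

// toyBundler is a hypothetical, simplified model of the reported behaviour.
// It is NOT the real bundler: it only tracks how many items pile up.
type toyBundler struct {
	threshold int     // items per bundle (300 in the report)
	current   []int   // bundle currently being filled
	pending   [][]int // full bundles that have not been delivered
	connected bool    // whether the collector is assumed reachable
}

// add appends one span. When the current bundle is full it is moved to
// pending and a new bundle is started. Pending bundles are only released
// when the collector is reachable.
func (t *toyBundler) add(span int) {
	t.current = append(t.current, span)
	if len(t.current) < t.threshold {
		return
	}
	t.pending = append(t.pending, t.current)
	t.current = nil
	if t.connected {
		t.pending = nil // delivered, memory can be reclaimed
	}
}

// buffered returns the total number of spans still held in memory.
func (t *toyBundler) buffered() int {
	n := len(t.current)
	for _, b := range t.pending {
		n += len(b)
	}
	return n
}

func main() {
	b := &toyBundler{threshold: 300} // collector unreachable from the start
	for i := 0; i < 3000; i++ {
		b.add(i)
	}
	fmt.Println(len(b.pending), b.buffered()) // prints "10 3000"
}
```

In this model nothing bounds the pending slice, so the number of retained spans grows linearly with the sampling rate and the length of the outage.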
We have a hunch that this relates to census-instrumentation/opencensus-service#524 as well.
What did you expect to see?
There should be a mechanism by which unsent bundles are simply dropped if the downstream collector is not able to receive the spans.
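One possible shape for such a mechanism is a buffer with a hard cap that drops data instead of retaining it. The sketch below is only an illustration of that idea under assumed names: boundedBuffer, newBoundedBuffer, Add, Flush, Stats, and errCollectorUnreachable are hypothetical and do not exist in the exporter or the bundler. Spans are represented as plain strings to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCollectorUnreachable is a hypothetical error used to simulate an outage.
var errCollectorUnreachable = errors.New("collector unreachable")

// boundedBuffer is a hypothetical span buffer that never holds more than
// limit items and counts everything it has to drop.
type boundedBuffer struct {
	mu      sync.Mutex
	items   []string
	limit   int
	dropped int
}

func newBoundedBuffer(limit int) *boundedBuffer {
	return &boundedBuffer{limit: limit}
}

// Add stores the item if there is room and reports whether it was accepted.
// When the buffer is full the item is dropped and counted.
func (b *boundedBuffer) Add(item string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.items) >= b.limit {
		b.dropped++
		return false
	}
	b.items = append(b.items, item)
	return true
}

// Flush hands the buffered items to send. If send fails, the batch is
// dropped (and counted) rather than kept in memory for a retry.
func (b *boundedBuffer) Flush(send func([]string) error) error {
	b.mu.Lock()
	batch := b.items
	b.items = nil
	b.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}
	if err := send(batch); err != nil {
		b.mu.Lock()
		b.dropped += len(batch)
		b.mu.Unlock()
		return err
	}
	return nil
}

// Stats returns how many items are buffered and how many were dropped.
func (b *boundedBuffer) Stats() (buffered, dropped int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.items), b.dropped
}

func main() {
	buf := newBoundedBuffer(300)
	for i := 0; i < 1000; i++ {
		buf.Add(fmt.Sprintf("span-%d", i))
	}
	buffered, dropped := buf.Stats()
	fmt.Println(buffered, dropped) // prints "300 700"

	err := buf.Flush(func([]string) error { return errCollectorUnreachable })
	buffered, dropped = buf.Stats()
	fmt.Println(err, buffered, dropped) // prints "collector unreachable 0 1000"
}
```

The trade-off in this sketch is deliberate: trace data is lost during an outage, but memory use stays bounded by the cap, and the dropped counter gives operators a signal that spans are being discarded.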
What did you see instead?
Huge memory explosion, causing downtime of the application.
Note: Most of the above analysis was done by @elynnyap, I'm just a messenger.