forked from open-telemetry/opentelemetry-collector-contrib

Commit 6852088 (parent: 4713864)
Signed-off-by: Jared Tan <[email protected]>
Showing 26 changed files with 2,667 additions and 0 deletions.
@@ -0,0 +1 @@
include ../../Makefile.Common
@@ -0,0 +1,88 @@
# Group by Trace connector

<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Distributions | [contrib] |
| Warnings | [Statefulness](#warnings) |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aconnector%2Ftraceduration%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aconnector%2Ftraceduration) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aconnector%2Ftraceduration%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aconnector%2Ftraceduration) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@JaredTan95](https://www.github.com/JaredTan95) |

[alpha]: https://github.com/open-telemetry/opentelemetry-collector#alpha
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib

## Supported Pipeline Types

| [Exporter Pipeline Type] | [Receiver Pipeline Type] | [Stability Level] |
| ------------------------ | ------------------------ | ----------------- |
| traces | metrics | [alpha] |
| traces | logs | [alpha] |

[Exporter Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#exporter-pipeline-type
[Receiver Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#receiver-pipeline-type
[Stability Level]: https://github.com/open-telemetry/opentelemetry-collector#stability-levels
<!-- end autogenerated section -->
This processor collects all the spans belonging to the same trace, waiting a
pre-determined amount of time before releasing the trace to the next processor.
The expectation is that, generally, traces will be complete after the given time.

This processor should be used whenever another processor needs grouped traces to make decisions,
such as a tail-based sampler or a per-trace metrics processor. Note that the [`tailsamplingprocessor`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
implements a similar mechanism and can be used independently.

The batch processor shouldn't be used before this processor, as it would
likely undo part (or much) of the work this processor performs. It's
fine to have the batch processor run right after this one; every entry in the
batch will then be a complete trace.
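A minimal service section reflecting that ordering might look like the sketch below (the `otlp` receiver and exporter names are placeholders, not part of this commit):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # groupbytrace runs first, so each batch contains only complete traces
      processors: [groupbytrace, batch]
      exporters: [otlp]
```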
Please refer to [config.go](./config.go) for the config spec.

Examples:

```yaml
processors:
  groupbytrace:
  groupbytrace/2:
    wait_duration: 10s
    num_traces: 1000
    num_workers: 2
```

## Configuration

Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor.
The `num_traces` property (default: 1,000,000) sets the maximum number of traces to keep in the internal storage. A higher `num_traces` may result in higher memory usage.

The `wait_duration` property (default: 1s) sets how long the processor keeps traces in the internal storage. Once a trace has been kept for this duration, it's released to the next consumer and removed from the internal storage. Spans arriving for a trace that has already been released are kept for the entire duration again.

The `num_workers` property (default: 1) controls how many concurrent workers the processor uses to process traces. If you are looking to optimize this value,
the current `GOMAXPROCS` setting is a reasonable starting point.
## Metrics

The following metrics are recorded by this processor:

* `otelcol_processor_groupbytrace_conf_num_traces` represents the maximum number of traces that can be kept by the internal storage. This value comes from the processor's configuration and never changes over the lifecycle of the processor.
* `otelcol_processor_groupbytrace_event_latency_bucket`, with the following `event` tag values:
  * `onTraceReceived` represents the number of trace parts the processor has received from the previous components
  * `onTraceExpired` represents the number of traces that finished waiting in memory for spans to arrive
  * `onTraceReleased` represents the number of traces that have been marked as released to the next component
  * `onTraceRemoved` represents the number of traces that have been marked for removal from the internal storage
* `otelcol_processor_groupbytrace_num_events_in_queue` represents the state of the internal queue. Ideally, this number would be close to zero, but it might show temporary spikes if the storage is slow.
* `otelcol_processor_groupbytrace_num_traces_in_memory` represents the state of the internal trace storage, waiting for spans to arrive. It's common to have items in memory all the time if the processor has a continuous flow of data. The longer the `wait_duration`, the higher the number of traces in memory should be, given enough traffic.
* `otelcol_processor_groupbytrace_spans_released` and `otelcol_processor_groupbytrace_traces_released` represent the number of spans and traces effectively released to the next component.
* `otelcol_processor_groupbytrace_traces_evicted` represents the number of traces that have been evicted from the internal storage due to capacity problems. Ideally, this should be zero, or very close to zero, at all times. If items keep getting evicted, increase `num_traces`.
* `otelcol_processor_groupbytrace_incomplete_releases` represents the traces that have been marked as expired but had previously been removed. This might happen when a span from a trace is received in a batch while the trace exists in the in-memory storage, but the trace is released/removed before the span can be added to it. This should always be very close to 0; a high value might indicate a software bug.

A healthy system would have the same value for the metric `otelcol_processor_groupbytrace_spans_released` and for three events under `otelcol_processor_groupbytrace_event_latency_bucket`: `onTraceExpired`, `onTraceRemoved`, and `onTraceReleased`.

The metric `otelcol_processor_groupbytrace_event_latency_bucket` is a histogram showing how long each event took to be processed, in milliseconds. In most cases, an event should take less than 5ms to be processed, though an event might occasionally take 10ms. Higher latencies are possible, but it should never really reach the last bucket, representing 1s. Events taking more than 1s are killed automatically, and multiple items in this bucket might indicate a bug in the software.

Most metrics are updated when the events occur, except for the following ones, which are updated periodically:
* `otelcol_processor_groupbytrace_num_events_in_queue`
* `otelcol_processor_groupbytrace_num_traces_in_memory`
||
## Warnings

- [Statefulness](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/standard-warnings.md#statefulness): The groupbytrace processor works best when all spans for a trace are sent to the same collector instance.
@@ -0,0 +1,44 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package tracedurationconnector // import "github.com/open-telemetry/opentelemetry-collector-contrib/connector/tracedurationconnector"

import (
	"time"
)

// Config is the configuration for the processor.
type Config struct {
	// NumTraces is the max number of traces to keep in memory waiting for the duration.
	// Default: 1_000_000.
	NumTraces int `mapstructure:"num_traces"`

	// NumWorkers is the number of workers processing the event queue.
	// Default: 1.
	NumWorkers int `mapstructure:"num_workers"`

	// WaitDuration tells the processor to wait for the specified duration for the trace to be complete.
	// Default: 1s.
	WaitDuration time.Duration `mapstructure:"wait_duration"`

	// DiscardOrphans instructs the processor to discard traces without the root span.
	// This typically indicates that the trace is incomplete.
	// Default: false.
	// Not yet implemented; an error will be returned when this option is used.
	DiscardOrphans bool `mapstructure:"discard_orphans"`

	// StoreOnDisk tells the processor to keep only the trace ID in memory, serializing the trace spans to disk.
	// Useful when the duration to wait for traces to complete is high.
	// Default: false.
	// Not yet implemented; an error will be returned when this option is used.
	StoreOnDisk bool `mapstructure:"store_on_disk"`

	// Dimensions defines the span attributes used as additional dimensions.
	Dimensions []Dimension `mapstructure:"dimensions"`
}

// Dimension defines the dimension name and optional default value if the Dimension is missing from a span attribute.
type Dimension struct {
	Name    string  `mapstructure:"name"`
	Default *string `mapstructure:"default"`
}