tracedurationconnector
Signed-off-by: Jared Tan <[email protected]>
JaredTan95 committed Oct 9, 2024
1 parent 4713864 commit 6852088
Showing 26 changed files with 2,667 additions and 0 deletions.
2 changes: 2 additions & 0 deletions cmd/otelcontribcol/builder-config.yaml
@@ -233,6 +233,7 @@ connectors:
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/servicegraphconnector v0.111.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/spanmetricsconnector v0.111.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/slowsqlconnector v0.111.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/tracedurationconnector v0.111.0

providers:
- gomod: go.opentelemetry.io/collector/confmap/provider/envprovider v1.17.0
@@ -502,4 +503,5 @@ replaces:
- github.com/open-telemetry/opentelemetry-collector-contrib/internal/grpcutil => ../../internal/grpcutil
- github.com/open-telemetry/opentelemetry-collector-contrib/receiver/googlecloudmonitoringreceiver => ../../receiver/googlecloudmonitoringreceiver
- github.com/open-telemetry/opentelemetry-collector-contrib/connector/slowsqlconnector => ../../connector/slowsqlconnector
- github.com/open-telemetry/opentelemetry-collector-contrib/connector/tracedurationconnector => ../../connector/tracedurationconnector

1 change: 1 addition & 0 deletions connector/slowsqlconnector/go.mod
@@ -32,6 +32,7 @@ require (
github.com/google/uuid v1.6.0 // indirect
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.17.9 // indirect
github.com/knadh/koanf/maps v0.1.1 // indirect
github.com/knadh/koanf/providers/confmap v0.1.0 // indirect
github.com/knadh/koanf/v2 v2.1.1 // indirect
1 change: 1 addition & 0 deletions connector/slowsqlconnector/go.sum


1 change: 1 addition & 0 deletions connector/tracedurationconnector/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
88 changes: 88 additions & 0 deletions connector/tracedurationconnector/README.md
@@ -0,0 +1,88 @@
# Group by Trace connector
<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Distributions | [contrib] |
| Warnings | [Statefulness](#warnings) |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aconnector%2Ftraceduration%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aconnector%2Ftraceduration) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aconnector%2Ftraceduration%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aconnector%2Ftraceduration) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@JaredTan95](https://www.github.com/JaredTan95) |

[alpha]: https://github.com/open-telemetry/opentelemetry-collector#alpha
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib

## Supported Pipeline Types

| [Exporter Pipeline Type] | [Receiver Pipeline Type] | [Stability Level] |
| ------------------------ | ------------------------ | ----------------- |
| traces | metrics | [alpha] |
| traces | logs | [alpha] |

[Exporter Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#exporter-pipeline-type
[Receiver Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#receiver-pipeline-type
[Stability Level]: https://github.com/open-telemetry/opentelemetry-collector#stability-levels
<!-- end autogenerated section -->

This processor collects all the spans from the same trace, waiting a
pre-determined amount of time before releasing the trace to the next processor.
The expectation is that, generally, traces will be complete after the given time.

This processor should be used whenever a processor requires grouped traces to make decisions,
such as a tail-based sampler or a per-trace metrics processor. Note that [`tailsamplingprocessor`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
also implements a similar mechanism and can be used independently.

The batch processor shouldn't be used before this processor, as this one will
probably undo part (or much) of the work that the batch processor performs. It's
fine to have the batch processor run right after this one; every entry in the
batch will then be a complete trace.

Please refer to [config.go](./config.go) for the config spec.

Examples:

```yaml
processors:
  groupbytrace:
  groupbytrace/2:
    wait_duration: 10s
    num_traces: 1000
    num_workers: 2
```

## Configuration

Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using the processor.
The `num_traces` (default: 1,000,000) property sets the maximum number of traces to keep in the internal storage. A higher `num_traces` may result in higher memory usage.
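
One plausible way to bound storage at `num_traces` is a fixed-size FIFO of trace IDs: once it is full, admitting a new trace evicts the oldest one, which would then be counted by `otelcol_processor_groupbytrace_traces_evicted`. This is a hypothetical sketch (`traceRing`, `newTraceRing`, `put`), not the processor's code.

```go
package main

// traceRing (hypothetical) is a fixed-capacity FIFO of trace IDs. When it is
// full, adding a new ID evicts the oldest one, which the caller should then
// drop from storage. Capacity must be greater than zero.
type traceRing struct {
	ids  []string
	next int  // index of the slot the next ID is written to
	full bool // true once every slot has been written at least once
}

func newTraceRing(capacity int) *traceRing {
	return &traceRing{ids: make([]string, capacity)}
}

// put records id and returns the evicted trace ID, if any.
func (r *traceRing) put(id string) (evicted string, ok bool) {
	if r.full {
		evicted, ok = r.ids[r.next], true
	}
	r.ids[r.next] = id
	r.next = (r.next + 1) % len(r.ids)
	if r.next == 0 {
		r.full = true
	}
	return evicted, ok
}
```

Because traces also leave storage when they are released, an evicted ID may already be gone; the caller would treat that case as a no-op.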

The `wait_duration` (default: 1s) property sets how long the processor keeps traces in the internal storage. Once a trace has been kept for this duration, it is released to the next consumer and removed from the internal storage. Spans that arrive for a trace that has already been released are kept for the entire duration again.

The `num_workers` (default: 1) property controls how many concurrent workers the processor uses to process traces. When tuning this value, GOMAXPROCS is a reasonable starting point.
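
With more than one worker, all events for a given trace still need to reach the same worker so they stay ordered. A common way to achieve that, assumed here rather than taken from the processor's source, is to shard by a hash of the trace ID. `workerFor` and `suggestedWorkers` are hypothetical helpers, and FNV-1a is an arbitrary choice of hash.

```go
package main

import (
	"hash/fnv"
	"runtime"
)

// workerFor (hypothetical) maps a trace ID to one of numWorkers workers.
// Hashing the ID keeps every event for a trace on the same worker, so
// per-trace state is never touched by two workers. Assumes numWorkers > 0.
func workerFor(traceID []byte, numWorkers int) int {
	h := fnv.New32a()
	h.Write(traceID) // Write on an FNV hash never returns an error
	return int(h.Sum32() % uint32(numWorkers))
}

// suggestedWorkers (hypothetical) returns the current GOMAXPROCS value as a
// starting point for num_workers; passing 0 queries without changing it.
func suggestedWorkers() int {
	return runtime.GOMAXPROCS(0)
}
```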

## Metrics

The following metrics are recorded by this processor:

* `otelcol_processor_groupbytrace_conf_num_traces` represents the maximum number of traces that can be kept by the internal storage. This value comes from the processor's configuration and never changes over the lifecycle of the processor.
* `otelcol_processor_groupbytrace_event_latency_bucket`, with the following `event` tag values:
  * `onTraceReceived` represents the number of trace parts the processor has received from the previous components
  * `onTraceExpired` represents the number of traces that finished waiting in memory for spans to arrive
  * `onTraceReleased` represents the number of traces that have been marked as released to the next component
  * `onTraceRemoved` represents the number of traces that have been marked for removal from the internal storage
* `otelcol_processor_groupbytrace_num_events_in_queue` represents the state of the internal queue. Ideally, this number is close to zero, but it might spike temporarily if the storage is slow.
* `otelcol_processor_groupbytrace_num_traces_in_memory` represents the state of the internal trace storage, waiting for spans to arrive. It's common to have items in memory all the time if the processor has a continuous flow of data. Given enough traffic, the longer the `wait_duration`, the more traces are held in memory.
* `otelcol_processor_groupbytrace_spans_released` and `otelcol_processor_groupbytrace_traces_released` represent the number of spans and traces effectively released to the next component.
* `otelcol_processor_groupbytrace_traces_evicted` represents the number of traces that have been evicted from the internal storage due to capacity limits. Ideally, this should be zero, or very close to zero, at all times. If traces keep getting evicted, increase `num_traces`.
* `otelcol_processor_groupbytrace_incomplete_releases` represents the traces that have been marked as expired but had previously been removed. This can happen when a span from a trace is received in a batch while the trace exists in the in-memory storage, but the trace is released or removed before the span can be added to it. This should always be very close to 0; a high value might indicate a software bug.

A healthy system would have the same value for the metric `otelcol_processor_groupbytrace_spans_released` and for the three events under `otelcol_processor_groupbytrace_event_latency_bucket`: `onTraceExpired`, `onTraceRemoved`, and `onTraceReleased`.

The metric `otelcol_processor_groupbytrace_event_latency_bucket` is a histogram that shows how long each event took to be processed, in milliseconds. In most cases an event should take less than 5ms to process, though some may take around 10ms. Higher latencies are possible, but events should essentially never reach the last bucket, which represents 1s. Events taking more than 1s are killed automatically, and multiple items in this bucket might indicate a bug in the software.

Most metrics are updated when the events occur, except for the following ones, which are updated periodically:
* `otelcol_processor_groupbytrace_num_events_in_queue`
* `otelcol_processor_groupbytrace_num_traces_in_memory`

## Warnings

- [Statefulness](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/standard-warnings.md#statefulness): The groupbytrace processor works best when all spans for a trace are sent to the same collector instance.
44 changes: 44 additions & 0 deletions connector/tracedurationconnector/config.go
@@ -0,0 +1,44 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package tracedurationconnector // import "github.com/open-telemetry/opentelemetry-collector-contrib/connector/tracedurationconnector"

import (
	"time"
)

// Config is the configuration for the processor.
type Config struct {
	// NumTraces is the max number of traces to keep in memory waiting for the duration.
	// Default: 1_000_000.
	NumTraces int `mapstructure:"num_traces"`

	// NumWorkers is a number of workers processing event queue.
	// Default: 1.
	NumWorkers int `mapstructure:"num_workers"`

	// WaitDuration tells the processor to wait for the specified duration for the trace to be complete.
	// Default: 1s.
	WaitDuration time.Duration `mapstructure:"wait_duration"`

	// DiscardOrphans instructs the processor to discard traces without the root span.
	// This typically indicates that the trace is incomplete.
	// Default: false.
	// Not yet implemented, and an error will be returned when this option is used.
	DiscardOrphans bool `mapstructure:"discard_orphans"`

	// StoreOnDisk tells the processor to keep only the trace ID in memory, serializing the trace spans to disk.
	// Useful when the duration to wait for traces to complete is high.
	// Default: false.
	// Not yet implemented, and an error will be returned when this option is used.
	StoreOnDisk bool `mapstructure:"store_on_disk"`

	// Dimensions lists the span attributes to extract as dimensions, each with
	// an optional default value (see Dimension).
	Dimensions []Dimension `mapstructure:"dimensions"`
}

// Dimension defines the dimension name and optional default value if the Dimension is missing from a span attribute.
type Dimension struct {
	Name    string  `mapstructure:"name"`
	Default *string `mapstructure:"default"`
}
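
The `Dimension` comment describes a fallback rule: use the span attribute when present, otherwise the optional default. A minimal sketch of that rule, assuming attributes are flattened into a `map[string]string` (real spans carry typed attributes) and that a dimension with neither a value nor a default is omitted; `resolveDimensions` is a hypothetical helper, not part of this commit.

```go
package main

// Dimension mirrors the type in config.go.
type Dimension struct {
	Name    string
	Default *string
}

// resolveDimensions (hypothetical) looks up each dimension in a span's
// attributes. A missing attribute falls back to Default when it is set;
// otherwise the dimension is left out of the result (an assumption).
func resolveDimensions(dims []Dimension, attrs map[string]string) map[string]string {
	out := make(map[string]string, len(dims))
	for _, d := range dims {
		if v, ok := attrs[d.Name]; ok {
			out[d.Name] = v
		} else if d.Default != nil {
			out[d.Name] = *d.Default
		}
	}
	return out
}
```

Using a pointer for `Default` is what lets the config distinguish "no default" (nil) from an explicit empty-string default.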