feat: otel tracing #899

Merged
plyr4 merged 85 commits into main from feat/otel-tracing
Sep 12, 2024

Conversation

JordanSussman
Collaborator

@JordanSussman JordanSussman commented Jun 29, 2023

This PR adds opt-in OpenTelemetry tracing to the server. OpenTelemetry is often abbreviated as "OTel".

The main change is the introduction of the tracing package, which implements a new set of env flags that enable OTel tracing across (nearly) all of the server's executable code.

New Flags

| Key | Description | Default Value |
| --- | --- | --- |
| VELA_OTEL_TRACING_ENABLE | enables opt-in tracing for the server | false |
| VELA_OTEL_TRACING_SERVICE_NAME | defines the 'service name' attached to all traces | vela-server |
| VELA_OTEL_EXPORTER_OTLP_ENDPOINT | defines the full URL of the OTLP exporter endpoint; the default targets the Jaeger all-in-one setup | http://jaeger:4318 |
| VELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH | defines the file path of the certs used to forward traces over https; when not supplied, traces are forwarded insecurely over http | NONE |
| VELA_OTEL_TRACING_TLS_MIN_VERSION | optional minimum TLS version to require when communicating with the otel exporter | 1.2 |
| VELA_OTEL_TRACING_RESOURCE_ATTRIBUTES | defines the static resource attributes attached to all traces, supplied as a list of `key1=<value>,key2=<value>` pairs | "process.runtime.name=go" |
| VELA_OTEL_TRACING_RESOURCE_ENV_ATTRIBUTES | defines the resource attributes attached to all traces that are fetched from the environment at runtime, supplied as a list of `key1=<env_key>,key2=<env_key>` pairs | NONE |
| VELA_OTEL_TRACING_SPAN_ATTRIBUTES | sets otel span attributes as a list of `key1=<value>,key2=<value>` pairs | NONE |
| VELA_OTEL_TRACING_TRACESTATE_ATTRIBUTES | sets otel tracestate attributes as a list of `key1=<value>,key2=<value>` pairs | NONE |
| VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND | defines the rate at which traces can be recorded and forwarded to the collector | 100 |

The resource and span attribute flags make it possible to work with restrictive collectors that key off particular fields.
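As a rough illustration of the two list formats described in the flags table, the sketch below parses a `key=value` list and resolves a `key=env_key` list against the environment. The helper names `parseAttributes` and `resolveEnvAttributes` are hypothetical and are not taken from this PR; the actual parsing in the tracing package may differ (for example in how it treats unset variables).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAttributes splits a "key1=value1,key2=value2" list into a map.
// Hypothetical helper: not the PR's implementation.
func parseAttributes(list string) (map[string]string, error) {
	attrs := make(map[string]string)
	for _, pair := range strings.Split(list, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate empty input and trailing commas
		}
		// Split on the first "=" only, so values may contain "=".
		key, value, ok := strings.Cut(pair, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid attribute %q: want key=value", pair)
		}
		attrs[key] = value
	}
	return attrs, nil
}

// resolveEnvAttributes treats each value in the list as the name of an
// environment variable and replaces it with that variable's value.
// Assumption: variables that are not set are skipped silently.
func resolveEnvAttributes(list string, lookup func(string) (string, bool)) (map[string]string, error) {
	names, err := parseAttributes(list)
	if err != nil {
		return nil, err
	}
	resolved := make(map[string]string, len(names))
	for key, envKey := range names {
		if value, ok := lookup(envKey); ok {
			resolved[key] = value
		}
	}
	return resolved, nil
}

func main() {
	static, err := parseAttributes("process.runtime.name=go")
	fmt.Println(static, err)

	fromEnv, err := resolveEnvAttributes(
		"vela.user-refresh-dur=VELA_USER_REFRESH_TOKEN_DURATION", os.LookupEnv)
	fmt.Println(fromEnv, err)
}
```

Passing the lookup function in (rather than calling `os.LookupEnv` directly) keeps the resolver easy to exercise without touching the real environment.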

Instrumentation Libraries

The code takes advantage of official OpenTelemetry instrumentation libraries such as otelgin and otelgorm to automatically create detailed trace payloads using Go's built-in context system.

Propagating context.Context

As long as ctx context.Context is passed along properly, spans are generated automatically and attached to the parent trace that initiated the code execution.

In practice, if context is passed as the first parameter of each function, the instrumentation libraries automatically track the timed execution of these functions. To attach additional information to a span, use SpanFromContext(ctx) to retrieve the current span, then modify the attributes attached to it.
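To make the propagation idea concrete, here is a toy model written with only the standard library. It is not the OpenTelemetry API: the `span` type and the `startSpan` and `spanFromContext` helpers are hypothetical stand-ins that mimic how a span can ride along inside a context.Context. The sketch shows what happens when the caller's ctx is passed through versus when the chain is broken with a fresh context.

```go
package main

import (
	"context"
	"fmt"
)

// span is a toy stand-in for a tracing span (not the OpenTelemetry type).
type span struct {
	name   string
	parent *span
	attrs  map[string]string
}

// spanKey is the unexported context key under which the current span is stored.
type spanKey struct{}

// spanFromContext returns the span carried by ctx, or nil if there is none.
func spanFromContext(ctx context.Context) *span {
	s, _ := ctx.Value(spanKey{}).(*span)
	return s
}

// startSpan creates a child of whatever span ctx carries (a root span if
// none) and returns a derived context that carries the new span.
func startSpan(ctx context.Context, name string) (context.Context, *span) {
	s := &span{name: name, parent: spanFromContext(ctx), attrs: map[string]string{}}
	return context.WithValue(ctx, spanKey{}, s), s
}

// path renders the chain of span names from the root down to s.
func (s *span) path() string {
	if s.parent == nil {
		return s.name
	}
	return s.parent.path() + " > " + s.name
}

// listBuilds takes ctx as its first parameter so that its span joins the
// caller's trace. It then fetches the current span to attach an attribute.
func listBuilds(ctx context.Context, org string) *span {
	ctx, s := startSpan(ctx, "db.ListBuilds")
	spanFromContext(ctx).attrs["org"] = org
	return s
}

// demo returns the span path with and without a propagated context.
func demo() (propagated, broken string) {
	ctx, _ := startSpan(context.Background(), "GET /builds")
	propagated = listBuilds(ctx, "octocat").path()
	// Dropping the caller's ctx loses the parent: the span becomes a root.
	broken = listBuilds(context.Background(), "octocat").path()
	return propagated, broken
}

func main() {
	propagated, broken := demo()
	fmt.Println(propagated) // GET /builds > db.ListBuilds
	fmt.Println(broken)     // db.ListBuilds
}
```

The second call illustrates why a function that swaps in context.Background() silently detaches its work from the request's trace.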

Trace Sampler Configurations

The tracing package allows admins to use env flags to customize the way that Vela samples traces. "Sampling" can be done at the beginning (head sampling) or at the end (tail sampling). For now, Vela samples based on a shared rate limit algorithm configured with VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND.

In the future, I would like to add more head sampling algorithms like "TraceID Ratio", and a way to ignore no-op endpoints such as /health.

Example Env Configuration

# opt-in
VELA_OTEL_TRACING_ENABLE=true

# exporter/collector
VELA_OTEL_EXPORTER_OTLP_ENDPOINT="https://jaeger:6418"
VELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH="/etc/ssl/certs/ca-certificates.crt"
VELA_OTEL_TRACING_TLS_MIN_VERSION="1.2"

# sampler
VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND=0.2

# trace metadata
VELA_OTEL_TRACING_RESOURCE_ATTRIBUTES='resource_attr_key=resource_attr_val'
VELA_OTEL_TRACING_RESOURCE_ENV_ATTRIBUTES='vela.user-refresh-dur=VELA_USER_REFRESH_TOKEN_DURATION'
VELA_OTEL_TRACING_SPAN_ATTRIBUTES='span_attr_key=span_attr_val'
VELA_OTEL_TRACING_TRACESTATE_ATTRIBUTES='sometracestate=somevalue'
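A configuration like the one above could be read into a typed struct roughly as follows. This is a minimal sketch, not the PR's tracing/config.go: the `tracingConfig` struct and the `loadTracingConfig` and `tlsMinVersion` helpers are hypothetical, and only a subset of the flags is covered. The defaults mirror the flags table.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
	"strconv"
)

// tracingConfig holds a subset of the tracing flags (hypothetical type).
type tracingConfig struct {
	Enabled       bool
	ServiceName   string
	Endpoint      string
	TLSMinVersion uint16
	RatePerSecond float64
}

// tlsMinVersion maps a version string such as "1.2" to the matching
// crypto/tls constant.
func tlsMinVersion(v string) (uint16, error) {
	switch v {
	case "1.0":
		return tls.VersionTLS10, nil
	case "1.1":
		return tls.VersionTLS11, nil
	case "1.2":
		return tls.VersionTLS12, nil
	case "1.3":
		return tls.VersionTLS13, nil
	}
	return 0, fmt.Errorf("unsupported TLS version %q", v)
}

// loadTracingConfig reads the flags through getenv (os.Getenv in
// production) and falls back to the documented defaults when unset.
func loadTracingConfig(getenv func(string) string) (tracingConfig, error) {
	get := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}

	cfg := tracingConfig{
		ServiceName: get("VELA_OTEL_TRACING_SERVICE_NAME", "vela-server"),
		Endpoint:    get("VELA_OTEL_EXPORTER_OTLP_ENDPOINT", "http://jaeger:4318"),
	}

	var err error
	if cfg.Enabled, err = strconv.ParseBool(get("VELA_OTEL_TRACING_ENABLE", "false")); err != nil {
		return cfg, fmt.Errorf("VELA_OTEL_TRACING_ENABLE: %w", err)
	}
	if cfg.TLSMinVersion, err = tlsMinVersion(get("VELA_OTEL_TRACING_TLS_MIN_VERSION", "1.2")); err != nil {
		return cfg, fmt.Errorf("VELA_OTEL_TRACING_TLS_MIN_VERSION: %w", err)
	}
	rate := get("VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND", "100")
	if cfg.RatePerSecond, err = strconv.ParseFloat(rate, 64); err != nil {
		return cfg, fmt.Errorf("VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTracingConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid tracing config:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Parsing the rate as a float keeps fractional values such as 0.2 valid, as used in the example configuration.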

Example Trace Result

Visiting <ui>/<org>/builds produces a trace that looks like this in Jaeger (http://localhost:16686/trace/0cd1273b6f73a7dd3bdf84a3d28c43d7):

[Screenshot of the trace in Jaeger, captured 2024-09-11]

@plyr4 plyr4 changed the title Feat/otel tracing feat: otel tracing Jul 6, 2023
@codecov

codecov bot commented Aug 16, 2023

Codecov Report

Attention: Patch coverage is 19.33962% with 171 lines in your changes missing coverage. Please review.

Project coverage is 52.44%. Comparing base (b6e5d75) to head (1e20dd9).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
tracing/tracer.go 0.00% 64 Missing ⚠️
tracing/config.go 0.00% 39 Missing ⚠️
tracing/sampler.go 0.00% 23 Missing ⚠️
cmd/vela-server/server.go 0.00% 12 Missing ⚠️
api/webhook/post.go 0.00% 11 Missing ⚠️
database/database.go 10.00% 8 Missing and 1 partial ⚠️
scm/github/github.go 25.00% 5 Missing and 1 partial ⚠️
cmd/vela-server/scm.go 0.00% 2 Missing ⚠️
api/build/approve.go 0.00% 1 Missing ⚠️
api/build/create.go 0.00% 1 Missing ⚠️
... and 3 more
Additional details and impacted files


@@            Coverage Diff             @@
##             main     #899      +/-   ##
==========================================
- Coverage   52.79%   52.44%   -0.35%     
==========================================
  Files         551      557       +6     
  Lines       19221    19421     +200     
==========================================
+ Hits        10147    10185      +38     
- Misses       8510     8670     +160     
- Partials      564      566       +2     
Files with missing lines Coverage Δ
database/context.go 92.59% <100.00%> (+0.28%) ⬆️
database/opts.go 92.85% <100.00%> (+0.54%) ⬆️
router/middleware/tracing.go 100.00% <100.00%> (ø)
router/middleware/tracing/context.go 100.00% <100.00%> (ø)
router/middleware/tracing/tracing.go 100.00% <100.00%> (ø)
scm/github/opts.go 100.00% <100.00%> (ø)
scm/setup.go 100.00% <100.00%> (ø)
api/build/approve.go 0.00% <0.00%> (ø)
api/build/create.go 0.00% <0.00%> (ø)
api/build/restart.go 0.00% <0.00%> (ø)
... and 10 more

ecrupper
ecrupper previously approved these changes Sep 12, 2024
Member

@wass3rw3rk wass3rw3rk left a comment

lgtm

@plyr4 plyr4 merged commit bceb069 into main Sep 12, 2024
14 of 16 checks passed
@plyr4 plyr4 deleted the feat/otel-tracing branch September 12, 2024 18:46
@plyr4 plyr4 self-assigned this Sep 12, 2024
6 participants