Impact of the new feature
Data Traces and OpenTelemetry can significantly improve the monitoring of individual services as well as their data flow.
Is your feature request related to a problem? Please describe.
Currently, debugging individual workflows, tracing them, and understanding errors is very cumbersome within the WM ecosystem. This issue proposes to improve this by introducing data traces into all WM services, which would give a better understanding of running workflows.
Describe the solution you'd like
Use the OpenTelemetry framework to add Data Traces to all WM services.
Describe alternatives you've considered
None
Additional context
This can be done as simply as follows (below is an example of a data service with data traces):
Then instrument Data Traces in your Python application as follows:
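```python
# Basic example in Python with OpenTelemetry
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.trace import SpanKind
import requests

# Initialize tracer (spans are not recorded until an SDK TracerProvider is configured)
tracer = trace.get_tracer(__name__)

def send_request_to_service_b():
    with tracer.start_as_current_span("service_a_to_b", kind=SpanKind.CLIENT) as span:
        # Inject the current tracing context (e.g. the W3C "traceparent" header) into the HTTP headers
        headers = {}
        inject(headers)
        response = requests.get("http://service-b.local/process", headers=headers, timeout=10)
        # Record the outcome on the span so failed calls are visible in the trace
        span.set_attribute("http.response.status_code", response.status_code)
        return response
```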
Once traces are instrumented, they can be collected and visualized with the following tools:
OpenTelemetry Collector: You can use an OpenTelemetry Collector to gather tracing data and then convert parts of it into metrics, such as latency, request counts, and error rates. The Collector can send these metrics to Prometheus, where they can be scraped and visualized.
Jaeger with Prometheus Metrics: Jaeger is a popular tracing tool that integrates with Prometheus. Jaeger collects tracing data across services, and trace-derived metrics (like service call counts, latency, and errors) can be stored in Prometheus, where they can be scraped and queried; Jaeger's Service Performance Monitoring (SPM) view can then read these metrics back from Prometheus. Jaeger also exposes its own operational metrics in a format Prometheus understands.
Custom Trace-to-Metrics Exporter: In some cases, you might write a custom exporter that takes key trace information, such as request duration or error counts, and exports it as metrics that Prometheus can consume.
Grafana Tempo Integration: If you’re using Grafana, Tempo can store trace data, and you can integrate it with Prometheus to query metrics and view traces side by side in Grafana dashboards.
Example Setup for OpenTelemetry + Prometheus
Here’s a basic setup that uses OpenTelemetry to generate metrics based on traces and then scrape these with Prometheus.
1. Install OpenTelemetry instrumentation in each service to generate trace data.
2. Set up the OpenTelemetry Collector to receive trace data, transform it into metrics (such as request latency), and export these metrics to Prometheus.
3. Configure Prometheus to scrape the OpenTelemetry Collector's metrics endpoint.
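The Collector and Prometheus steps are mostly configuration. The sketch below builds both configuration files as Python dictionaries and writes them as JSON, which YAML loaders generally accept because YAML 1.2 is a superset of JSON. It assumes the OpenTelemetry Collector contrib distribution (which ships the `spanmetrics` connector and the `prometheus` exporter) and a Jaeger instance that accepts OTLP; the host names `otel-collector` and `jaeger`, the ports, and the file names are hypothetical placeholders.

```python
# Minimal sketch: generate an OpenTelemetry Collector config and a Prometheus scrape config.
# Host names, ports, and file names are hypothetical placeholders.
import json
from pathlib import Path

collector_config = {
    # Services send spans to the Collector over OTLP/gRPC
    "receivers": {"otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}}},
    # A connector is an exporter in one pipeline and a receiver in another
    "connectors": {"spanmetrics": {}},
    "exporters": {
        # Metrics derived from spans are exposed here for Prometheus to scrape
        "prometheus": {"endpoint": "0.0.0.0:8889"},
        # Raw traces still go to a trace backend so metrics and spans can be correlated
        "otlp/jaeger": {"endpoint": "jaeger:4317", "tls": {"insecure": True}},
    },
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["spanmetrics", "otlp/jaeger"]},
            "metrics": {"receivers": ["spanmetrics"], "exporters": ["prometheus"]},
        }
    },
}

prometheus_config = {
    "scrape_configs": [
        {"job_name": "otel-collector", "static_configs": [{"targets": ["otel-collector:8889"]}]}
    ]
}


def render(config: dict) -> str:
    """Serialize a config dict as indented JSON (assumed acceptable to YAML loaders)."""
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    Path("otel-collector-config.yaml").write_text(render(collector_config))
    Path("prometheus.yml").write_text(render(prometheus_config))
```

Each service would then point its exporter at the Collector, for example with `OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)` from the `opentelemetry-exporter-otlp-proto-grpc` package, and Prometheus would be started with the second file as its configuration.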
By converting key trace events into metrics, you let Prometheus gain insight into trace-related behavior without handling trace data directly, and tools like Grafana can correlate these metrics with trace spans if both Prometheus and a trace backend (such as Tempo or Jaeger) are set up.
Please also see the USCMS WM Monitoring and Alerts talk for more details.