OpenTelemetry integration #699
base: master
✅ Deploy Preview for opal-docs canceled.
…ck latency of data update events feat(prometheus_metrics.py): create data_update_latency histogram to monitor latency of data update events
…l into prometheus_integration
…etrics to use opal_server.metrics.prometheus_metrics for better organization chore(requirements.txt): add prometheus_client to dependencies for metrics tracking functionality
…c to track updates per topic feat(prometheus_metrics.py): introduce data_update_count_per_topic counter for monitoring data updates by topic
… to enhance observability fix(api.py): increment policy bundle request count and measure latency for bundle generation fix(callbacks.py): observe size of changed directories in policy update notifications fix(task.py): track policy update count and latency when triggering policy watcher
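The commits above add a `data_update_latency` histogram and a `data_update_count_per_topic` counter through prometheus_client. The following is a rough, stdlib-only toy model of what those two metric types record. The class name, bucket bounds and topic names are hypothetical, and in practice prometheus_client's own `Histogram` and `Counter` would be used.

```python
import bisect
from collections import Counter


class LatencyHistogram:
    """Toy model of a Prometheus histogram: cumulative buckets plus sum and count.

    Hypothetical illustration only; prometheus_client.Histogram is the real implementation.
    """

    def __init__(self, bounds=(0.05, 0.1, 0.5, 1.0, 5.0)):
        self.bounds = tuple(sorted(bounds))  # finite upper bounds in seconds (assumed values)
        self.bucket_counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, seconds: float) -> None:
        # Buckets are "le" (less than or equal), so pick the first bound >= the value.
        idx = bisect.bisect_left(self.bounds, seconds)
        self.bucket_counts[idx] += 1
        self.sum += seconds
        self.count += 1

    def cumulative(self) -> list:
        # Exposed bucket values are cumulative: each le bucket includes every smaller one.
        totals, running = [], 0
        for n in self.bucket_counts:
            running += n
            totals.append(running)
        return totals


data_update_latency = LatencyHistogram()
data_update_count_per_topic = Counter()  # topic name -> number of data updates

for topic, seconds in [("policy_data", 0.03), ("policy_data", 0.1), ("tenants", 0.7), ("tenants", 12.0)]:
    data_update_latency.observe(seconds)
    data_update_count_per_topic[topic] += 1

print(data_update_latency.cumulative())  # [1, 2, 2, 3, 3, 4]
print(dict(data_update_count_per_topic))  # {'policy_data': 2, 'tenants': 2}
```

With prometheus_client, the per-topic split is expressed as a label, e.g. `Counter(name, documentation, labelnames=["topic"])` followed by `.labels(topic).inc()`.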
Hey @psardana, thank you for this contribution! 💎 Can you please add documentation about the metrics and explain how to set them up? Note that there are conflicts with the main branch, so please make sure to rebase on master. Looking forward to this! 🙏
Thank you for the review! I have added commits for the documentation and the Docker Compose setup, and fixed the label names.
Looks very good! 🌟
I've left some comments about specific areas and improvements.
Upon review, some instrumented parts appear to align more with tracing than with pure metrics. This led to some unnatural metrics and duplication, with separate latency, count, and error metrics.
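The duplication described above can be illustrated with a stdlib-only sketch. `SpanRecord`, `span()` and `summarize()` are hypothetical stand-ins, not the OpenTelemetry API: one record per operation carries latency, outcome and attributes, and the count, error and latency figures that were separate metrics can all be derived from those records.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a finished span: one record per operation."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration: float = 0.0
    error: bool = False


FINISHED_SPANS: list = []


@contextmanager
def span(name: str, **attributes):
    """Time one operation and record its outcome in a single SpanRecord."""
    record = SpanRecord(name, dict(attributes))
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record.error = True
        raise  # the caller still sees the failure
    finally:
        record.duration = time.perf_counter() - start
        FINISHED_SPANS.append(record)


def summarize(name: str) -> dict:
    """Derive the count, error and latency figures that were separate metrics."""
    matching = [s for s in FINISHED_SPANS if s.name == name]
    return {
        "count": len(matching),
        "errors": sum(1 for s in matching if s.error),
        "total_seconds": sum(s.duration for s in matching),
    }


with span("policy_update", topic="policy"):
    pass  # a successful update

try:
    with span("policy_update", topic="policy"):
        raise RuntimeError("fetch failed")  # a failed update
except RuntimeError:
    pass

summary = summarize("policy_update")
print(summary["count"], summary["errors"])  # 2 1
```

With the real OpenTelemetry SDK, the equivalent is a `with tracer.start_as_current_span(...)` block; by default it records exceptions and sets an error status on the span, which replaces a separate error counter.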
To address this, we suggest exploring OpenTelemetry, which offers native Prometheus integration alongside robust tracing capabilities.
Here's a proposed mapping of the current metrics to OpenTelemetry:
- opal_server_data_update -> Trace
- opal_server_policy_update -> Trace
- opal_server_policy_bundle_request -> Trace
- opal_server_policy_bundle_size -> Metric
- opal_server_active_clients -> Metric
- opal_client_data_subscriptions -> Metric
- opal_client_data_update_trigger -> Trace
- opal_client_data_update_apply -> Trace (new)
- opal_client_policy_update_apply -> Trace (new)
- opal_client_policy_store_status -> Metric
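The split in this mapping appears to follow a simple rule: operations with a start and an end (an update, a bundle request) become traces, while point-in-time quantities (bundle size, connected clients, subscriptions, store status) stay metrics. The mapping can also be kept as data, for example to check during a migration that every existing name has a destination. `PROPOSED_SIGNALS`, `names_for()` and `marked_new()` are hypothetical helpers, not part of OPAL.

```python
# The review's proposed mapping encoded as data.
# Each entry: name -> (signal kind, whether the review marks it "(new)").
PROPOSED_SIGNALS = {
    "opal_server_data_update": ("trace", False),
    "opal_server_policy_update": ("trace", False),
    "opal_server_policy_bundle_request": ("trace", False),
    "opal_server_policy_bundle_size": ("metric", False),
    "opal_server_active_clients": ("metric", False),
    "opal_client_data_subscriptions": ("metric", False),
    "opal_client_data_update_trigger": ("trace", False),
    "opal_client_data_update_apply": ("trace", True),
    "opal_client_policy_update_apply": ("trace", True),
    "opal_client_policy_store_status": ("metric", False),
}


def names_for(kind: str) -> list:
    """All names the review assigns to a signal kind ('trace' or 'metric')."""
    return sorted(name for name, (k, _) in PROPOSED_SIGNALS.items() if k == kind)


def marked_new() -> list:
    """Names the review marks as '(new)'."""
    return sorted(name for name, (_, is_new) in PROPOSED_SIGNALS.items() if is_new)


print(len(names_for("trace")), len(names_for("metric")))  # 6 4
```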
We believe this approach will provide a more comprehensive observability solution.
Please let us know your thoughts, and let's work together to enhance OPAL's observability 💎
docker/prometheus/prometheus/docker-compose-with-prometheus-metrics.yml (outdated, resolved)
```python
@app.get("/metrics", include_in_schema=False)
async def metrics():
    """Endpoint to expose Prometheus metrics."""
```
We need to add authentication to this endpoint and the ability to disable it completely. Making it an opt-in feature would be best.
This is where I disagree: metrics endpoints are usually not protected. Prometheus needs direct, unauthenticated access to scrape metrics. Keycloak is one example; its metrics are also exposed directly, without authentication.
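One possible middle ground, sketched here as an assumption rather than what the PR adopted: keep the endpoint opt-in (off by default) and make authentication optional rather than mandatory. Prometheus's scrape configuration can send basic-auth credentials or a bearer token, so requiring a token does not by itself rule out scraping. The function and its parameters are hypothetical and are not existing OPAL settings.

```python
import hmac
from typing import Optional


def metrics_access_status(
    enabled: bool,
    required_token: Optional[str],
    authorization: Optional[str],
) -> int:
    """Return the HTTP status code a /metrics request should receive.

    enabled: hypothetical opt-in switch; False (the default) turns the endpoint off.
    required_token: hypothetical shared secret; None means scraping is unauthenticated.
    authorization: the request's Authorization header, if any.
    """
    if not enabled:
        return 404  # behave as if the route does not exist
    if required_token is None:
        return 200  # open endpoint, as many Prometheus setups expect
    expected = f"Bearer {required_token}".encode()
    supplied = (authorization or "").encode()
    # Constant-time comparison avoids leaking the token through response timing.
    if hmac.compare_digest(supplied, expected):
        return 200
    return 401


print(metrics_access_status(True, "s3cret", "Bearer s3cret"))  # 200
```

In a FastAPI app, this check would sit at the top of the `/metrics` handler, or the route would simply not be registered when the feature is disabled.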
packages/opal-common/opal_common/monitoring/prometheus_metrics.py (two outdated, resolved threads)
```python
@app.get("/metrics", include_in_schema=False)
def metrics():
    """Endpoint to expose Prometheus metrics."""
```
Same here: we need to add authentication to this endpoint and the ability to disable it completely.
Fixes Issue
closes #701