Allow multiple prometheus stages with different URLs #532

Merged: 1 commit, Nov 16, 2023

Conversation

KalmanMeth
Collaborator

@KalmanMeth KalmanMeth commented Oct 30, 2023

Description

At present, all Prometheus metrics are served on the same globally defined port (:9102).
This PR allows multiple Prometheus encode stages, each reporting its metrics on a different port and/or address.
By default, metrics continue to be reported on the globally defined port.
Existing configurations should continue to work without change.
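
As a rough illustration of the fallback described above, here is a minimal sketch of how a stage's listen address might be resolved. The `promStage` type, its field names, and `listenAddr` are hypothetical and are not taken from this PR's code; only the `:9102` default comes from the description.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// promStage is a hypothetical, simplified view of a Prometheus encode stage's
// connection settings. The field names are illustrative, not the project's API.
type promStage struct {
	Name    string
	Address string // optional; empty means "listen on all interfaces"
	Port    int    // optional; 0 means "use the globally defined port"
}

// globalPort mirrors the globally defined default mentioned in the description.
const globalPort = 9102

// listenAddr returns the address a stage would expose its metrics on.
// A stage that sets no port falls back to the global port, so an existing
// configuration resolves to the same address as before.
func listenAddr(s promStage) string {
	port := s.Port
	if port == 0 {
		port = globalPort
	}
	return net.JoinHostPort(s.Address, strconv.Itoa(port))
}

func main() {
	stages := []promStage{
		{Name: "legacy"}, // no overrides: global port
		{Name: "custom", Address: "127.0.0.1", Port: 9200},
	}
	for _, s := range stages {
		fmt.Printf("%s -> %s\n", s.Name, listenAddr(s))
	}
}
```

A stage that overrides only the address would, under these assumptions, keep the global port.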

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? No.
    If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci

openshift-ci bot commented Oct 30, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kalmanmeth. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov bot commented Oct 30, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (06ac688) 66.98% compared to head (28a06bc) 66.84%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
- Coverage   66.98%   66.84%   -0.14%     
==========================================
  Files          95       95              
  Lines        6779     6802      +23     
==========================================
+ Hits         4541     4547       +6     
- Misses       1973     1988      +15     
- Partials      265      267       +2     
| Flag | Coverage Δ |
|---|---|
| unittests | 66.84% <37.50%> (-0.14%) ⬇️ |

| Files | Coverage Δ |
|---|---|
| pkg/api/encode_prom.go | 77.77% <ø> (ø) |
| pkg/confgen/flowlogs2metrics_config.go | 68.83% <100.00%> (ø) |
| pkg/config/config.go | 78.78% <ø> (ø) |
| cmd/flowlogs-pipeline/main.go | 48.93% <0.00%> (ø) |
| pkg/pipeline/utils/prom_server.go | 0.00% <0.00%> (ø) |
| pkg/pipeline/encode/encode_prom.go | 73.66% <40.00%> (-4.27%) ⬇️ |

Comment on lines 38 to 44
h := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
server.Handler = h
// TODO: This needs more work. enable different endpoints for different handlers, and register them on distinct ServeMux
if !handlerRegistered {
http.Handle("/metrics", promhttp.Handler())
handlerRegistered = true
}
Member

Hey @KalmanMeth ,
I think this should work to address your comment:

Suggested change
- h := promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
- server.Handler = h
- // TODO: This needs more work. enable different endpoints for different handlers, and register them on distinct ServeMux
- if !handlerRegistered {
- http.Handle("/metrics", promhttp.Handler())
- handlerRegistered = true
- }
+ mux := http.NewServeMux()
+ mux.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
+ server.Handler = mux
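
To illustrate why a dedicated ServeMux per server helps here: `http.Handle` registers on the shared `http.DefaultServeMux`, which panics if the same pattern is registered twice, whereas each `http.NewServeMux()` has its own routing table. The sketch below uses only the standard library; `metricsHandler` is a hypothetical stand-in for `promhttp.HandlerFor(reg, ...)`, and `newMetricsMux` and `scrape` are illustrative helpers, not code from this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsHandler is a stand-in for promhttp.HandlerFor(reg, ...): it serves a
// fixed body so the example needs only the standard library.
func metricsHandler(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, body)
	})
}

// newMetricsMux builds a dedicated ServeMux for one server, so registering
// "/metrics" for several servers never collides on http.DefaultServeMux.
func newMetricsMux(h http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", h)
	return mux
}

// scrape sends GET /metrics to a handler in-process and returns the body.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	// Two independent muxes, each of which could back its own http.Server.
	a := newMetricsMux(metricsHandler("stage_a_total 1\n"))
	b := newMetricsMux(metricsHandler("stage_b_total 2\n"))
	fmt.Print(scrape(a))
	fmt.Print(scrape(b))
}
```

In a real server, each mux would be assigned to a separate `http.Server`'s `Handler` field with its own `Addr`, as in the suggested change.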

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Nov 13, 2023

New image:
quay.io/netobserv/flowlogs-pipeline:95b87c7

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=95b87c7 make set-flp-image

@jotak
Member

jotak commented Nov 13, 2023

/lgtm
I ran some regression tests and verified that it doesn't introduce any breaking change.
The e2e test currently fails; not sure if that has anything to do with this PR.

@openshift-ci openshift-ci bot added the lgtm label Nov 13, 2023
@KalmanMeth KalmanMeth requested a review from jotak November 14, 2023 07:50
@KalmanMeth KalmanMeth merged commit a9fc9ce into netobserv:main Nov 16, 2023
6 of 7 checks passed
@jotak
Member

jotak commented Nov 16, 2023

@KalmanMeth could you see what's wrong with the e2e test? It kept failing, hitting a timeout after deploying FLP, but I didn't see a more specific error.
