Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add monitoring virtualservices for alertmanager / prometheus #977

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

joelmccoy
Copy link
Contributor

@joelmccoy joelmccoy commented Nov 3, 2024

Description

  • Adds virtualservice on admin gateway for prometheus metrics.uds.dev
  • Adds virtualservice on admin gateway for alertmanager alerts.uds.dev
  • Added authservice to above endpoints
  • reordered k3d-standard packages so that authservice is deployed after keycloak
  • added virtual services and authz policies to allow internal traffic for prometheus / alertmanager

Related Issue

Fixes #967

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Other (security config, docs update, etc)

Checklist before merging

@joelmccoy joelmccoy requested a review from a team as a code owner November 3, 2024 02:41
@@ -91,12 +97,6 @@ components:
import:
path: ../monitoring

# Authservice
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this up the list in the standard bundle as it should be deployed after keycloak. and tests were failing as monitoring gets deployed before authservice if left alone.

- service: prometheus-operated
selector:
app: prometheus
host: prom
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open to other names here 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have grafana and neuvector, might suggest we keep with that pattern and just use the full product name here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(alternatively we could try and lean more into functionality based naming like sso is, so alerts and metrics?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like alerts and metrics. It just provides a better UX. Switched to that

@joelmccoy
Copy link
Contributor Author

Note: I originally tried to put authservice in front of these things, but it prevents grafana from pulling from prometheus, and prevents prometheus from sending alerts to alertmanager :/. Hopefully it is just ok to put these on the admin gateway, but if not, what we might have to do is create an extra service, expose this, and put authservice in front of it (while still allowing the old service to be reached without authservice).

Copy link
Contributor

@mjnagel mjnagel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely would prefer to put these behind authservice if possible. I know historically when working on Big Bang we were able to use authz policies to allow specific traffic, but there may have been some other caveats with that. cc @bburky if you have thoughts on how to enable this (basically looking to protect prometheus/alertmanager with authservice but also ensure services are able to communicate internal to the cluster still as expected).

- service: prometheus-operated
selector:
app: prometheus
host: prom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have grafana and neuvector, might suggest we keep with that pattern and just use the full product name here.

@joelmccoy joelmccoy marked this pull request as draft November 9, 2024 19:08
@joelmccoy
Copy link
Contributor Author

I definitely would prefer to put these behind authservice if possible. I know historically when working on Big Bang we were able to use authz policies to allow specific traffic, but there may have been some other caveats with that. cc @bburky if you have thoughts on how to enable this (basically looking to protect prometheus/alertmanager with authservice but also ensure services are able to communicate internal to the cluster still as expected).

Happy the wire up the work needed to make this happen. If we wanted to go down the authorizationpolicy route I think we might need to make changes to the authorizationpolicy created when you enable authservice in a UDS package??

Here would be an example authorizationpolicy generated by enabling authservice for prometheus in a UDS Package:

Name:         uds-prometheus-authservice
Namespace:    monitoring
API Version:  security.istio.io/v1
Kind:         AuthorizationPolicy
Spec:
  Action:  CUSTOM
  Provider:
    Name:  authservice
  Rules:
    When:
      Key:  request.headers[authorization]
      Not Values:
        *
  Selector:
    Match Labels:
      app.kubernetes.io/name:  prometheus

I think this default policy is going to intercept all traffic to the prometheus pods as CUSTOM rules take precedence over any ALLOW rules (https://istio.io/latest/docs/reference/config/security/authorization-policy). And the goal would be to explicitly allow traffic (and bypass authservice) from certain sources.

I am understanding this correctly? Or is there a simpler way to explicitly allow traffic with an authz policy that doesn't mess with the generated ones for authservice.

@joelmccoy
Copy link
Contributor Author

joelmccoy commented Nov 16, 2024

@mjnagel Got this working for prometheus by adding the virtualservice that adds the header authorization: 'internal-traffic' to traffic routed through it. I tested that grafana was still able to reach prometheus. Currently only allowing traffic from monitoring / grafana namespaces for prometheus.

However, for alertmanager, I am having issues because the way prometheus talks to alertmanager.

It seems to do some sort of service discovery and pushes to alertmanager via ip instead of hostname (and seems to miss the virtual service that would add the authorization header).

This is evident from the promethues pod logs:

prometheus ts=2024-11-16T21:20:34.705Z caller=notifier.go:612 level=error component=notifier alertmanager=https://10.42.0.50:9093/api/v2/alerts count=2 msg="Error sending alert" err="bad response status 403 Forbidden"

It looks like prometheus alerting endpoints don't have the option to just specify a hostname. It looks up the alertmanager CRD and resolves an IP based on that. Not noticing anything at first glance, but hoping there is a way to have prometheus use the hostname in it's requests to alertmanager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose Prometheus Endpoint and Alertmanager Service on Admin Gateway
2 participants