feat: add monitoring virtualservices for alertmanager / prometheus #977

joelmccoy · 2024-11-03T02:41:07Z

Description

Adds a virtualservice on the admin gateway for prometheus at metrics.uds.dev
Adds a virtualservice on the admin gateway for alertmanager at alerts.uds.dev
Adds authservice to the above endpoints
Reorders the k3d-standard packages so that authservice is deployed after keycloak
Adds virtual services and authz policies to allow internal traffic for prometheus / alertmanager

Related Issue

Fixes #967

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Other (security config, docs update, etc)

Checklist before merging

Test, docs, adr added or updated as needed
Contributor Guide followed

joelmccoy · 2024-11-03T02:44:30Z

packages/standard/zarf.yaml

@@ -91,12 +97,6 @@ components:
    import:
      path: ../monitoring

-  # Authservice


Moved this up the list in the standard bundle, as it should be deployed right after keycloak. Tests were failing because monitoring gets deployed before authservice if it is left where it was.

joelmccoy · 2024-11-03T02:46:07Z

src/prometheus-stack/chart/templates/uds-package.yaml

+      - service: prometheus-operated
+        selector:
+          app: prometheus
+        host: prom


open to other names here 🤷

Currently we have grafana and neuvector, might suggest we keep with that pattern and just use the full product name here.

(alternatively we could try to lean more into functionality-based naming like sso does, so alerts and metrics?)

I like alerts and metrics; it just provides a better UX. Switched to that.

joelmccoy · 2024-11-03T03:19:52Z

Note: I originally tried to put authservice in front of these endpoints, but that prevents grafana from pulling from prometheus and prevents prometheus from sending alerts to alertmanager :/ Hopefully it is ok to just put these on the admin gateway. If not, we might have to create an extra service, expose that, and put authservice in front of it (while still allowing the old service to be reached without authservice).

mjnagel

I definitely would prefer to put these behind authservice if possible. I know historically when working on Big Bang we were able to use authz policies to allow specific traffic, but there may have been some other caveats with that. cc @bburky if you have thoughts on how to enable this (basically looking to protect prometheus/alertmanager with authservice but also ensure services are able to communicate internal to the cluster still as expected).

mjnagel · 2024-11-04T15:51:26Z

src/prometheus-stack/chart/templates/uds-package.yaml

+      - service: prometheus-operated
+        selector:
+          app: prometheus
+        host: prom


Currently we have grafana and neuvector, might suggest we keep with that pattern and just use the full product name here.

joelmccoy · 2024-11-09T20:44:58Z

I definitely would prefer to put these behind authservice if possible. I know historically when working on Big Bang we were able to use authz policies to allow specific traffic, but there may have been some other caveats with that. cc @bburky if you have thoughts on how to enable this (basically looking to protect prometheus/alertmanager with authservice but also ensure services are able to communicate internal to the cluster still as expected).

Happy to wire up the work needed to make this happen. If we want to go down the authorizationpolicy route, I think we might need to change the authorizationpolicy that gets created when you enable authservice in a UDS package?

Here is an example authorizationpolicy generated by enabling authservice for prometheus in a UDS Package:

Name:         uds-prometheus-authservice
Namespace:    monitoring
API Version:  security.istio.io/v1
Kind:         AuthorizationPolicy
Spec:
  Action:  CUSTOM
  Provider:
    Name:  authservice
  Rules:
    When:
      Key:  request.headers[authorization]
      Not Values:
        *
  Selector:
    Match Labels:
      app.kubernetes.io/name:  prometheus

I think this default policy is going to intercept all traffic to the prometheus pods that lacks an authorization header, as CUSTOM rules take precedence over any ALLOW rules (https://istio.io/latest/docs/reference/config/security/authorization-policy). The goal would be to explicitly allow traffic (and bypass authservice) from certain sources.

Am I understanding this correctly? Or is there a simpler way to explicitly allow traffic with an authz policy that doesn't touch the ones generated for authservice?
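For illustration, here is a rough sketch of how the generated policy might be changed so that some in-cluster sources never reach authservice. This is an assumption about one possible shape, not what UDS generates today: the `from` block and the namespace list are hypothetical additions, and matching on source namespace only works when the workloads talk to each other over mutual TLS.

```yaml
# Hypothetical variant of the generated policy (the `from` block is the only addition).
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: uds-prometheus-authservice
  namespace: monitoring
spec:
  action: CUSTOM
  provider:
    name: authservice
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  rules:
    # `from` and `when` in the same rule are ANDed: the request is handed to
    # authservice only if it comes from outside the listed namespaces AND
    # carries no authorization header.
    - from:
        - source:
            # Assumption: namespaces that should bypass authservice.
            notNamespaces:
              - monitoring
              - grafana
      when:
        - key: request.headers[authorization]
          notValues:
            - "*"
```

Requests arriving through the admin gateway originate from the gateway's own namespace, so they would still match the rule and be sent to authservice.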

joelmccoy · 2024-11-16T21:29:05Z

@mjnagel Got this working for prometheus by adding a virtualservice that sets the header authorization: 'internal-traffic' on traffic routed through it. I tested that grafana is still able to reach prometheus. Currently, traffic to prometheus is only allowed from the monitoring / grafana namespaces.
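A minimal sketch of what such a virtualservice could look like is below. Only the header name and value and the `prometheus-operated` service come from this thread; the resource name, the fully qualified host, and port 9090 (the prometheus default) are assumptions, and the actual manifest in the PR may differ.

```yaml
# Hypothetical manifest illustrating the header-injection approach.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: prometheus-internal   # hypothetical name
  namespace: monitoring
spec:
  hosts:
    - prometheus-operated.monitoring.svc.cluster.local
  gateways:
    # `mesh` applies the route to sidecar-to-sidecar traffic inside the
    # cluster rather than to traffic entering through an ingress gateway.
    - mesh
  http:
    - headers:
        request:
          set:
            # With this header present, the CUSTOM authservice policy's
            # "no authorization header" condition no longer matches.
            authorization: internal-traffic
      route:
        - destination:
            host: prometheus-operated.monitoring.svc.cluster.local
            port:
              number: 9090   # assumption: default prometheus port
```

Because the route is matched on the destination host by the client's sidecar, only callers that address prometheus by this hostname get the header. The restriction to the monitoring / grafana namespaces would live in a separate authz policy.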

However, for alertmanager, I am having issues because of the way prometheus talks to alertmanager.

It seems to do some sort of service discovery and pushes to alertmanager via IP instead of hostname (and so misses the virtual service that would add the authorization header).

This is evident from the prometheus pod logs:

prometheus ts=2024-11-16T21:20:34.705Z caller=notifier.go:612 level=error component=notifier alertmanager=https://10.42.0.50:9093/api/v2/alerts count=2 msg="Error sending alert" err="bad response status 403 Forbidden"

It looks like the prometheus alerting endpoints don't have an option to just specify a hostname. It looks up the alertmanager CRD and resolves an IP based on that. I'm not noticing anything at first glance, but I'm hoping there is a way to have prometheus use the hostname in its requests to alertmanager.
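For context, the alerting section that ends up in the rendered prometheus configuration typically looks roughly like the sketch below. It is simplified and written from memory of how prometheus-operator usually generates it, so treat the exact fields as assumptions; the service name in particular is a placeholder. The scheme and API version match the log line above.

```yaml
# Simplified, assumed shape of the generated prometheus alerting config.
alerting:
  alertmanagers:
    - scheme: https
      api_version: v2
      # Endpoint discovery returns the pod IP and port behind the service,
      # which is why the notifier posts to https://<pod-ip>:9093 rather
      # than to a service hostname.
      kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names:
              - monitoring
      relabel_configs:
        - action: keep
          source_labels: [__meta_kubernetes_service_name]
          regex: alertmanager-service-name   # placeholder, not the real name
```

Since the request is addressed to a pod IP, a virtualservice keyed on the service hostname never matches it, which is consistent with the 403 above.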

joelmccoy added 2 commits November 2, 2024 19:42

feat: add virtual services / authservice for monitoring package

25be2e5

fix: reorder authservice

b68a2b0

joelmccoy requested a review from a team as a code owner November 3, 2024 02:41

joelmccoy added 2 commits November 2, 2024 22:06

fix: remove authservice in front of prometheus

8986de8

chore: remove authservice

bb2e477

mjnagel reviewed Nov 4, 2024

joelmccoy marked this pull request as draft November 9, 2024 19:08

joelmccoy added 5 commits November 9, 2024 15:02

chore: add back in sso

10fa01b

feat: add authz policies and virtual services

eb5f67f

chore: add license

7ff5088

Merge branch 'main' into add-monitoring-virtualservices

38d7b9e

chore: remove custom

9cacc6d

chore: cleanup custom

3c2a1a1
