ServiceMonitor: Invalid Configuration #343

Open
rbaumgar opened this issue Aug 19, 2024 · 14 comments
@rbaumgar

On my OpenShift 4.16 cluster with OpenShift Lightspeed Operator 0.1.2, I get:

ServiceMonitor lightspeed-app-server-monitor was rejected due to invalid configuration: it accesses file system via bearer token file which Prometheus specification prohibits
ServiceMonitor lightspeed-operator-controller-manager-metrics-monitor was rejected due to invalid configuration: it accesses file system via tls config which Prometheus specification prohibits

$ oc get event -n openshift-lightspeed 
LAST SEEN   TYPE      REASON                 OBJECT                                                                  MESSAGE
29m         Warning   InvalidConfiguration   servicemonitor/lightspeed-app-server-monitor                            ServiceMonitor lightspeed-app-server-monitor was rejected due to invalid configuration: it accesses file system via bearer token file which Prometheus specification prohibits
29m         Warning   InvalidConfiguration   servicemonitor/lightspeed-operator-controller-manager-metrics-monitor   ServiceMonitor lightspeed-operator-controller-manager-metrics-monitor was rejected due to invalid configuration: it accesses file system via tls config which Prometheus specification prohibits

$ oc get servicemonitors.monitoring.coreos.com -n openshift-lightspeed -o yaml|oc neat
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    labels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/managed-by: lightspeed-operator
      app.kubernetes.io/name: lightspeed-service-api
      app.kubernetes.io/part-of: openshift-lightspeed
      monitoring.openshift.io/collection-profile: full
    name: lightspeed-app-server-monitor
    namespace: openshift-lightspeed
  spec:
    endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      interval: 30s
      path: /metrics
      port: https
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
        keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
        serverName: lightspeed-app-server.openshift-lightspeed.svc
    jobLabel: app.kubernetes.io/name
    selector:
      matchLabels:
        app.kubernetes.io/component: application-server
        app.kubernetes.io/managed-by: lightspeed-operator
        app.kubernetes.io/name: lightspeed-service-api
        app.kubernetes.io/part-of: openshift-lightspeed
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    labels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/created-by: lightspeed-operator
      app.kubernetes.io/instance: controller-manager-metrics-monitor
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: servicemonitor
      app.kubernetes.io/part-of: lightspeed-operator
      control-plane: controller-manager
      olm.managed: "true"
    name: lightspeed-operator-controller-manager-metrics-monitor
    namespace: openshift-lightspeed
  spec:
    endpoints:
    - path: /metrics
      port: metrics
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
        insecureSkipVerify: false
        keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
        serverName: lightspeed-operator-controller-manager-service.openshift-lightspeed.svc
    selector:
      matchLabels:
        control-plane: controller-manager
@raptorsun
Contributor

Thank you for raising the issue.
Could you please share the OLSConfig CR that produces this problem?
I cannot reproduce the problem on OpenShift 4.16.

In the meantime, please try upgrading to version 0.1.3 to see whether the issue persists.
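
If the operator was installed via OLM (the default on OpenShift), the installed version can be checked with, for example:

$ oc get csv -n openshift-lightspeed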

@xiormeesh

I found this thread because I was fixing the same issue in another project on 4.16. It's caused by .spec.endpoints[].bearerTokenFile being deprecated in 4.16; however, it should only produce a warning for now and not block installation, so maybe something on the OP's cluster is forcing failures on deprecations.
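
For reference, the Prometheus Operator offers a secret-based alternative to bearerTokenFile on ServiceMonitor endpoints. A rough sketch of what such an endpoint could look like (the Secret name lightspeed-metrics-token is purely a placeholder, not something the operator creates):

  endpoints:
  - interval: 30s
    path: /metrics
    port: https
    scheme: https
    # instead of bearerTokenFile, reference a token stored in a Secret
    authorization:
      type: Bearer
      credentials:
        name: lightspeed-metrics-token  # placeholder Secret containing the token
        key: token
    tlsConfig:
      # the ca/cert/keySecret fields can likewise replace caFile/certFile/keyFile
      serverName: lightspeed-app-server.openshift-lightspeed.svc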

@rbaumgar
Author

rbaumgar commented Sep 2, 2024

In the meantime, the operator on the cluster was upgraded to version 0.1.3.
The alert didn't go away.
And yes, it is not blocking; it is surfaced by the alert PrometheusOperatorRejectedResources.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 1, 2025
@rbaumgar
Author

rbaumgar commented Jan 2, 2025

Still open, even with Lightspeed 0.2.1:

level=warn ts=2025-01-02T07:27:03.376846094Z caller=resource_selector.go:126 component=prometheus-controller msg="skipping servicemonitor" error="it accesses file system via bearer token file which Prometheus specification prohibits" servicemonitor=openshift-lightspeed/lightspeed-app-server-monitor namespace=openshift-user-workload-monitoring prometheus=user-workload

@rbaumgar
Author

rbaumgar commented Jan 2, 2025

/remove-lifecycle stale

@raptorsun
Contributor

Happy new year! Thank you for reminding us of this issue!
A Jira ticket has been created to track this issue: https://issues.redhat.com/browse/OLS-1322

@raptorsun
Contributor

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 13, 2025
@raptorsun
Contributor

Hello @rbaumgar, on the cluster having this issue:

  1. does the namespace openshift-lightspeed have the label openshift.io/cluster-monitoring: "true"?
  2. is the user workload monitoring stack set up on the cluster?
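
Both can be checked with, for example:

$ oc get namespace openshift-lightspeed --show-labels
$ oc get pods -n openshift-user-workload-monitoring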

Here is a related runbook for the alert PrometheusOperatorRejectedResources: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/PrometheusOperatorRejectedResources.md#servicemonitor-and-podmonitor

@raptorsun
Contributor

Could you please share the content of the configmap cluster-monitoring-config in the namespace openshift-monitoring?
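
It can be dumped with, for example:

$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o yaml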

@rbaumgar
Author

@raptorsun

  1. when the label is set, the error is gone.
  2. yes

cluster-monitoring-config

config.yaml:
----
enableUserWorkload: true
# Parameters for Platform prometheus
prometheusK8s:
  # retention: 15d
  # retentionSize: 90GB
  # volumeClaimTemplate:
  #   spec:
  #     resources:
  #       requests:
  #         storage: 100Gi

@raptorsun
Contributor

Thank you for the quick reply.

Adding the label openshift.io/cluster-monitoring: "true" makes the in-cluster Prometheus (the platform Prometheus) scrape metrics from the namespace openshift-lightspeed. For targets scraped by the platform Prometheus there is no such restriction on accessing the token file via the filesystem.
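
For reference, the label could be applied with something like:

$ oc label namespace openshift-lightspeed openshift.io/cluster-monitoring=true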

As user workload monitoring is activated, the alert should come from the UWM Prometheus, which uses different settings than the in-cluster one.

To gather more details on this issue, could you check whether the ServiceMonitors in the namespace openshift-lightspeed have the label openshift.io/user-monitoring: "false"?

@rbaumgar
Author

Yes, the controller-manager monitor has it:

$ oc get servicemonitors.monitoring.coreos.com -n openshift-lightspeed --show-labels 
NAME                                                     AGE    LABELS
lightspeed-app-server-monitor                            116d   app.kubernetes.io/component=metrics,app.kubernetes.io/managed-by=lightspeed-operator,app.kubernetes.io/name=lightspeed-service-api,app.kubernetes.io/part-of=openshift-lightspeed,monitoring.openshift.io/collection-profile=full
lightspeed-operator-controller-manager-metrics-monitor   116d   app.kubernetes.io/component=metrics,app.kubernetes.io/created-by=lightspeed-operator,app.kubernetes.io/instance=controller-manager-metrics-monitor,app.kubernetes.io/managed-by=kustomize,app.kubernetes.io/name=servicemonitor,app.kubernetes.io/part-of=lightspeed-operator,control-plane=controller-manager,olm.managed=true,openshift.io/user-monitoring=false
