ServiceMonitor: Invalid Configuration #343

Open
rbaumgar opened this issue Aug 19, 2024 · 14 comments
@rbaumgar

On my OpenShift 4.16 cluster with OpenShift Lightspeed Operator 0.1.2, I get:

ServiceMonitor lightspeed-app-server-monitor was rejected due to invalid configuration: it accesses file system via bearer token file which Prometheus specification prohibits
ServiceMonitor lightspeed-operator-controller-manager-metrics-monitor was rejected due to invalid configuration: it accesses file system via tls config which Prometheus specification prohibits

$ oc get event -n openshift-lightspeed 
LAST SEEN   TYPE      REASON                 OBJECT                                                                  MESSAGE
29m         Warning   InvalidConfiguration   servicemonitor/lightspeed-app-server-monitor                            ServiceMonitor lightspeed-app-server-monitor was rejected due to invalid configuration: it accesses file system via bearer token file which Prometheus specification prohibits
29m         Warning   InvalidConfiguration   servicemonitor/lightspeed-operator-controller-manager-metrics-monitor   ServiceMonitor lightspeed-operator-controller-manager-metrics-monitor was rejected due to invalid configuration: it accesses file system via tls config which Prometheus specification prohibits

$ oc get servicemonitors.monitoring.coreos.com -n openshift-lightspeed -o yaml|oc neat
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    labels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/managed-by: lightspeed-operator
      app.kubernetes.io/name: lightspeed-service-api
      app.kubernetes.io/part-of: openshift-lightspeed
      monitoring.openshift.io/collection-profile: full
    name: lightspeed-app-server-monitor
    namespace: openshift-lightspeed
  spec:
    endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      interval: 30s
      path: /metrics
      port: https
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
        keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
        serverName: lightspeed-app-server.openshift-lightspeed.svc
    jobLabel: app.kubernetes.io/name
    selector:
      matchLabels:
        app.kubernetes.io/component: application-server
        app.kubernetes.io/managed-by: lightspeed-operator
        app.kubernetes.io/name: lightspeed-service-api
        app.kubernetes.io/part-of: openshift-lightspeed
- apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    labels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/created-by: lightspeed-operator
      app.kubernetes.io/instance: controller-manager-metrics-monitor
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: servicemonitor
      app.kubernetes.io/part-of: lightspeed-operator
      control-plane: controller-manager
      olm.managed: "true"
    name: lightspeed-operator-controller-manager-metrics-monitor
    namespace: openshift-lightspeed
  spec:
    endpoints:
    - path: /metrics
      port: metrics
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
        insecureSkipVerify: false
        keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
        serverName: lightspeed-operator-controller-manager-service.openshift-lightspeed.svc
    selector:
      matchLabels:
        control-plane: controller-manager
@raptorsun
Contributor

Thank you for raising the issue.
Could you please share the OLSConfig CR that produces this problem?
I cannot reproduce the problem on OpenShift 4.16.

In the meantime, please try upgrading to version 0.1.3 to see whether the issue persists.
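
If the operator was installed via OLM (the default on OpenShift), the installed version can be checked with, for example:

$ oc get csv -n openshift-lightspeed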

@xiormeesh

I found this thread because I was fixing the same issue in another project on 4.16. It's caused by .spec.endpoints[].bearerTokenFile being deprecated in 4.16; however, it should only produce a warning for now and not block installation, so maybe something on the OP's cluster is forcing failures on deprecations.
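
For reference, the Prometheus Operator offers a secret-based alternative to bearerTokenFile on ServiceMonitor endpoints. A rough sketch of what such an endpoint could look like (the Secret name lightspeed-metrics-token is purely a placeholder, not something the operator creates):

  endpoints:
  - interval: 30s
    path: /metrics
    port: https
    scheme: https
    # instead of bearerTokenFile, reference a token stored in a Secret
    authorization:
      type: Bearer
      credentials:
        name: lightspeed-metrics-token  # placeholder Secret containing the token
        key: token
    tlsConfig:
      # the ca/cert/keySecret fields can likewise replace caFile/certFile/keyFile
      serverName: lightspeed-app-server.openshift-lightspeed.svc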

@rbaumgar
Author

rbaumgar commented Sep 2, 2024

In the meantime, the operator on the cluster was upgraded to version 0.1.3.
The alert didn't go away.
And yes, it is not blocking; it is surfaced by the alert PrometheusOperatorRejectedResources.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 1, 2025
@rbaumgar
Author

rbaumgar commented Jan 2, 2025

Still open, even with Lightspeed 0.2.1:

level=warn ts=2025-01-02T07:27:03.376846094Z caller=resource_selector.go:126 component=prometheus-controller msg="skipping servicemonitor" error="it accesses file system via bearer token file which Prometheus specification prohibits" servicemonitor=openshift-lightspeed/lightspeed-app-server-monitor namespace=openshift-user-workload-monitoring prometheus=user-workload

@rbaumgar
Author

rbaumgar commented Jan 2, 2025

/remove-lifecycle stale

@raptorsun
Contributor

Happy new year! Thank you for reminding us of this issue!
A Jira ticket has been created to track this issue: https://issues.redhat.com/browse/OLS-1322

@raptorsun
Contributor

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 13, 2025
@raptorsun
Contributor

Hello @rbaumgar, on the cluster having this issue:

  1. does the namespace openshift-lightspeed have the label openshift.io/cluster-monitoring: "true"?
  2. is the user workload monitoring stack set up on the cluster?
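
Both can be checked with, for example:

$ oc get namespace openshift-lightspeed --show-labels
$ oc get pods -n openshift-user-workload-monitoring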

Here is a related runbook for the alert PrometheusOperatorRejectedResources: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/PrometheusOperatorRejectedResources.md#servicemonitor-and-podmonitor

@raptorsun
Contributor

Could you please share the content of the configmap cluster-monitoring-config in the namespace openshift-monitoring?
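
It can be dumped with, for example:

$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o yaml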

@rbaumgar
Author

@raptorsun

  1. when the label is set, the error is gone.
  2. yes

cluster-monitoring-config

config.yaml:
----
enableUserWorkload: true
# Parameters for Platform prometheus
prometheusK8s:
  # retention: 15d
  # retentionSize: 90GB
  # volumeClaimTemplate:
  #   spec:
  #     resources:
  #       requests:
  #         storage: 100Gi

@raptorsun
Contributor

Thank you for the quick reply.

Adding the label openshift.io/cluster-monitoring: "true" makes the in-cluster Prometheus (the platform Prometheus) scrape metrics from the namespace openshift-lightspeed. For targets scraped by the platform Prometheus there is no such restriction on accessing the token file via the filesystem.
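
For reference, the label could be applied with something like:

$ oc label namespace openshift-lightspeed openshift.io/cluster-monitoring=true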

As user workload monitoring is activated, the alert should come from the UWM Prometheus, which uses different settings than the in-cluster one.

To gather more details on this issue, could you check whether the ServiceMonitors in the namespace openshift-lightspeed have the label openshift.io/user-monitoring: "false"?

@rbaumgar
Author

Yes, the controller-manager monitor has it:

$ oc get servicemonitors.monitoring.coreos.com -n openshift-lightspeed --show-labels 
NAME                                                     AGE    LABELS
lightspeed-app-server-monitor                            116d   app.kubernetes.io/component=metrics,app.kubernetes.io/managed-by=lightspeed-operator,app.kubernetes.io/name=lightspeed-service-api,app.kubernetes.io/part-of=openshift-lightspeed,monitoring.openshift.io/collection-profile=full
lightspeed-operator-controller-manager-metrics-monitor   116d   app.kubernetes.io/component=metrics,app.kubernetes.io/created-by=lightspeed-operator,app.kubernetes.io/instance=controller-manager-metrics-monitor,app.kubernetes.io/managed-by=kustomize,app.kubernetes.io/name=servicemonitor,app.kubernetes.io/part-of=lightspeed-operator,control-plane=controller-manager,olm.managed=true,openshift.io/user-monitoring=false
