- Create a new `cpe-monitoring-system` namespace

  ```bash
  # For OpenShift
  oc new-project cpe-monitoring-system
  # For general Kubernetes
  kubectl create ns cpe-monitoring-system
  kubectl config set-context $(kubectl config current-context) --namespace cpe-monitoring-system
  ```
- Install Prometheus Operator and Grafana Operator
  - For OpenShift, you can install both operators from OperatorHub
  - For general Kubernetes, you can install them from the stable Helm chart
    ```bash
    # Add the stable repo and update (if not added yet)
    helm repo add stable https://charts.helm.sh/stable
    helm repo update
    helm install -n cpe-monitoring-system prometheus stable/prometheus-operator
    ```
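    Before moving on, it may help to confirm the operator registered its CRDs and the pods are running (resource names can differ across chart versions):

    ```bash
    # Operator CRDs (Prometheus, ServiceMonitor, ...) should be listed
    kubectl get crd | grep monitoring.coreos.com
    # Operator and Prometheus pods should reach Running state
    kubectl get pods -n cpe-monitoring-system
    ```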
- Deploy Service Monitor (and certificate if needed)
  - For OpenShift, we may copy the service monitor from the original openshift-monitoring namespace
    ```bash
    kubectl get cm -n openshift-monitoring kubelet-serving-ca-bundle -o yaml --export | kubectl create -f -
    kubectl get cm -n openshift-monitoring serving-certs-ca-bundle -o yaml --export | kubectl create -f -
    # Deploy Service Monitor
    kubectl get servicemonitor -n openshift-monitoring kubelet -o yaml --export | kubectl create -f -
    ```
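    Note that `--export` was removed in kubectl 1.18; on newer clients, a rough equivalent is to strip the cluster-specific metadata before re-creating the object (a sketch, the sed filters may need tuning for your manifests):

    ```bash
    kubectl get servicemonitor -n openshift-monitoring kubelet -o yaml \
      | sed -e '/resourceVersion:/d' -e '/uid:/d' -e '/creationTimestamp:/d' -e '/^  namespace:/d' \
      | kubectl create -n cpe-monitoring-system -f -
    ```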
  - For the Helm chart installation, the node exporter and kubelet service monitors are already in place
  - Relabel the ServiceMonitor: check here (details in the relabeling section below)
- [For Thanos Sidecar] Deploy the COS storage config for Thanos forwarding: see more
  ```bash
  # S3-based example
  export BUCKET_NAME=[your bucket to store log/metrics]
  export ENDPOINT=[endpoint]
  export REGION=[region]
  export ACCESS_KEY=[access key]
  export SECRET_KEY=[secret key]
  envsubst < thanos/thanos-storage-config.yaml | kubectl apply -f -
  ```
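  For reference, the template presumably expands to the standard Thanos `objstore.config` schema for S3-compatible storage; a hypothetical sketch wrapped in a Secret (the Secret name `thanos-objstore-config` is an assumption, not necessarily what the repo's template uses):

  ```bash
  cat <<EOF | kubectl apply -n cpe-monitoring-system -f -
  apiVersion: v1
  kind: Secret
  metadata:
    name: thanos-objstore-config   # hypothetical name
  stringData:
    thanos.yaml: |
      type: S3
      config:
        bucket: ${BUCKET_NAME}
        endpoint: ${ENDPOINT}
        region: ${REGION}
        access_key: ${ACCESS_KEY}
        secret_key: ${SECRET_KEY}
  EOF
  ```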
- [For Thanos Sidecar] Deploy Prometheus with a Thanos sidecar saving to COS
  - For OpenShift, we need to create a new Prometheus resource
    ```bash
    export CLUSTER_ID=`kubectl config current-context | awk -F"/" '{ print $1 }'`
    envsubst < prometheus/prometheus.yaml | kubectl create -f -
    kubectl create -f prometheus/cluster-monitoring-view-adv.yaml
    oc adm policy add-cluster-role-to-user cluster-monitoring-view-adv -z prometheus-k8s
    kubectl create -f prometheus/cluster-role-prometheus-roks.yaml
    oc adm policy add-cluster-role-to-user prometheus-roks -z prometheus-k8s
    ```
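    To check that the sidecar came up, one can list the pod's containers; a quick sketch, assuming the prometheus-operator default naming for a Prometheus resource called `prometheus`:

    ```bash
    # A "thanos-sidecar" container should appear next to "prometheus"
    kubectl get pod prometheus-prometheus-0 -n cpe-monitoring-system \
      -o jsonpath='{.spec.containers[*].name}'
    ```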
  - For the Helm installation, a default Prometheus is already installed
  - Edit it to add the Thanos sidecar
    ```bash
    export CLUSTER_ID=`kubectl config current-context | awk -F"/" '{ print $1 }'`
    kubectl edit prometheus prometheus-prometheus-oper-prometheus
    # > replace .spec with the .spec from prometheus/prometheus.yaml
    # > substitute ${CLUSTER_ID} with the value above
    # Add monitor role
    kubectl create -f prometheus/cluster-monitoring-view-adv.yaml
    kubectl create -f prometheus/cr-binding.yaml
    ```
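    If hand-editing is inconvenient, the sidecar boils down to the `thanos` stanza of the Prometheus CRD; a minimal sketch of patching it in (the image tag is a placeholder, and `thanos-objstore-config` is the hypothetical Secret name from the storage-config sketch above):

    ```bash
    kubectl patch prometheus prometheus-prometheus-oper-prometheus --type merge -p '
    spec:
      thanos:
        image: quay.io/thanos/thanos:v0.19.0   # placeholder version
        objectStorageConfig:
          name: thanos-objstore-config         # hypothetical Secret name
          key: thanos.yaml
    '
    ```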
- [For Thanos Query] Deploy Thanos Store Gateway
  ```bash
  kubectl create -f store/
  ```
- [For Thanos Query] Deploy Thanos Query
  ```bash
  kubectl create -f thanos/thanos-query.yaml
  ```
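  Once the querier is up, one can verify that the sidecar and store gateway both registered; a sketch assuming thanos/thanos-query.yaml exposes a `thanos-query` Service on port 9090:

  ```bash
  kubectl port-forward svc/thanos-query 9090:9090 -n cpe-monitoring-system &
  # Each configured --store should be listed here
  curl -s http://localhost:9090/api/v1/stores
  ```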
- [For Grafana] Deploy Grafana
  - For OpenShift, we need to create a new Grafana resource
    ```bash
    export GRAFANA_NAMESPACE=cpe-monitoring-system
    export ADMIN_PWD=[your grafana admin password]
    export GRAFANA_SERVICE_ACCOUNT=grafana-serviceaccount
    # Apply grafana component after installing operators
    oc create -f grafana/grafana_added.yaml -n $GRAFANA_NAMESPACE
    envsubst < grafana/grafana.yaml | oc -n $GRAFANA_NAMESPACE apply -f -
    # Wait until the pod is running
    # Add monitoring role
    oc adm policy add-cluster-role-to-user cluster-monitoring-view-adv -z $GRAFANA_SERVICE_ACCOUNT
    # Set BEARER TOKEN
    export BEARER_TOKEN=$(oc sa get-token $GRAFANA_SERVICE_ACCOUNT -n $GRAFANA_NAMESPACE)
    # Deploy thanos-query datasource
    envsubst < grafana/grafana-datasource.yaml | oc -n $GRAFANA_NAMESPACE apply -f -
    # Restart
    kubectl delete replicaset -l app=grafana
    ```
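    Note that `oc sa get-token` is deprecated on newer OpenShift releases (4.11+); if it is unavailable, a bound token can be requested instead:

    ```bash
    export BEARER_TOKEN=$(oc create token $GRAFANA_SERVICE_ACCOUNT -n $GRAFANA_NAMESPACE)
    ```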
  - For the Helm installation, the default Grafana is already installed; only the new data source needs to be applied
    ```bash
    # Add monitoring role
    kubectl create -f prometheus/cluster-monitoring-view-adv.yaml
    export GRAFANA_NAMESPACE=cpe-monitoring-system
    export GRAFANA_SERVICE_ACCOUNT=prometheus-grafana
    envsubst < grafana/cr-binding.yaml | kubectl create -f -
    # Set BEARER TOKEN
    TOKENNAME=$(kubectl -n $GRAFANA_NAMESPACE get serviceaccount/$GRAFANA_SERVICE_ACCOUNT -o jsonpath='{.secrets[0].name}')
    export BEARER_TOKEN=`kubectl -n $GRAFANA_NAMESPACE get secret $TOKENNAME -o jsonpath='{.data.token}' | base64 --decode`
    # Deploy thanos-query datasource
    envsubst < grafana-datasource.yaml | kubectl -n $GRAFANA_NAMESPACE apply -f -
    # Restart
    kubectl delete replicaset -l app=grafana
    ```
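    On Kubernetes 1.24+ service accounts no longer get an auto-created token Secret, so the `.secrets[0].name` lookup above returns nothing; in that case request a short-lived token instead:

    ```bash
    export BEARER_TOKEN=$(kubectl -n $GRAFANA_NAMESPACE create token $GRAFANA_SERVICE_ACCOUNT)
    ```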
To enhance CPE visualization with metrics from exporters, we utilize the ServiceMonitor resource of the Prometheus operator to perform relabeling and extract the `benchmark` and `job` labels from the pod name, via the following relabeling items in the ServiceMonitor:
```yaml
metricRelabelings:
- regex: (.*)(\-cpeh)(.*)
  replacement: '${1}'
  sourceLabels:
  - pod
  targetLabel: benchmark
- regex: '(.*)(\-cpeh)(\-[0-9a-z]+)(\-[0-9a-z]+)'
  replacement: '${1}${2}${3}'
  sourceLabels:
  - pod
  targetLabel: job
```
Example: kubelet.yaml. This is also applicable to application-specific exporters at both the operator level and the benchmark level.
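To make the two rules concrete, here is how they act on a hypothetical pod name (the sed calls mimic the relabeling regexes; Prometheus itself anchors them over the full label value):

```bash
pod="sysbench-cpeh-7d9f4-abcde"   # hypothetical pod name
# rule 1: everything before "-cpeh" becomes the benchmark label
echo "$pod" | sed -E 's/(.*)(-cpeh)(.*)/\1/'                          # -> sysbench
# rule 2: the trailing replica hash is dropped to form the job label
echo "$pod" | sed -E 's/(.*)(-cpeh)(-[0-9a-z]+)(-[0-9a-z]+)/\1\2\3/'  # -> sysbench-cpeh-7d9f4
```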
[TO-DO]
- collect metrics on demand by pod name (optional)
- export the pod label
- create a ServiceMonitor with a metricRelabelings section
Add the stores to the Thanos querier arguments in thanos-query.yaml:
```yaml
args:
- query
[...]
- --store=dnssrv+_grpc._tcp.[sidecar ingress hostname]
- --store=dnssrv+_grpc._tcp.[storegateway ingress hostname]
```
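Since `dnssrv+` makes the querier discover endpoints through DNS SRV lookups, it is worth confirming the records resolve before starting the querier (hostnames are the same placeholders as above):

```bash
dig +short SRV _grpc._tcp.[sidecar ingress hostname]
dig +short SRV _grpc._tcp.[storegateway ingress hostname]
```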
- Secure the Prometheus app of the target cluster by connecting the ingress with App ID, following the instructions here
- In case of deploying the ingress in a new namespace (not kube-system or default), check this issue
- Copy the ingress secret from the target cluster to the core cluster
- Create the file `federate_job.yaml`:
  ```yaml
  - job_name: 'federate'
    scrape_interval: 5m
    scrape_timeout: 1m
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
      - '{job=~".*"}'
    scheme: https
    static_configs:
    - targets:
      - <prometheus server host name without https>
      labels:
        origin: <cluster name>
    oauth2:
      client_id: <service credential Client ID>
      client_secret: <service credential secret>
      token_url: https://<region>.appid.cloud.ibm.com/oauth/v4/<tenant ID>/token
      endpoint_params:
        grant_type: client_credentials
        username: <app Client ID>
        password: <app secret>
    tls_config:
      cert_file: /etc/prometheus/secrets/<ingress secret name>/tls.crt
      key_file: /etc/prometheus/secrets/<ingress secret name>/tls.key
      insecure_skip_verify: true
  ```
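  Before wiring this into Prometheus, the App ID flow and the federate endpoint can be smoke-tested by hand; a sketch using the same placeholders (assumes `jq` is installed):

  ```bash
  # Obtain a client_credentials token from App ID
  TOKEN=$(curl -s -X POST "https://<region>.appid.cloud.ibm.com/oauth/v4/<tenant ID>/token" \
    -u "<app Client ID>:<app secret>" -d grant_type=client_credentials | jq -r .access_token)
  # Probe /federate (add --cert/--key if the ingress also requires the client certificate)
  curl -gsk -H "Authorization: Bearer $TOKEN" \
    "https://<prometheus server host name>/federate?match[]=%7Bjob%3D~%22.*%22%7D" | head
  ```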
- Generate the federate secret:
  ```bash
  kubectl create secret generic federate-scrape-config -n cpe-monitoring-system --from-file=federate_job.yaml
  ```
- Add the following specification to the Prometheus resource:
  ```yaml
  spec:
    ...
    # mount ingress secret to prometheus container
    containers:
    - name: prometheus
      ...
      volumeMounts:
      - mountPath: /etc/tls/trl-tok-iks
        name: secret-<ingress secret name>
        readOnly: true
    # add secret
    secrets:
    - <ingress secret name>
    # add federate job
    additionalScrapeConfigs:
      key: federate_job.yaml
      name: federate-scrape-config
  ```
Reference: https://prometheus.io/docs/prometheus/latest/federation

**The Prometheus pod must be restarted to apply the new configuration.**
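A minimal way to trigger that restart, assuming the default prometheus-operator pod labels (adjust the selector and namespace to your deployment):

```bash
kubectl delete pod -l app.kubernetes.io/name=prometheus -n cpe-monitoring-system
# The StatefulSet recreates the pod, which then loads the federate job
```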