Monitoring deployment #481

Open · wants to merge 13 commits into base: `main`
48 changes: 48 additions & 0 deletions deployments/ocis-monitoring/README.md
@@ -0,0 +1,48 @@
# oCIS with monitoring deployment example

## Introduction

This example shows how to deploy oCIS with monitoring, in this case Grafana.

***Note***: This example is not intended for production use. It is intended to get a working oCIS with Grafana running in Kubernetes as quickly as possible and is not hardened in any way. This also applies to passwords and HTTP usage: the Grafana password set at the beginning of the `helmfile` should be changed, and the example uses unencrypted HTTP between the Grafana Agent and Mimir/Loki/Tempo.

## Getting started

### Prerequisites

This example requires the following to be installed:

- A [Kubernetes](https://kubernetes.io/) cluster with an ingress controller installed.
- [Helm](https://helm.sh/) v3
- [Helmfile](https://github.com/helmfile/helmfile)

### End result

After following the steps in this guide, you should be able to access the following endpoints. You
may want to add these hostnames to your `/etc/hosts` file, pointing to your ingress controller's IP (see the sketch after the list):

- https://ocis.kube.owncloud.test
- https://grafana.kube.owncloud.test
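
For example, a minimal `/etc/hosts` sketch, assuming the ingress controller is reachable at `192.168.49.2` (the minikube address that shows up in the review comments further down; substitute your own ingress IP):

```
192.168.49.2 ocis.kube.owncloud.test
192.168.49.2 grafana.kube.owncloud.test
```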

Note that if you want to use your own hostname and domain, you will have to change the `externalDomain` value for oCIS and the domains for Loki, Mimir and Tempo at the beginning of the `helmfile`. Please also ensure that the hostnames for Loki, Mimir and Tempo can be resolved within the cluster. The sketch below illustrates the idea.
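
Purely as an illustration of which values are meant: apart from `externalDomain`, every key name below is an assumption, so check the top of the actual `helmfile.yaml` in this directory for the real ones.

```yaml
# Hypothetical sketch only – the real key names live at the top of helmfile.yaml.
externalDomain: ocis.example.com   # oCIS external domain mentioned above
lokiDomain: loki.example.com       # assumed key name
mimirDomain: mimir.example.com     # assumed key name
tempoDomain: tempo.example.com     # assumed key name
grafanaPassword: change-me         # see the password note in the introduction
```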

### Deploying

In this directory, run the following command:

```bash
$ helmfile sync
```

This will deploy oCIS and Grafana.
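
To verify the rollout, listing the pods is usually enough. The namespaces assumed below (`ocis`, `grafana`, `grafana-agent`) are the ones referenced by the commands and manifests elsewhere in this change:

```bash
# All pods should eventually reach the Running (or Completed) state
$ kubectl get pods -n ocis
$ kubectl get pods -n grafana
$ kubectl get pods -n grafana-agent
```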

### Logging in

You can get the admin password with the following command:
```bash
$ kubectl -n ocis get secrets/admin-user --template='{{.data.password | base64decode | printf "%s\n" }}'
```

You can use this password to log in with the user `admin`.

> **Review comment (Contributor):** we probably should add that for grafana, too:
> `kubectl -n grafana get secrets/grafana --template='{{ index .data "admin-password" | base64decode | printf "%s\n" }}'`
4 changes: 4 additions & 0 deletions deployments/ocis-monitoring/grafana-agent/Chart.yaml
@@ -0,0 +1,4 @@
apiVersion: v2
name: grafana-agent
type: application
version: 0.0.0
171 changes: 171 additions & 0 deletions deployments/ocis-monitoring/grafana-agent/templates/grafana-agent.yaml
@@ -0,0 +1,171 @@
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-agent

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - nodes/metrics
      - services
      - endpoints
      - pods
      - events
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
      - /metrics/cadvisor
    verbs:
      - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
  - kind: ServiceAccount
    name: grafana-agent
    namespace: grafana-agent

---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent-metrics
  labels:
    app: grafana-agent-metrics
spec:
  {{- with .Values.imagePullSecrets }}
  imagePullSecrets:
    {{- toYaml . | nindent 8 }}
  {{- end }}
> **Review comment (Contributor)** on lines +66 to +69 (the `imagePullSecrets` block above): doesn't fail, but is not defined in the values.yaml. (A hypothetical `values.yaml` sketch follows after the template files below.)
  # resources:
  #   requests:
  #     cpu: 500m
  #     memory: 11Gi
  image: "{{ .Values.image.repository }}/{{ .Values.image.name }}:{{ .Values.image.tag }}"
  logLevel: info
  serviceAccountName: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-metrics
    externalLabels:
      cluster: {{ .Values.clusterName }}

  integrations:
    selector:
      matchLabels:
        agent: grafana-agent-integrations

---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent-logs
  labels:
    app: grafana-agent-logs
spec:
  {{- with .Values.imagePullSecrets }}
  imagePullSecrets:
    {{- toYaml . | nindent 8 }}
  {{- end }}
  image: "{{ .Values.image.repository }}/{{ .Values.image.name }}:{{ .Values.image.tag }}"
  logLevel: info
  resources:
    requests:
      cpu: 100m
      memory: 300Mi
  serviceAccountName: grafana-agent

  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent-logs
    clients:
      # default value, is overwritten in LogsInstance
      - url: http://{{ .Values.remoteLokiHost }}/loki/api/v1/push
> **Review comment (Contributor):** I'm trying it in minikube and having trouble on the grafana-agent-logs-logs pod:
>
>     grafana-agent-logs-logs-wjhr4 grafana-agent ts=2024-02-16T15:04:40.879266992Z caller=client.go:430 level=error component=logs logs_config=grafana-agent/primary component=client host=loki.kube.owncloud.test msg="final error sending batch" status=308 tenant= error="server returned HTTP status 308 Permanent Redirect (308): <html>"
>
> This is probably an HTTP -> HTTPS redirect; curl at least confirms it:
>
>     curl http://loki.kube.owncloud.test -vv
>     * Host loki.kube.owncloud.test:80 was resolved.
>     * IPv6: (none)
>     * IPv4: 192.168.49.2
>     *   Trying 192.168.49.2:80...
>     * Connected to loki.kube.owncloud.test (192.168.49.2) port 80
>     > GET / HTTP/1.1
>     > Host: loki.kube.owncloud.test
>     > User-Agent: curl/8.5.0
>     > Accept: */*
>     >
>     < HTTP/1.1 308 Permanent Redirect
>     < Date: Fri, 16 Feb 2024 15:17:15 GMT
>     < Content-Type: text/html
>     < Content-Length: 164
>     < Connection: keep-alive
>     < Location: https://loki.kube.owncloud.test
>     <
>     <html>
>     <head><title>308 Permanent Redirect</title></head>
>     <body>
>     <center><h1>308 Permanent Redirect</h1></center>
>     <hr><center>nginx</center>
>     </body>
>     </html>
>     * Connection #0 to host loki.kube.owncloud.test left intact
>
> It turns out I need to explicitly turn off the HTTP -> HTTPS redirect by adding `ssl-redirect: "false"` via `kubectl -n ingress-nginx edit cm/ingress-nginx-controller` (a sketch of that change follows below).
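
For reference, the change described in the comment above would look roughly like this in the ingress-nginx controller ConfigMap (a sketch only; name and namespace are taken from the `kubectl edit` command quoted above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Disable the global HTTP -> HTTPS redirect so the agent can push over plain HTTP
  ssl-redirect: "false"
```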

        externalLabels:
          cluster: {{ .Values.clusterName }}
---
apiVersion: v1
kind: Secret
metadata:
  name: extra-jobs
stringData:
  jobs.yaml: |
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/kubernetes/kubelet
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          source_labels: [__meta_kubernetes_node_name]
          replacement: /api/v1/nodes/$1/proxy/metrics
          target_label: __metrics_path__
        - action: hashmod
          modulus: $(SHARDS)
          source_labels:
            - __address__
          target_label: __tmp_hash
        - action: keep
          regex: $(SHARD)
          source_labels:
            - __tmp_hash
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/kubernetes/cadvisor
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - replacement: kubernetes.default.svc:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
          source_labels:
            - __meta_kubernetes_node_name
          target_label: __metrics_path__
        - action: hashmod
          modulus: $(SHARDS)
          source_labels:
            - __address__
          target_label: __tmp_hash
        - action: keep
          regex: $(SHARD)
          source_labels:
            - __tmp_hash
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
@@ -0,0 +1,28 @@
apiVersion: monitoring.grafana.com/v1alpha1
kind: LogsInstance
metadata:
  name: primary
  labels:
    agent: grafana-agent-logs
spec:
  clients:
    - url: http://{{ .Values.remoteLokiHost }}/loki/api/v1/push
      basicAuth:
        username:
          name: primary-credentials-logs
          key: username
        password:
          name: primary-credentials-logs
          key: password
      headers:
        ## TODO template to have the same value in grafana datasources.yaml
        X-Scope-OrgID: monitoring-dev
      externalLabels:
        cluster: {{ .Values.clusterName }}

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the LogsInstance CR
  podLogsNamespaceSelector: {}
  podLogsSelector:
    matchLabels:
      instance: primary
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
stringData:
  username: {{ .Values.loki.username }}
  password: {{ .Values.loki.password }}
@@ -0,0 +1,54 @@
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: grafana-agent-metrics
  namespace: grafana-agent
  labels:
    agent: grafana-agent-metrics
spec:
  remoteWrite:
    - url: http://{{ .Values.remoteMimirHost }}/api/v1/push
      basicAuth:
        username:
          name: primary-credentials-metrics
          key: username
        password:
          name: primary-credentials-metrics
          key: password
      headers:
        ## TODO template to have the same value in grafana datasources.yaml
        X-Scope-OrgID: monitoring-dev

    # As an alternative authentication method, Grafana Agent also supports OAuth2.
    # - url: your_remote_write_URL
    #   oauth2:
    #     clientId:
    #       secret:
    #         key: username # Kubernetes Secret Key
    #         name: primary-credentials-metrics # Kubernetes Secret Name
    #     clientSecret:
    #       key: password # Kubernetes Secret Key
    #       name: primary-credentials-metrics # Kubernetes Secret Name
    #     tokenUrl: https://auth.example.com/realms/master/protocol/openid-connect/token


  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  #   matchLabels:
  #     instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  #   matchLabels:
  #     instance: primary

  # Supply an empty namespace selector to look in all namespaces. Remove
  # this to only look in the same namespace as the MetricsInstance CR.
  probeNamespaceSelector: {}
  probeSelector:
    matchLabels:
      instance: primary
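
Because both `serviceMonitorNamespaceSelector` and `serviceMonitorSelector` are empty, this MetricsInstance picks up every ServiceMonitor in the cluster. A hypothetical ServiceMonitor that would be scraped (the names and port below are illustrative only, not part of this PR):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical object, not part of this PR
  namespace: example
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics            # assumes the Service exposes a port named "metrics"
      interval: 30s
```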
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
stringData:
  username: {{ .Values.mimir.username }}
  password: {{ .Values.mimir.password }}
16 changes: 16 additions & 0 deletions deployments/ocis-monitoring/grafana-agent/templates/pod-logs.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  labels:
    instance: primary
  name: kubernetes-pods
spec:
  pipelineStages:
    - cri: {}
    - replace:
        expression: (([[:alnum:]]+).:)
        replace: "{{`{{ .Value | ToLower }}`}}"
  namespaceSelector:
    any: true
  selector:
    matchLabels: {}
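
The templates above reference several `.Values` keys (`image.*`, `imagePullSecrets`, `clusterName`, `remoteLokiHost`, `remoteMimirHost`, `loki.*`, `mimir.*`) whose `values.yaml` is not shown in this excerpt; as noted in the review comment on the `imagePullSecrets` block, at least that key is currently missing there. A minimal sketch with placeholder values only:

```yaml
# Hypothetical values.yaml sketch for the grafana-agent chart – placeholder values only.
image:
  repository: docker.io/grafana   # assumed registry/repository
  name: agent
  tag: v0.39.2                    # assumed tag
imagePullSecrets: []              # referenced by the templates; flagged in review as not defined
clusterName: ocis-monitoring      # assumed cluster label
remoteLokiHost: loki.kube.owncloud.test
remoteMimirHost: mimir.kube.owncloud.test   # assumed to follow the same naming pattern
loki:
  username: monitoring            # placeholder credentials
  password: change-me
mimir:
  username: monitoring
  password: change-me
```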