Self Service Log Ingestion #3518

Open
1 of 5 tasks
Tracked by #3515
Rotfuks opened this issue Jun 24, 2024 · 21 comments
Labels: team/atlas (Team Atlas)

@Rotfuks (Contributor) commented Jun 24, 2024

Motivation

We want customers to be able to ingest whatever data is relevant for them in a self-service way, and this includes logs. So we need to provide a way for them to add their own data sources for logs.

Todo

  • Investigate how exactly we could empower customers to add their own log sources - for example PodLogs https://github.com/giantswarm/giantswarm/issues/29072
  • Implement any needed changes to make it happen
  • Create a documentation draft describing how customers can add their own sources of events/logs to be monitored
  • Get some feedback from AE about the documentation
  • Make sure the documentation is published in the new observability platform docs

Outcome

  • Customers can now add their own sources of events to be monitored by the observability platform
  • There are docs and educational content out there showing them how it's done
@QuentinBisson

We see that we can use PodLogs, but do we want to force customers to create PodLogs resources for log ingestion? Can we allow them to collect logs at the namespace level (with annotations and so on)?

@Rotfuks (Contributor, Author) commented Jul 22, 2024

How much effort is it to create PodLogs for customers? I would love to have something label-based where we can just say "add this label and it's automatically ingested", because that is quite flexible and intuitive. I believe it will also help us with multi-tenancy.

@QuentinBisson

The issue I have is not that PodLogs don't make sense, but I would think they should only be needed on rare occasions. Ideally, an annotation/label on the pod or namespace should be enough to derive the tenant for most logs, and that would also make profile and trace collection easier. I would only use PodLogs if the pod needs a custom pipeline imo.

What I'm not sure about is whether we can get all logs for a namespace when it's annotated, unless the pod has its own label or is covered by a PodLogs resource.

I would think we could do something with drops but I'm not sure. Maybe @TheoBrigitte knows if log sources can exclude data taken from other sources?

@TheoBrigitte (Member) commented Aug 26, 2024

When using Alloy as the logging agent installed within a workload cluster, we can configure it in a way that allows it to retrieve logs from specific namespaces and/or pods.

This solution makes use of two different, mutually exclusive PodLogs resources (see the sketch after this comment).

Those PodLogs would be configured by us and customers would only deal with labels on their resources.

With this solution we might face a problem with resource usage on the Kubelets: since all the log traffic would go through the Kubernetes API, the network and CPU usage on the Kubelets might become problematic, especially when many/all pods are monitored.
Alloy does not currently provide another way to select targets based on their namespace. The usual loki.source.file does not suffer from the Kubelet resource usage problem, as logs are read directly from the node where Alloy is running, but it does not allow selecting pods by namespace.

I opened an upstream issue requesting that the namespace metadata be added to the discovery.kubernetes component; this would allow us to avoid using PodLogs and their overhead.
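
For illustration, a rough sketch of what the two mutually exclusive PodLogs could look like (untested; the giantswarm.io/logging label key is only an assumed example, not the actual configuration):

apiVersion: monitoring.grafana.com/v1alpha2
kind: PodLogs
metadata:
  name: namespace-logging
spec:
  # Select every pod in namespaces that opted in via the label.
  namespaceSelector:
    matchLabels:
      giantswarm.io/logging: enabled
  selector: {}
---
apiVersion: monitoring.grafana.com/v1alpha2
kind: PodLogs
metadata:
  name: pod-logging
spec:
  # Select individually labelled pods, but only in namespaces that did
  # NOT opt in, so the two PodLogs never match the same pod twice.
  namespaceSelector:
    matchExpressions:
    - key: giantswarm.io/logging
      operator: DoesNotExist
  selector:
    matchLabels:
      giantswarm.io/logging: enabled

Customers would then only need to set the label on a namespace or on individual pods.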

TheoBrigitte self-assigned this Aug 27, 2024
@TheoBrigitte (Member)

Did you take a look at this? https://grafana.com/docs/alloy/latest/reference/components/loki/loki.source.kubernetes/

Looking at it, this would be simpler than the local.file_match currently used in our solution, but I also do not see the benefit over loki.source.podlogs: you get rid of the need for PodLogs resources, but you also lose the ability to filter on namespace labels, and you still have the network and CPU overhead on the Kubernetes API server.
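
For reference, a minimal sketch of what using loki.source.kubernetes could look like (not our actual configuration; the write endpoint is a placeholder):

discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  // Tails container logs through the Kubernetes API instead of reading
  // files from the node, so no hostPath mounts are needed.
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "https://loki.svc/loki/api/v1/push"
  }
}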

@QuentinBisson

I quite like that we do not have to run it as a daemonset though :D

But why do you not have the namespace? I thought those should give you __meta_kubernetes_namespace in the loki.process or relabel phase?

@QuentinBisson

Oh, you meant namespace labels, never mind.

@TheoBrigitte (Member)

Using a combination of the loki.relabel and loki.source.podlogs components, it is possible to set the tenant ID based on a given label from the pod or its namespace.

In the following example, the tenant ID is taken from the pod label foo.

Here are the Alloy config and the PodLogs resource I used:

  • Alloy config
loki.source.podlogs "default" {
  forward_to = [loki.relabel.default.receiver]
}

loki.relabel "default" {
  forward_to = [loki.write.default.receiver]

  // Copy the value of the "foo" label into the special __tenant_id__ label.
  rule {
    action = "replace"
    source_labels = ["foo"]
    target_label  = "__tenant_id__"
    replacement = "$1"
    regex = "(.*)"
  }

  // Drop the "foo" label so it does not end up as a stream label in Loki.
  rule {
    action = "labeldrop"
    regex = "^foo$"
  }
}

loki.write "default" {
  endpoint {
    url = "https://loki.svc/loki/api/v1/push"
  }
}
  • PodLogs (note: this will select all pods from all namespaces, change the selectors to fit your needs)
apiVersion: monitoring.grafana.com/v1alpha2
kind: PodLogs
metadata:
  name: pod-tenant-id-from-label
spec:
  selector: {}
  namespaceSelector: {}
  relabelings:
  - action: replace
    sourceLabels: ["__meta_kubernetes_pod_label_foo"]
    targetLabel: "foo"
    replacement: "$1"
    regex: "(.*)"

It is also possible to set the tenant ID using the loki.process component, which has a tenant stage that allows exactly this: setting the tenant ID. But from there, only the log entry content is accessible.
More info at https://grafana.com/docs/alloy/latest/reference/components/loki/loki.process/#stagetenant-block
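
A minimal sketch of that alternative (assuming the tenant is carried in a customer_id field of JSON-formatted log lines; the field name is just an example):

loki.process "default" {
  forward_to = [loki.write.default.receiver]

  // Parse the log line as JSON and extract the customer_id field.
  stage.json {
    expressions = { "customer_id" = "" }
  }

  // Set the tenant from the extracted value.
  stage.tenant {
    source = "customer_id"
  }
}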

@TheoBrigitte (Member)

Current prototype idea

[image: prototype diagram]

Improvements we want to explore:

  • Work around the Kubelet traffic limitation by fetching logs from local disk, using either loki.source.kubernetes or newer features like join or the logs.alloy module
  • Avoid duplicated targets
  • How to provide access to Alloy components like loki.process

@TheoBrigitte (Member)

Using loki.source.kubernetes would only allow selecting pods, not namespaces, using labels, as the pod's namespace labels are not exposed in this component. That's why I opened an upstream issue asking to expose those: grafana/alloy#1550

@TheoBrigitte (Member)

The potential new join feature would not help in our case, as it would only allow enriching metadata in the loki.relabel component; there would still be no way to pass the resulting targets into loki.source.file.

@QuentinBisson commented Sep 23, 2024

What if you enrich then drop logs, instead of trying to discover only those we should "scrape"?

@TheoBrigitte (Member)

What if you enrich then drop logs?

There would still be no way to match the resulting targets against a local file, as the loki.relabel and loki.source.file components cannot be connected.

@TheoBrigitte (Member)

logs.alloy also does not help, as it's mainly a wrapper around existing Alloy components.

@QuentinBisson

What if you enrich then drop logs?

There would still be no way to match the resulting targets against a local file, as the loki.relabel and loki.source.file components cannot be connected.

If you join based on the labels extracted from loki.source.file and the ones from Kubernetes discovery, is that not possible? 🤔 It might be interesting to bring this to the next community meeting.

@TheoBrigitte (Member)

What if you enrich then drop logs?

There would still be no way to match the resulting targets against a local file, as the loki.relabel and loki.source.file components cannot be connected.

If you join based on the labels extracted from loki.source.file and the ones from Kubernetes discovery, is that not possible? 🤔 It might be interesting to bring this to the next community meeting.

The namespace metadata is only present when using the loki.source.podlogs component, and this component cannot be chained with loki.source.file. The discovery.kubernetes component does not expose namespace metadata, and the join proposed upstream would only happen in the loki.relabel stage, which also cannot be linked into loki.source.file.

loki.source.file is only compatible with components exporting targets (https://grafana.com/docs/alloy/latest/reference/compatibility/#targets-exporters), which in our case means discovery.kubernetes or local.file_match, so we cannot access namespace metadata unless discovery.kubernetes exposes it directly. See the sketch below for what that pipeline looks like.
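
To make the constraint concrete, a rough sketch of the file-based pipeline (a generic example, not our production config; the pod label key is a placeholder). Pod labels are usable in the relabel rules, namespace labels are not:

discovery.kubernetes "pods" {
  role = "pod"
}

discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets

  // Pod labels are available here (e.g. __meta_kubernetes_pod_label_*),
  // but namespace labels are not, which is the limitation discussed above.
  rule {
    action        = "keep"
    source_labels = ["__meta_kubernetes_pod_label_giantswarm_io_logging"]
    regex         = "enabled"
  }

  // Build the path of the container log files on the node.
  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    separator     = "/"
    replacement   = "/var/log/pods/*$1/*.log"
    target_label  = "__path__"
  }
}

local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.write.default.receiver]
}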

@TheoBrigitte (Member)

We can't load components like loki.process dynamically into Alloy.

The way to load dynamic configuration into Alloy is via modules. A module is described by a declare block, which only accepts argument and export blocks, meaning there is no way to pass any of the stage blocks of loki.process into it. There is a module import example using loki.process here: https://grafana.com/docs/alloy/latest/get-started/modules/#example
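
Roughly, a module looks like the following sketch (hypothetical names): values can be wired in through arguments and results exposed through exports, but the loki.process stages have to be hard-coded inside the declare block.

declare "log_pipeline" {
  // Values can be passed in...
  argument "write_to" {
    optional = false
  }

  // ...but stage blocks cannot; they must be written out here.
  loki.process "default" {
    forward_to = argument.write_to.value

    stage.drop {
      older_than = "1h"
    }
  }

  export "receiver" {
    value = loki.process.default.receiver
  }
}

It could then be instantiated with something like log_pipeline "default" { write_to = [loki.write.default.receiver] }, but the stages themselves remain fixed.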

@TheoBrigitte (Member)

It is currently not possible to use a Kyverno policy to label the kube-system namespace, as Kyverno lacks the permissions to do so:

$ cat kube-system-logging.cpol.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: kube-system-logging
spec:
  admission: true
  background: true
  mutateExistingOnPolicyUpdate: true
  rules:
    - name: kube-system-enable-logging
      match:
        resources:
          kinds:
            - Namespace
          name: kube-system
      mutate:
        patchesJson6902: '[{"op":"add","path":"/metadata/labels/giantswarm.io~1logging","value":"enabled"}]'
        targets:
        - kind: Namespace
          name: kube-system

$ k apply -f kube-system-logging.cpol.yaml
Error from server: error when creating "kube-system-logging.cpol.yaml": admission webhook "validate-policy.kyverno.svc" denied the request: path: spec.rules[0].mutate.targets.: auth check fails, additional privileges are required for the service account 'system:serviceaccount:kyverno:kyverno-background-controller': failed to get GVR for kind /Namespace; failed to get GVR for kind /Namespace
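
For reference, the label this policy tries to add is equivalent to applying it manually:

kubectl label namespace kube-system giantswarm.io/logging=enabled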

@TheoBrigitte (Member)

Just linking the Alloy internal tenant_id label used for tenant override https://github.com/grafana/alloy/blob/8f1be0e86b0ced53e73cb30d228aa736b1380d89/internal/component/common/loki/client/client.go#L35

@TheoBrigitte (Member)

Load testing the log pipeline with Alloy as the logging agent, using PodLogs and loki-canary.

Loki chart values to run only the canary on a workload cluster:

global:
  clusterDomain: cluster.local
  dnsService: coredns
  image:
    registry: gsoci.azurecr.io
  podSecurityStandards:
    enforced: true
loki:
  enabled: false
lokiCanary:
  enabled: true
  push: false
  mode: deployment
  deployment:
    replicaCount: 30
    strategy:
      type: RollingUpdate
  extraArgs:
  - -tenant-id=playground
  - -interval=5ms
  - -pruneinterval=60s
  - -size=384
multiTenantAuth:
  enabled: false
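
As a rough back-of-the-envelope (an estimate only, assuming each canary pod emits one log line per -interval): 30 replicas × 1 line / 5 ms = 6,000 lines/s, and at -size=384 bytes per line that is roughly 2.3 MB/s of raw log volume for the single playground tenant.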

This allows reaching ~10k log lines for a single tenant:

[image]

The goal is to reproduce what we currently have on alba/peu01

[image]

Still need to compare the Promtail vs Alloy setup.
