add k8s events logging to alloy #263

Open
wants to merge 29 commits into
base: main

QuantumEnigmaa
Contributor

What this PR does / why we need it

Towards giantswarm/roadmap#3750

Checklist

  • Update changelog in CHANGELOG.md.
  • Follow deployment test procedure in the tests/manual_e2e directory and have a working branch.

QuantumEnigmaa self-assigned this Nov 12, 2024
@QuantumEnigmaa
Contributor Author

I still need to find a way for the operator to switch between alloy and grafana-agent for k8s events logging, based on which logging agent is deployed.

@QuantumEnigmaa
Contributor Author

I still need to handle the new events-logger-secret secret, which either alloy or grafana-agent will use to authenticate against Loki.

QuantumEnigmaa marked this pull request as ready for review November 14, 2024 13:57
QuantumEnigmaa requested a review from a team as a code owner November 14, 2024 13:57
@QuentinBisson
Contributor

You're missing the toggle as well

@QuantumEnigmaa
Contributor Author

> You're missing the toggle as well

Oh indeed! My bad.

@QuantumEnigmaa
Contributor Author

Created a separate PR for the events-logger toggle: #270

@QuantumEnigmaa
Contributor Author

I don't think I can split this PR into smaller ones any further, as all the remaining files are now intertwined.

@QuentinBisson
Contributor

Do not forget to test it when doing an upgrade to observability bundle 1.9.0 and when creating a new cluster with that version as well :)

@QuantumEnigmaa
Contributor Author

QuantumEnigmaa commented Nov 19, 2024

Testing this on golem, I get the following error:

{"level":"error","ts":"2024-11-19T15:04:31Z","msg":"Reconciler error","controller":"cluster","controllerGroup":"cluster.x-k8s.io","controllerKind":"Cluster","Cluster":{"name":"alloyeventstest","namespace":"org-giantswarm"},"namespace":"org-giantswarm","name":"alloyeventstest","reconcileID":"e30b77b5-9906-4111-a458-1f857982d0a4","error":"unsupported events logger \"\"","errorVerbose":"unsupported events logger \"\"\ngithub.com/giantswarm/logging-operator/pkg/resource/agents-toggle.GenerateObservabilityBundleConfigMap\n [...]

So the error occurs when the observability-bundle configmap is created. The weird thing is that the message doesn't contain an actual events-logger name, only an empty string: \"\"

The logging-operator pod does have the right events-logger specified in its container args, though:

spec:
  containers:
  - args:
    - -enable-logging=true
    - -insecure-ca=false
    - -installation-name=golem
    - -logging-agent=alloy
    - -events-logger=alloy
    - -default-namespaces=kube-system,giantswarm

@QuentinBisson
Contributor

@QuantumEnigmaa you forgot to configure it here

loggedcluster.O.LoggingAgent = loggingAgent
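
For context, here is a minimal sketch of how an unset option can produce the `unsupported events logger ""` error reported above even though the flag is passed to the container. The `Options` struct, its `EventsLogger` field, and `validateEventsLogger` are hypothetical names used only for illustration; the thread itself only shows the `-logging-agent` and `-events-logger` flags and the `loggedcluster.O.LoggingAgent` assignment.

```go
package main

import (
	"flag"
	"fmt"
)

// Options is a hypothetical stand-in for the operator's shared options struct.
type Options struct {
	LoggingAgent string
	EventsLogger string // hypothetical field name
}

// validateEventsLogger rejects any value other than the two agents
// discussed in this PR.
func validateEventsLogger(name string) error {
	switch name {
	case "alloy", "grafana-agent":
		return nil
	default:
		return fmt.Errorf("unsupported events logger %q", name)
	}
}

func main() {
	fs := flag.NewFlagSet("logging-operator", flag.ContinueOnError)
	loggingAgent := fs.String("logging-agent", "alloy", "logging agent to deploy")
	eventsLogger := fs.String("events-logger", "alloy", "events logger to deploy")
	if err := fs.Parse([]string{"-logging-agent=alloy", "-events-logger=alloy"}); err != nil {
		panic(err)
	}

	var o Options
	o.LoggingAgent = *loggingAgent

	// The flag was parsed, but its value has not been copied into the
	// options yet, so validation sees the zero value "".
	fmt.Println(validateEventsLogger(o.EventsLogger))

	// Copying the flag value next to the LoggingAgent assignment fixes it.
	o.EventsLogger = *eventsLogger
	fmt.Println(validateEventsLogger(o.EventsLogger))
}
```

The first call prints the error with an empty quoted name, the second prints `<nil>`, which matches the symptom: the flag is present on the pod, but the value never reaches the code that builds the configmap.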

@QuentinBisson
Contributor

Let me know how the test goes :)

@QuantumEnigmaa
Contributor Author

alloy-events is deployed on the WC, but the pod keeps crashlooping with the following error message:

error: the server doesn't have a resource type "logs"

@QuentinBisson
Contributor

Can you set grafana-agent as the default event agent and check that this still works?

@QuentinBisson
Contributor

Expected tests

  • New cluster with Observability-bundle 1.8.0
    • With Alloy events enabled, make sure grafana agent is deployed and alloy is not
    • With Grafana-agent enabled, make sure grafana agent is deployed and alloy event is not
  • Cluster upgrade with Observability-bundle 1.8.0 upgrading to 1.9.0
    • With Alloy events enabled, make sure alloy is deployed and grafana agent is not
    • With Grafana-agent enabled, make sure grafana agent is deployed and alloy event is not
  • New cluster with Observability-bundle 1.9.0
    • With Alloy events enabled, make sure alloy is deployed and grafana agent is not
    • With Grafana-agent enabled, make sure grafana agent is deployed and alloy event is not
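
The matrix above could be encoded as a table-driven check. This is only a sketch: `effectiveEventsLogger` and `atLeast` are hypothetical helpers, and the rule that Alloy events require observability-bundle 1.9.0 or later is inferred from the expected results in the list, not taken from the operator's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a "MAJOR.MINOR.PATCH" version is >= min.
// Parts that fail to parse are treated as 0 (good enough for this sketch).
func atLeast(version, min string) bool {
	v, m := strings.Split(version, "."), strings.Split(min, ".")
	for i := 0; i < len(v) && i < len(m); i++ {
		a, _ := strconv.Atoi(v[i])
		b, _ := strconv.Atoi(m[i])
		if a != b {
			return a > b
		}
	}
	return len(v) >= len(m)
}

// effectiveEventsLogger is a hypothetical helper: it returns the agent
// expected to be deployed for a configured logger and bundle version.
func effectiveEventsLogger(configured, bundleVersion string) string {
	if configured == "alloy" && atLeast(bundleVersion, "1.9.0") {
		return "alloy"
	}
	return "grafana-agent"
}

func main() {
	cases := []struct {
		name, configured, bundle, want string
	}{
		{"new cluster 1.8.0, alloy", "alloy", "1.8.0", "grafana-agent"},
		{"new cluster 1.8.0, grafana-agent", "grafana-agent", "1.8.0", "grafana-agent"},
		{"upgrade 1.8.0 to 1.9.0, alloy", "alloy", "1.9.0", "alloy"},
		{"upgrade 1.8.0 to 1.9.0, grafana-agent", "grafana-agent", "1.9.0", "grafana-agent"},
		{"new cluster 1.9.0, alloy", "alloy", "1.9.0", "alloy"},
		{"new cluster 1.9.0, grafana-agent", "grafana-agent", "1.9.0", "grafana-agent"},
	}
	for _, c := range cases {
		got := effectiveEventsLogger(c.configured, c.bundle)
		fmt.Printf("%-40s want=%s got=%s ok=%t\n", c.name, c.want, got, got == c.want)
	}
}
```

The upgrade rows only model the state after the upgrade; checking that the previous agent is removed still needs a real cluster.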

@QuentinBisson
Contributor

k logs -n kube-system alloy-events-7d445b998-xlj6h

Error: /etc/alloy/config.alloy:3:30: expected {, got ,
interrupt received
Error: could not perform the initial load successfully

@QuentinBisson
Contributor

QuentinBisson commented Nov 20, 2024

I think this is where the issue is in the templated config:

 {\n\tnamespaces = []\"kube-system\",
    \"giantswarm\"\"giantswarm\",
config.alloy: "\nloki.source.kubernetes_events \"local\" {\n\tnamespaces = []\"kube-system\",
    \"giantswarm\"\"giantswarm\", \n\tforward_to = [loki.write.default.receiver]\n}\n\n//
    Loki target configuration\nloki.write \"default\" {\n\tendpoint {\n\t\turl                =
    env(\"LOKI_URL\")\n\t\tmax_backoff_period = \"10m\"\n\t\ttenant_id          =
    env(\"TENANT_ID\")\n\n\t\tbasic_auth {\n\t\t\tusername = env(\"BASIC_AUTH_USERNAME\")\n\t\t\tpassword
    = env(\"BASIC_AUTH_PASSWORD\")\n\t\t}\n\n\t\ttls_config {\n\t\t\tinsecure_skip_verify
    = false\n\t\t}\n\t}\n\texternal_labels = {\n\t\tcluster_id   = \"alloyeventstest\",\n\t\tinstallation
    = \"golem\",\n\t}\n}\n\nlogging {\n\tlevel  = \"info\"\n\tformat = \"logfmt\"\n}"

@QuantumEnigmaa
Contributor Author

> I think this is where the issue is in the templated config:

Yeah, it shouldn't be rendered like this.
