Implement logging infrastructure #311
Comments
The visibility is also interesting for customers. Similar to #182
via @anvddriesch: With CAPI, a better interface towards distributed logging becomes more important. There are many controllers in the CAPI implementations compared to the single operator we use today. So if you want to debug what is going on, e.g. during cluster creation, it would be good to have the logs in one place.
As a requirement (coming from https://github.com/giantswarm/giantswarm/issues/11489), we want the k8s audit log to be kept.
Status update
Note from sig-product sync and further async discussions
@TheoBrigitte imo Loki shouldn't be a managed app to monitor applications. Instead we need to build centralized logging for the platform that supports the infrastructure and applications. The same goes for Prometheus, and it needs to be scoped by management cluster. I am sorry that you started working on centralized logging, but this is not an option. I could have told you that before. Imo we need to get better at communicating decisions like that. This is quite an impactful decision that you took in the team, and it is not aligned with our general architecture. Examples why this doesn't work:
So from my side centralized logging is not an option. I am happy to talk about this on Tuesday, but please don't invest any more time in the centralized solution.
The recent discussions around our logging architecture, and the pros and cons of the centralized vs distributed approach, have been summarized in this RFC. We are now working on setting up Loki following the distributed approach.
The current plan to install Loki is explained in the RFC, but we forgot to talk about Promtail, the log ingester. After discussions with the team, we think the following idea is the safest bet.
Motivation
Getting a new app or config change deployed to a management cluster is rather straightforward if the application is deployed through an app-collection (create a release and voilà). Deploying the app or the config change to a workload cluster is rather tedious, even without taking into account the need for the customer's approval of the change, and it causes us pain (opened postmortems, silenced alerts that are already fixed, and so on).
Idea
Our idea is to deploy the promtail app as part of the observability bundle today, with promtail disabled by default, creating a new release before we create new Vintage releases (namely aws 18.2.0 and azure 19.0.0). An operator will be in charge of creating, updating, and deleting the configmap for each cluster, so we can dynamically update the promtail config on each cluster (management cluster and workload cluster alike). This will allow us to configure the application at runtime (feature flagging per MC and so on). This operator will also be used to implement multi-tenancy (cluster or organization level remains to be seen) without having to ask customers to upgrade again.
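To make the idea concrete, here is a rough sketch of what the per-cluster reconciliation could look like. It is an illustration only, not the actual operator: the function names, the configmap name and label, and the layout of the values are all assumptions made up for this example. Only the `clients`, `url`, and `tenant_id` keys follow Promtail's own client configuration.

```python
import json


def render_promtail_configmap(cluster_id, loki_url, tenant_id, enabled):
    """Build a ConfigMap manifest (as a plain dict) for one cluster.

    The name, namespace, label, and values layout are hypothetical.
    """
    values = {
        "promtail": {
            # Feature flag: promtail ships disabled and is switched on later.
            "enabled": enabled,
            "config": {
                # `clients`, `url`, `tenant_id` are Promtail client settings.
                "clients": [{"url": loki_url, "tenant_id": tenant_id}],
            },
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"{cluster_id}-promtail-config",  # hypothetical naming
            "namespace": cluster_id,                  # hypothetical namespace
            "labels": {"cluster": cluster_id},        # hypothetical label
        },
        "data": {"values": json.dumps(values, indent=2, sort_keys=True)},
    }


def reconcile(desired, existing):
    """Decide what the operator would do for one cluster's configmap."""
    if existing is None:
        return "create"
    if existing["data"] != desired["data"]:
        return "update"
    return "none"


if __name__ == "__main__":
    # Placeholder cluster ID and Loki URL, for illustration only.
    url = "https://loki.example.invalid/loki/api/v1/push"
    disabled = render_promtail_configmap("abc12", url, "abc12", enabled=False)
    enabled = render_promtail_configmap("abc12", url, "abc12", enabled=True)
    print(reconcile(disabled, None))      # create
    print(reconcile(enabled, disabled))   # update
    print(reconcile(enabled, enabled))    # none
```

Flipping `enabled` or changing `tenant_id` only changes the rendered configmap, which is the point of the proposal: the change rolls out at runtime without a new release on the workload cluster.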
Added TODOs about the Promtail deployment that were discussed in the last refinement session.
Status update
We have made good progress on the dynamic configuration part, which we need in order to configure Grafana, Promtail, and the multitenant-proxy to make logs flow through our infrastructure. There are some last bits and pieces left (more details here: https://github.com/giantswarm/giantswarm/issues/27146), plus some testing to be done.
Status update
Status update
We are almost there. The only thing missing is https://github.com/giantswarm/giantswarm/issues/28726, so technically https://github.com/giantswarm/giantswarm/issues/29776 still needs to be done here.
User Story
Tasks
https://github.com/giantswarm/giantswarm/issues/22335
promtail #722
Related