Define a curated list of priority classes for Giant Swarm workload #3483

Open
2 tasks
QuentinBisson opened this issue Jun 5, 2024 · 8 comments

Comments

@QuentinBisson

We came to the conclusion in @giantswarm/sig-architecture that we need to define a curated list of priority classes for Giant Swarm workloads. We should not have too many (let's not have one per app, as we do right now with crossplane and flux), but we need enough to make sure that highly critical (kyverno, prometheus-agents), critical (promtail), and slightly less critical (fluent-bit) components are scheduled with higher priority than other workloads.

We currently have the following classes:

Workload clusters

NAME                      VALUE        GLOBAL-DEFAULT
giantswarm-critical       1000000000   false
system-cluster-critical   2000000000   false
system-node-critical      2000001000   false

Management clusters

NAME                              VALUE        GLOBAL-DEFAULT
crossplane-critical               600000000    false
flux-giantswarm-flux-giantswarm   1000000000   false
giantswarm-critical               1000000000   false
prometheus                        500000000    false
system-cluster-critical           2000000000   false
system-node-critical              2000001000   false
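
To see which classes would need consolidating, the two listings above can be compared directly. This is only a minimal Python sketch using the names and values from the listings; the variable names are illustrative and nothing here talks to a real cluster.

```python
# Priority classes as listed above (name -> value).
workload_cluster = {
    "giantswarm-critical": 1_000_000_000,
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}
management_cluster = {
    "crossplane-critical": 600_000_000,
    "flux-giantswarm-flux-giantswarm": 1_000_000_000,
    "giantswarm-critical": 1_000_000_000,
    "prometheus": 500_000_000,
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}

# Classes that exist on only one cluster type.
only_on_management = sorted(set(management_cluster) - set(workload_cluster))
only_on_workload = sorted(set(workload_cluster) - set(management_cluster))

# Classes that exist on both cluster types but with different values.
mismatched = sorted(
    name
    for name in set(workload_cluster) & set(management_cluster)
    if workload_cluster[name] != management_cluster[name]
)

print("management only:", only_on_management)
print("workload only:", only_on_workload)
print("value mismatches:", mismatched)
```

With the listings above, the only differences are the three per-app classes on the management clusters (crossplane-critical, flux-giantswarm-flux-giantswarm, prometheus); the shared classes have identical values.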

The goals of this issue are to:

  • Establish a curated set of priority classes for all apps
  • Ensure we have the same priority classes on management and workload clusters and that they are deployed the same way (giantswarm-critical being deployed by the chart-operator)

@giantswarm/sig-architecture Do you have a list of the components that we run in CAPI clusters, so I can create a table with their priority classes?

QuentinBisson self-assigned this Jun 5, 2024
architectbot added the team/atlas label Jun 5, 2024
@piontec

piontec commented Jun 6, 2024

OK, so it seems flux-giantswarm should just use giantswarm-critical. I think we need something like giantswarm-high, lower prio than critical, but still for, well, important stuff, like crossplane.
Maybe to get started we keep the 2 system* classes, as I believe they don't really apply to "normal" apps, only to really critical system components. Then, for important apps, we could use something like:

  • giantswarm-critical
  • giantswarm-very-high
  • giantswarm-high

WDYT?

@QuentinBisson
Author

I would be more inclined to go with:

  • giantswarm-critical
  • giantswarm-high
  • giantswarm-medium

That way we can add giantswarm-low if we need to. I'm not sure I would add anything in between those.

We could instead use the following (yes the migration effort will take time so we could have giantswarm-critical = giantswarm-high):

  • giantswarm-high
  • giantswarm-medium
  • giantswarm-low

Now I'm not sure which components would go in each, though. Are we fine with prometheus being in the high priority class?

@QuentinBisson
Author

It seems this issue does not have a lot of traction @piontec :D

@QuentinBisson
Author

@giantswarm/sig-architecture does anyone have thoughts on this? I don't think it's a really huge topic :)

@JosephSalisbury
Contributor

@QuentinBisson want to throw this on the sig arch agenda for next week, and we can just thrash it out?

@QuentinBisson
Author

Hey, I'm always up for closing issues

@JosephSalisbury
Contributor

JosephSalisbury commented Sep 25, 2024

from sig architecture:

we're generally fine with:

  • system-cluster-critical - upstream cluster critical
  • system-node-critical - upstream node critical
  • giantswarm-critical - stuff that the cluster absolutely requires to run and that giant swarm is adding (e.g. kyverno)
  • giantswarm-high - stuff that should preempt customer workloads
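
As a rough illustration of the two giantswarm classes above, here is a minimal Python sketch that builds `scheduling.k8s.io/v1` PriorityClass manifests as plain dicts and prints them as JSON. The giantswarm-critical value is the one currently deployed; the giantswarm-high value and both descriptions are assumptions made up for this example, since no value was agreed in this thread. The only hard constraint used is the upstream rule that user-defined classes must not exceed 1000000000.

```python
import json

# Upstream Kubernetes reserves values above 1,000,000,000 for built-in
# system classes, so user-defined classes must stay at or below this.
MAX_USER_PRIORITY = 1_000_000_000


def priority_class(name, value, description):
    """Build a scheduling.k8s.io/v1 PriorityClass manifest as a plain dict."""
    if value > MAX_USER_PRIORITY:
        raise ValueError(f"{name}: {value} exceeds the user-defined maximum")
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }


manifests = [
    # Value taken from the existing giantswarm-critical class.
    priority_class(
        "giantswarm-critical",
        1_000_000_000,
        "Giant Swarm components the cluster requires to run.",
    ),
    # ASSUMPTION: placeholder value, not decided in this thread. It only
    # needs to sit below giantswarm-critical and above customer workloads.
    priority_class(
        "giantswarm-high",
        900_000_000,
        "Giant Swarm components that should preempt customer workloads.",
    ),
]

for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

The two system-* classes are left out on purpose: they are built into Kubernetes and do not need to be delivered by us.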

next steps:

  • define delivery for priority classes
  • move apps that use priority classes to use our new priority classes (and get rid of custom priority classes)
  • look over all apps and see if they should use a priority class (i.e. if they don't already)
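
The last two steps amount to an audit: find apps that still use a custom class, and apps that set none at all. A minimal Python sketch of that check follows. The `audit` helper and the sample inventory are hypothetical and only illustrate the idea; a real audit would read `priorityClassName` from the pod specs of the deployed apps.

```python
# Curated classes from the list above.
CURATED = {
    "system-cluster-critical",
    "system-node-critical",
    "giantswarm-critical",
    "giantswarm-high",
}


def audit(workloads):
    """Split workloads into those using a custom class and those with none.

    `workloads` maps a workload name to its priorityClassName (or None).
    Returns (custom, unset): a dict of name -> non-curated class, and a
    sorted list of names that have no priority class set.
    """
    custom = {
        name: pc for name, pc in workloads.items() if pc and pc not in CURATED
    }
    unset = sorted(name for name, pc in workloads.items() if not pc)
    return custom, unset


# HYPOTHETICAL inventory for illustration only; not the real state of any
# cluster. "example-app" is a made-up name.
sample = {
    "kyverno": "giantswarm-critical",
    "crossplane": "crossplane-critical",
    "flux": "flux-giantswarm-flux-giantswarm",
    "example-app": None,
}

custom, unset = audit(sample)
print("to migrate:", custom)
print("no priority class:", unset)
```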

@JosephSalisbury
Contributor

hola @yulianedyalkova - we had a bit of a chat about this in sig architecture, i reckon it makes most sense for tenet

there's basically no urgency on it (atlas is unblocked from their original issue), it just would be good to clean up these priority classes eventually. we had some ideas from sig arch above, but those can be entirely changed depending

plz ping me here or on slack if you have any other questions <3

JosephSalisbury added the team/tenet label and removed the team/atlas label Sep 25, 2024