Chore: Figure out appropriate requests and limits for Claudie services #935

Closed
katapultcloud opened this issue Jul 4, 2023 · 6 comments · Fixed by #1055
Labels: chore, groomed


Description

Requests and limits should be adjusted, as the services seem to request far more than they actually need, causing overprovisioning of hardware. I've reviewed the resource consumption of each service (some not included) in the GKE observability console, and this is what I came up with.

Memory

| Service | Util | Recommended | Current |
|---|---|---|---|
| ansibler | 0.8% | 100Mi | 768Mi |
| builder | 1.41% | 50Mi | 200Mi |
| dynamodb | 12% | 200Mi | 512Mi |
| kube-eleven | 1% | 100Mi | 500Mi |
| kuber | 17% | 100Mi | 200Mi |
| mongodb | 68% | stays | 300Mi |
| terraformer | 0.6% | 200Mi | 1200Mi |

CPU

| Service | Util | Recommended | Current |
|---|---|---|---|
| ansibler | 0.1% | 100m | 700m |
| builder | | stays | 80m |
| dynamodb | | stays | 100m |
| kube-eleven | 0.02% | 100m | 500m |
| kuber | 0.02% | 50m | 300m |
| mongodb | 6% | stays | 100m |
| terraformer | 0.03% | 100m | 700m |

However, the statistics in the GKE console are not great, and I'd like to monitor the services for 1-2 weeks before setting these in stone.

Exit criteria

  • Install kube metrics and Prometheus (a query sketch follows this list)
  • Observe for 1-2 weeks
  • Set requests and limits accordingly, taking spikes into account
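For illustration only: once the monitoring stack is in place, peak usage over the observation window could be pulled with PromQL queries such as the ones below. This is a minimal Go sketch, not part of Claudie; the Prometheus URL and the claudie namespace are assumptions and would need to match the actual deployment.

```go
// peaks.go - pull peak memory and CPU usage of Claudie services from Prometheus.
// Assumptions: Prometheus is reachable at PROM_URL and the services run in the
// "claudie" namespace; both must be adjusted for the real cluster.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// query runs an instant PromQL query against the Prometheus HTTP API.
func query(promURL, promQL string) (json.RawMessage, error) {
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(promQL))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out struct {
		Status string          `json:"status"`
		Data   json.RawMessage `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Data, nil
}

func main() {
	prom := os.Getenv("PROM_URL") // e.g. http://prometheus.monitoring:9090

	queries := []string{
		// Peak working-set memory per container over the last two weeks.
		`max_over_time(container_memory_working_set_bytes{namespace="claudie", container!=""}[2w])`,
		// Peak 5m-average CPU usage per container over the last two weeks (subquery).
		`max_over_time(rate(container_cpu_usage_seconds_total{namespace="claudie", container!=""}[5m])[2w:5m])`,
	}

	for _, q := range queries {
		data, err := query(prom, q)
		if err != nil {
			fmt.Fprintln(os.Stderr, "query failed:", err)
			continue
		}
		fmt.Printf("%s\n%s\n\n", q, data)
	}
}
```

The request would then sit near typical usage and the memory limit above the observed spikes, as the exit criteria suggest.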
katapultcloud added the chore label Jul 4, 2023
MarioUhrik commented Jul 4, 2023

Have these recommendations taken spikes into account?

We've had several rounds of requests/limits tuning already, and there's a reason why they are roughly where you've found them.

katapultcloud (Author) commented:

@MarioUhrik I don't trust the GKE observability tooling to provide accurate stats including spikes; I think it averages them out quite aggressively. At the bottom I mentioned what the correct steps should be. The tables show what I would consider based on the GKE tooling, but we should not proceed without further investigation using kube metrics and Prometheus.

MarioUhrik commented:

Sounds good, thanks

katapultcloud (Author) commented:

  • monitor e2e cluster for 1-2 weeks
  • monitor mgmt cluster for 1-2 weeks

katapultcloud added the groomed label Jul 7, 2023
JKBGIT1 self-assigned this Jul 12, 2023

JKBGIT1 commented Aug 22, 2023

Here is some data gathered from the monitoring stack while the pipeline ran on the e2e cluster.

Biggest memory spikes over the last 24h

| Day | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| 2.8.2023 | ~600MiB | ~60MiB | ~10MiB | ~120MiB | ~100MiB | ~100MiB | ~240MiB | ~1.45GiB |
| 7.8.2023 | ~605MiB | ~73MiB | ~10MiB | ~128MiB | ~100MiB | ~100MiB | ~230MiB | ~1GiB |
| 14.8.2023 | ~600MiB | ~75MiB | ~12MiB | ~120MiB | ~100MiB | ~190MiB | ~250MiB | ~1.12GiB |
| 18.8.2023 | ~650MiB | ~60MiB | ~17MiB | ~110MiB | ~140MiB | ~120MiB | ~200MiB | ~1.13GiB |

Biggest CPU spikes over the last 24h

| Day | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| 2.8.2023 | ~1400m | - | ~1.6m | ~35m | ~160m | ~350m | ~14m | ~1300m |
| 7.8.2023 | ~1100m | - | ~1.2m | ~14m | ~95m | ~210m | ~10m | ~1140m |
| 14.8.2023 | ~1560m | ~27m | ~4m | ~180m | ~246m | ~476m | ~54m | ~1060m |
| 18.8.2023 | ~1610m | ~19m | ~1.75m | ~140m | ~180m | ~470m | ~95m | ~1520m |

Based on the spikes I have proposed some requests and limits changes, but I am not sure whether they are relevant.

CPU

|  | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| curr request | 700m | 100m | 80m | 100m | 500m | 300m | 100m | 700m |
| curr limit | 1024m | 100m | 160m | 200m | 700m | 500m | 150m | 1024m |
| new request | 1100m | 50m | 5m | - | 250m | - | - | 1024m |
| new limit | 1500m | 75m | 10m | - | 350m | - | - | 1500m |

Memory

|  | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| curr request | 500Mi | 300Mi | 200Mi | 512Mi | 150Mi | 200Mi | 300Mi | 1024Mi |
| curr limit | 900Mi | 300Mi | 400Mi | 1Gi | 300Mi | 400Mi | 500Mi | 1200Mi |
| new request | 600Mi | 80Mi | 15Mi | 120Mi | 120Mi | 150Mi | 250Mi | - |
| new limit | 750Mi | 120Mi | 25Mi | 150Mi | 180Mi | 250Mi | 450Mi | 1500Mi |

see tables in excel


JKBGIT1 commented Sep 29, 2023

We have discussed the new requests and limits with @katapultcloud and @cloudziu on a call. You can see them in the tables below. BTW, we have decided to remove the limits on CPU and keep only the requests.


CPU

|  | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| new request | 200m | - | 10m | 100m | 100m | 100m | 100m | 200m |
| new limit | - | - | - | - | - | - | - | - |

Memory

|  | ansibler | autoscaler | builder | dynamodb | kube-eleven | kuber | mongodb | terraformer |
|---|---|---|---|---|---|---|---|---|
| new request | 600Mi | 100Mi | 15Mi | 120Mi | 120Mi | 100Mi | 200Mi | 1200Mi |
| new limit | 800Mi | 120Mi | 25Mi | 200Mi | 160Mi | 200Mi | 300Mi | 1500Mi |
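For illustration, this is how the agreed ansibler column above could be expressed with the Kubernetes Go API. This is only a sketch, not Claudie's actual manifest (which may set these values elsewhere, e.g. in the Helm chart); note that only memory gets a limit, per the decision to drop CPU limits.

```go
// Illustration: the agreed ansibler values as a Kubernetes ResourceRequirements.
// CPU gets a request only (no limit), per the decision above.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func ansiblerResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("200m"),
			corev1.ResourceMemory: resource.MustParse("600Mi"),
		},
		Limits: corev1.ResourceList{
			// No CPU limit: the container can burst during spikes without CFS throttling.
			corev1.ResourceMemory: resource.MustParse("800Mi"),
		},
	}
}

func main() {
	fmt.Printf("%+v\n", ansiblerResources())
}
```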
