Garbage collector doesn't clean freight #3572

Open
4 tasks done
semenar-0 opened this issue Feb 26, 2025 · 11 comments

Comments

@semenar-0

semenar-0 commented Feb 26, 2025

Checklist

  • I've searched the issue queue to verify this is not a duplicate bug report.
  • I've included steps to reproduce the bug.
  • I've pasted the output of kargo version.
  • I've pasted logs, if applicable.

Description

The garbage collector doesn't clean up freights. We already have 1029 freights, some of them older than 40 days.
This breaks the UI.

Screenshots

Steps to Reproduce

```yaml
garbageCollector:
  enabled: true
  schedule: "*/5 * * * *"
  workers: 5
  maxRetainedPromotions: 5
  minPromotionDeletionAge: 168h
  maxRetainedFreight: 10
  minFreightDeletionAge: 72h
  logLevel: DEBUG
  labels: {}
  annotations: {}
  podLabels: {}
  podAnnotations: {}
  resources:
    limits:
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}
  securityContext: {}
  env: []
  envFrom: []
```

```
$ kubectl get freight | wc -l
1022
$ kubectl get pods
NAME                                           READY   STATUS      RESTARTS   AGE
kargo-api-7b5c879bc6-52n5l                     1/1     Running     0          29d
kargo-controller-5df95df976-bx69x              1/1     Running     0          30d
kargo-garbage-collector-29009465-lg6l7         0/1     Completed   0          13m
kargo-garbage-collector-29009470-lttb4         0/1     Completed   0          8m43s
kargo-garbage-collector-29009475-qbp7d         0/1     Completed   0          3m43s
kargo-management-controller-57db5d4d4b-g8mjd   1/1     Running     0          30d
kargo-webhooks-server-5f9d59b468-pglhc         1/1     Running     0          29d
```

Version

1.2.0

Logs

```
- kargo-garbage-collector-29009470-lttb4 › garbage-collector
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:00Z" level=info msg="Starting Kargo Garbage Collector" GOMAXPROCS=2 GOMEMLIMIT=134217728 commit=0958769ad8df56d274bc752ce236a257f2920f64 version=v1.2.0
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:00Z" level=debug msg="loading in-cluster REST config"
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Promotions to Stage" project=workflows stage=dev
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Promotions to Stage" project=workflows stage=qa
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Promotions to Stage" project=workflows stage=uat
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Freight from Warehouse" project=workflows warehouse=jobrunner-dev
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Freight from Warehouse" project=workflows warehouse=jobrunner-rc
kargo-garbage-collector-29009475-qbp7d garbage-collector time="2025-02-26T11:15:01Z" level=debug msg="cleaned Project" project=workflows
- kargo-garbage-collector-29009475-qbp7d › garbage-collector
```
@krancour
Member

krancour commented Feb 26, 2025

@semenar-0, we will attempt to replicate this when we have a chance, but in the meantime, we just want to confirm you understand the eligibility requirements for being garbage collected and that you are certain those requirements are met.

The max of 10 you have configured is an ideal max retained beyond the oldest piece of Freight (from each Warehouse) still in use.

There are a variety of scenarios where conditions can be blocking Freight from being GC'ed.

Two examples:

  1. Assuming all 1022 Freight resources are from the same Warehouse, and numbering them 0 - 1021 for illustrative purposes, with 0 being the oldest and 1021 being the newest: if any of 0 - 9 were still in use, it would explain all 1022 of them not having been GC'ed.

  2. If the 1022 Freight resources came from many different Warehouses, that could also be a factor. Say you could number the Freight from each of 11 different Warehouses as 0 - 99. (That is actually 1100 Freight resources; I'm just trying to keep the math easy.) For each of those 11 groups, if anything in the 0 - 9 range is still in use, it could be stopping many other Freight resources in its group from being GC'ed.
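The examples above can be sketched in Python. This is a minimal model of the retention rule as described in this comment, not Kargo's actual implementation: for a single Warehouse, everything from the oldest in-use Freight to the newest is kept, plus up to `maxRetainedFreight` Freight older than it. The helper name `deletable_indices` is hypothetical, the behavior when nothing is in use (keep only the newest `maxRetainedFreight`) is an assumption, and the `minFreightDeletionAge` threshold is not modeled.

```python
from typing import List


def deletable_indices(in_use: List[bool], max_retained: int) -> List[int]:
    """Hypothetical helper modeling the rule described in this thread.

    in_use[i] says whether Freight i (from ONE Warehouse) is currently used by
    any Stage. Index 0 is the oldest Freight and the last index is the newest,
    matching the 0 - 1021 numbering in the examples above.
    """
    n = len(in_use)
    # Index of the oldest Freight still in use, or None if nothing is in use.
    oldest_in_use = next((i for i, used in enumerate(in_use) if used), None)
    if oldest_in_use is None:
        # Assumption: with nothing in use, only the newest `max_retained` are kept.
        cutoff = n - max_retained
    else:
        # Keep the oldest in-use Freight and everything newer than it,
        # plus up to `max_retained` Freight older than it.
        cutoff = oldest_in_use - max_retained
    # Everything strictly older than the cutoff is eligible for deletion.
    return list(range(max(cutoff, 0)))


# Example 1: 1022 Freight from one Warehouse, Freight #9 still in use.
usage = [False] * 1022
usage[9] = True
print(len(deletable_indices(usage, 10)))  # → 0

# If the oldest in-use Freight were #500 instead, #0 - #489 would be eligible.
usage = [False] * 1022
usage[500] = True
print(len(deletable_indices(usage, 10)))  # → 490
```

For Freight spread across several Warehouses, as in example 2, the same function would be applied to each Warehouse's Freight separately.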

There are probably a lot more scenarios that could explain this. We are still happy to try and reproduce this when time permits, but as I said... confirming that nothing like the above explains away your issue would be helpful to us.

@semenar-0
Author

semenar-0 commented Feb 27, 2025

@krancour There are 2 Warehouses. Most of the freights are not in use.

```
~ kargo get freight --project=workflows -o json | jq -r '.items[] | select(.status.currentlyIn | length == 0) | .metadata.name' | wc -l
1045
~ kargo get freight --project=workflows -o json | jq -r '.items[] | select(.status.currentlyIn | length > 0) | .metadata.name' | wc -l
5
```

Per Warehouse:

```
kargo get freight --project=workflows -o json | jq -r '.items[] | select(.status.currentlyIn | length > 0) | .origin.name' | sort | uniq -c
  3 jobrunner-dev
  2 jobrunner-rc
```

```
kargo get freight --project=workflows -o json | jq -r '.items[] | select(.status.currentlyIn | length == 0) | .origin.name' | sort | uniq -c
894 jobrunner-dev
151 jobrunner-rc
```

@krancour
Member

Thanks @semenar-0! Have you checked that none of the Freight still in use are among some of the oldest? Like I said, very old Freight that's still in use can be blocking a large number of Freight from being GC'ed. Just want to rule this out.
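One way to answer this question is to sort each Warehouse's Freight by age and see where the oldest in-use one falls. The Python sketch below does that on the JSON produced by `kargo get freight --project=workflows -o json`. It assumes each item carries `origin.name` and `status.currentlyIn` (the fields used in the `jq` commands earlier in this thread) plus the standard Kubernetes `metadata.creationTimestamp`; the function names and the `oldest_in_use.py` file name are hypothetical.

```python
import json
import sys
from collections import defaultdict
from datetime import datetime


def created_at(item: dict) -> datetime:
    # Kubernetes timestamps look like "2025-02-26T11:15:00Z"; fromisoformat()
    # only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    ts = item["metadata"]["creationTimestamp"]
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def oldest_in_use_report(freight_list: dict) -> dict:
    """Hypothetical helper: per Warehouse, find the oldest Freight still in
    use and count how many unused Freight are newer than it."""
    by_warehouse = defaultdict(list)
    for item in freight_list.get("items", []):
        by_warehouse[item["origin"]["name"]].append(item)

    report = {}
    for warehouse, items in by_warehouse.items():
        items.sort(key=created_at)  # oldest first
        # Same test as the jq filter: a non-empty currentlyIn means "in use".
        in_use = [bool((i.get("status") or {}).get("currentlyIn")) for i in items]
        first = next((idx for idx, used in enumerate(in_use) if used), None)
        report[warehouse] = {
            "total": len(items),
            "oldest_in_use": None if first is None else items[first]["metadata"]["name"],
            # Number of Freight older than the oldest one still in use.
            "older_than_it": first,
            # Unused Freight newer than it, i.e. the ones held back by it.
            "unused_newer": 0 if first is None else in_use[first:].count(False),
        }
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        report = oldest_in_use_report(json.load(f))
    for warehouse, row in sorted(report.items()):
        print(warehouse, row)
```

Usage would be `kargo get freight --project=workflows -o json > freight.json` followed by `python3 oldest_in_use.py freight.json`. A small `older_than_it` next to a large `unused_newer` means an old in-use Freight is holding back most of that Warehouse's Freight.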

@semenar-0
Author

@krancour Here is the output of the unused Freight. Currently, only five are in use.

@krancour
Member

krancour commented Mar 3, 2025

@semenar-0 you haven't answered the question. Have you verified that none of the Freight in use are among the oldest? If that were so, they could be blocking anything newer than themselves from being GC'ed.

@semenar-0
Author

semenar-0 commented Mar 6, 2025

@krancour, that is what I checked: `select(.status.currentlyIn | length > 0)`

Based on the output, only 5 of the 1029 freights are currently in use.

With unused freights collapsed:

[Screenshot]

@krancour
Member

krancour commented Mar 6, 2025

> based on output only 5 freights of 1029 are in use currently

@semenar-0 I think you were still not understanding my question.

I was asking about the age/timestamps of the Freight that are currently in use.

The screenshot, however, tells me what I need to know.

`coy-marsupial` appears to be your oldest Freight, and it is still in use.

As I've previously explained:

> The max of 10 you have configured is an ideal max retained beyond the oldest piece of Freight (from each Warehouse) still in use.
>
> There are a variety of scenarios where conditions can be blocking Freight from being GC'ed.
>
> Two examples:
>
>   1. Assuming all 1022 Freight resources are from the same Warehouse, we will for illustrative purposes, number those 1022 Freight resources as 0 - 1021, with 0 being the oldest and 1021 being the newest. If any of 0 - 9 were still in use, it would explain all of those 1022 not having been GC'ed.
>     ...

Based on what you've shown me, there is no bug here and GC is working exactly as it is meant to.

@semenar-0
Author

@krancour
If we have a Stage (e.g., prod) where promotions are triggered manually, does this mean Kargo will continue creating new freights indefinitely? How can we prevent this behavior?

Let's say that, from the same Warehouse, we continuously promote to the QA Stage.

@krancour
Member

krancour commented Mar 6, 2025

I'm not sure I understand your question. Warehouses find artifacts and "package" them as Freight. They do this continuously and their behavior is unaffected by however the Stages are configured.

@semenar-0
Author

@krancour, in some environments, we don’t have automatic promotion enabled, meaning that an older freight might always remain in use.

If I understand correctly, in this scenario, the garbage collector won’t remove newer freights because the older one is still referenced. This is causing a performance issue due to the accumulation of freights.

Could you clarify if this is the expected behavior? And if so, what can we do to ensure that newer freights are properly cleaned up while still keeping the necessary ones?

@krancour
Member

krancour commented Mar 6, 2025

> in some environments, we don’t have automatic promotion enabled, meaning that an older freight might always remain in use.

This is common. What is not common is that the oldest Freight still in use is thousands of generations old.

> Could you clarify if this is the expected behavior?

It is expected.

> And if so, what can we do to ensure that newer freights are properly cleaned up while still keeping the necessary ones?

Zeroing in on the phrase "keeping the necessary ones," Kargo has no way of knowing what unused Freight you consider necessary. GC'ing nothing newer than the oldest Freight still in use is a conservative scheme for not GC'ing something you may actually care about.

If this doesn't work for you, you can implement your own GC component.
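As a rough illustration of the selection step such a custom GC component might perform, here is a Python sketch of a less conservative policy: per Warehouse, never touch Freight that is in use, keep the newest `keep_unused` unused Freight, and consider the rest for deletion once they are older than a minimum age (mirroring `minFreightDeletionAge`). This is a hypothetical policy, not something Kargo ships; the `Freight` dataclass, `select_for_deletion`, and their fields are assumptions, and the code only picks names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class Freight:
    # Hypothetical, simplified view of a Freight resource.
    name: str
    warehouse: str
    created: datetime
    in_use: bool


def select_for_deletion(freight: List[Freight], keep_unused: int,
                        min_age: timedelta, now: datetime) -> List[str]:
    """Hypothetical policy: per Warehouse, skip Freight that is in use, keep
    the newest `keep_unused` unused Freight, and select the remaining unused
    Freight that are at least `min_age` old."""
    by_warehouse: Dict[str, List[Freight]] = {}
    for f in freight:
        by_warehouse.setdefault(f.warehouse, []).append(f)

    doomed: List[str] = []
    for items in by_warehouse.values():
        # Unused Freight only, newest first.
        unused = sorted((f for f in items if not f.in_use),
                        key=lambda f: f.created, reverse=True)
        for f in unused[keep_unused:]:
            if now - f.created >= min_age:
                doomed.append(f.name)
    return doomed


# Illustrative data: one old Freight pinned by a Stage, plus 30 unused ones
# created 1 to 30 days ago.
now = datetime(2025, 3, 6, tzinfo=timezone.utc)
pinned = Freight("pinned", "jobrunner-dev", now - timedelta(days=40), in_use=True)
unused = [Freight(f"f{d}", "jobrunner-dev", now - timedelta(days=d), in_use=False)
          for d in range(1, 31)]
doomed = select_for_deletion([pinned] + unused, keep_unused=10,
                             min_age=timedelta(hours=72), now=now)
print(len(doomed))  # → 20 (f11 ... f30); "pinned" is never selected
```

Unlike the built-in behavior described above, an old in-use Freight here does not protect everything newer than it, so unused Freight that someone may still want to promote can be selected; that is the trade-off the conservative scheme avoids. Deleting the selected names (presumably with `kubectl delete freight <name> -n <project>`, since `kubectl get freight` works in this thread) is left to the caller.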

Projects
None yet
Development

No branches or pull requests

2 participants