As an admin/user I can maintain a controlled registry that doesn't continually grow in space #1268

Open
ipanova opened this issue Apr 21, 2023 · 5 comments

@ipanova (Member) commented Apr 21, 2023

Is your feature request related to a problem? Please describe.
For synced repos there is at least the option to sync content in mirrored mode (not just additive) and then trigger orphan cleanup, which is still only half a solution. For pushed repos, the content always accumulates.

Describe the solution you'd like
Introduce the concept of image garbage collection. Garbage collection will ensure efficient use of resources for active objects by removing objects that occupy sizeable amounts of disk space.
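A minimal sketch of what the candidate set could look like against pulp_container's current data model; the actual design and criteria are still TBD (see "Additional context" below), and treating "untagged and not referenced by a manifest list" as the removal criterion is only an assumption here:

```python
# Illustrative only: manifests that are neither tagged nor referenced by a
# manifest list could be considered garbage. Field names (tagged_manifest,
# listed_manifests) follow the current pulp_container models; the real GC
# criteria are still to be designed.
from pulp_container.app.models import Manifest, Tag


def gc_candidate_manifests():
    tagged = Tag.objects.filter(tagged_manifest__isnull=False).values_list(
        "tagged_manifest", flat=True
    )
    listed = Manifest.objects.filter(listed_manifests__isnull=False).values_list(
        "listed_manifests", flat=True
    )
    # Deleting these only frees disk space once they are removed from all
    # repository versions and orphan cleanup runs.
    return Manifest.objects.exclude(pk__in=tagged).exclude(pk__in=listed)
```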

Describe alternatives you've considered

  1. Set retain_repo_versions to 1 on the push repo.
  2. Manually remove undesired images by digest. This also deletes their tags and blobs (as long as the blobs are not referenced by other images).
  3. Trigger orphan cleanup. The orphan_protection_time setting might need to be adjusted based on needs. (A rough REST sketch of these workarounds follows below.)

For synced repos, one can also consider combining this with the disk reclamation feature in case repo versioning is needed.
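For illustration, the three workarounds above map roughly to the following REST calls. This is only a sketch: the base URL, credentials, repository href and digest are placeholders, and the remove_image action and orphan cleanup endpoint are assumed from pulp_container/pulpcore.

```python
# Sketch of the workarounds via the REST API; all values below are placeholders.
import requests

BASE = "https://pulp.example.com"
AUTH = ("admin", "password")
repo_href = "/pulp/api/v3/repositories/container/container-push/<uuid>/"

# 1. Keep only the latest repository version.
requests.patch(f"{BASE}{repo_href}", json={"retain_repo_versions": 1}, auth=AUTH)

# 2. Remove an undesired image (and the tags pointing at it) by digest.
requests.post(
    f"{BASE}{repo_href}remove_image/",
    json={"digest": "sha256:<image-digest>"},
    auth=AUTH,
)

# 3. Trigger orphan cleanup; orphan_protection_time is in minutes.
requests.post(
    f"{BASE}/pulp/api/v3/orphans/cleanup/",
    json={"orphan_protection_time": 0},
    auth=AUTH,
)
```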

Additional context
Design and criteria of garbage collection - TBD
Disclaimer: this feature might be incompatible with repo versions; it might be required, or at least taken into account, to garbage collect only those repos that do not have versioning enabled, i.e. retain_repo_versions=1.

@mdellweg (Member) commented

When we implement #1212, I think we can move through all existing repository_versions and at least remove all the dangling blobs that were uploaded but never added to a manifest. I know this is not exactly what this ticket is about, but it is related.
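A possible starting point for finding those, sketched against pulp_container's current models (the field names blobs and config_blob come from the models; whether this is the right cleanup path is still open):

```python
# Sketch: blobs that are not referenced by any manifest, neither as a layer nor
# as a config blob, i.e. the "dangling" uploads mentioned above. They would still
# need to be removed from their repository versions before orphan cleanup can
# actually delete them.
from pulp_container.app.models import Blob, Manifest


def dangling_blobs():
    layer_blobs = Manifest.objects.filter(blobs__isnull=False).values_list(
        "blobs", flat=True
    )
    config_blobs = Manifest.objects.filter(config_blob__isnull=False).values_list(
        "config_blob", flat=True
    )
    return Blob.objects.exclude(pk__in=layer_blobs).exclude(pk__in=config_blobs)
```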

@lubosmj (Member) commented Apr 21, 2023

I imagine having an endpoint that accepts a list of repositories/tags (include/exclude_list) that should be preserved; everything else will be purged.
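A hypothetical shape for such a request body (the serializer name, field names, and semantics are illustrative, not an existing API):

```python
# Hypothetical request serializer for the proposed purge endpoint.
from rest_framework import serializers


class PurgeSerializer(serializers.Serializer):
    # Repositories/tags to keep.
    include_list = serializers.ListField(child=serializers.CharField(), required=False)
    # Repositories/tags to drop from the preserved set; exact semantics up for design.
    exclude_list = serializers.ListField(child=serializers.CharField(), required=False)
```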

Also, we can introduce the concept of "stale repositories". Repositories would be flagged as stale if they have not been hit by container clients for a long period of time. An adjusted orphan cleanup would then remove these repositories along with their content.
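A sketch of such a staleness check; note that the last_accessed timestamp used here does not exist today and would have to be recorded by the registry API on pulls:

```python
# Hypothetical: flag repositories as stale when container clients have not hit
# them for a while. "last_accessed" is an assumed new field, not an existing one.
from datetime import timedelta

from django.utils import timezone
from pulp_container.app.models import ContainerPushRepository

STALE_AFTER = timedelta(days=90)  # example threshold


def stale_repositories():
    cutoff = timezone.now() - STALE_AFTER
    return ContainerPushRepository.objects.filter(last_accessed__lt=cutoff)
```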

@lubosmj mentioned this issue Jun 28, 2023
@ipanova (Member, Author) commented Sep 1, 2023

@lubosmj (Member) commented Sep 5, 2023

Nice find.

Based on the above comment, we can consider scheduling periodic cleanup tasks. We already have a facility for dispatching the periodic analytics task:
https://github.com/pulp/pulpcore/blob/b5dbcc0fda6a8b46b1ef8efe73cf71056fb67017/pulpcore/app/util.py#L298

(models.TaskSchedule.objects.update_or_create call in the post-migration hook)

From the scheduled task, we can traverse repositories or Pulp settings. If we allow users to set up a retention configuration that attaches to specific repositories, we could make the garbage collection more granular.
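Modeled on that analytics scheduling, a post-migration hook for a GC schedule could look roughly like this; the task path and schedule name are hypothetical, while the TaskSchedule fields follow pulpcore's model:

```python
# Sketch of scheduling a periodic GC task the same way pulpcore schedules the
# analytics task. "pulp_container.app.tasks.garbage_collect" is a hypothetical
# task path.
from datetime import timedelta

from django.utils import timezone
from pulpcore.app.models import TaskSchedule


def schedule_container_gc():
    dispatch_interval = timedelta(days=1)
    TaskSchedule.objects.update_or_create(
        name="container garbage collection",
        defaults={
            "task_name": "pulp_container.app.tasks.garbage_collect",
            "dispatch_interval": dispatch_interval,
            "next_dispatch": timezone.now() + dispatch_interval,
        },
    )
```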

@git-hyagi (Contributor) commented

Some notes/thoughts about this feature:

  • Should we somehow control/block other tasks to avoid deleting an artifact that is about to be gc'ed while a user is downloading/tagging/using it (for example, GC running while a new image that reuses a layer about to be deleted is being pushed)? A rough dispatch sketch touching on this follows the list.
  • Should we create a task scheduler and also allow manual execution (through a new Django command or something like pulp orphan cleanup)?
  • Should the GC process be handled by [non-blocking] worker pipeline tasks? Or should it be a blocking task to avoid any issue while deleting the artifacts, or not a worker task at all?
  • Should we have a dry-run execution? This could help estimate the number of artifacts eligible for deletion and anticipate long GC task executions.
  • What about a GC task timeout? For example, if the GC runs for longer than GC_TASK_TIMEOUT, the task should be aborted; this could be helpful for running the GC during a maintenance window.
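As a rough illustration of the locking and dry-run points above: only pulpcore's dispatch() with exclusive_resources is an existing facility here; garbage_collect and launch_gc are hypothetical names.

```python
# Sketch only: dispatch the GC task with the repository as an exclusive resource
# so concurrent pushes/tagging/removals cannot race with the deletion, and pass a
# dry_run flag that only reports candidates instead of deleting them.
from pulpcore.plugin.tasking import dispatch


def garbage_collect(repository_pk, dry_run=True):
    """Hypothetical task body: report (dry run) or remove unreferenced content."""
    ...


def launch_gc(repository, dry_run=True):
    return dispatch(
        garbage_collect,
        exclusive_resources=[repository],
        kwargs={"repository_pk": str(repository.pk), "dry_run": dry_run},
    )
```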
