
Introduce staging binary cache to reduce our binary cache size #492

Open · Mic92 wants to merge 2 commits into master from staging-buckets

Conversation

Mic92 (Member) commented Oct 15, 2024

The idea is to have separate buckets so that we can test VCL configurations against them.
In particular, I want to create a new S3 bucket that we would start pushing to with Hydra.
The VCL rules in Fastly would check both buckets during the migration period.
After we have migrated the part of the binary cache that we are interested in,
we make the VCL rules hit only the new bucket.
After that we add an archiving rule to transition the old bucket to AWS Glacier: https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/
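For illustration, a minimal sketch of what that final archiving step could look like with boto3; the bucket name (nix-cache-old) and the 30-day threshold are placeholders, not the actual configuration, and the real rule would live in our infrastructure code rather than an ad-hoc script:

```python
import boto3

# Sketch: once the VCL rules only hit the new bucket, transition every object
# in the (hypothetical) old bucket to Glacier via a lifecycle rule.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="nix-cache-old",  # placeholder name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-cache-to-glacier",
                "Filter": {"Prefix": ""},  # match all objects
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```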

Mic92 requested a review from a team as a code owner on October 15, 2024 14:28
edolstra (Member)

What's this migration all about?

Mic92 changed the title from "Introduce staging binary cache to migration" to "Introduce staging binary cache to reduce our binary cache size" on Oct 16, 2024
Mic92 (Member, Author) commented Oct 16, 2024

@edolstra to reduce AWS costs on our binary cache.

Mic92 force-pushed the staging-buckets branch 3 times, most recently from 2cc7764 to 22bda4c on October 16, 2024 09:16
Mic92 force-pushed the staging-buckets branch 4 times, most recently from be7110b to b4177a6 on October 16, 2024 14:41
edolstra (Member)

@Mic92 Yeah but why do we need a separate bucket for that? The way I imagine doing GC is that the GC process would set a tag on objects that we consider garbage, and then we can have a lifecycle rule that moves them to Glacier. As an intermediate step, we could have an access control rule that only makes them inaccessible (so if we screw up, we can restore them more cheaply than from Glacier).
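For illustration, a hedged sketch of that tag-and-lifecycle approach with boto3; the tag key/value, bucket name, and helper are made up for the example, and the real GC process would decide which keys to tag:

```python
import boto3

s3 = boto3.client("s3")


def mark_garbage(bucket: str, keys: list[str]) -> None:
    """GC step: tag objects considered garbage (one tagging request per object)."""
    for key in keys:
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "garbage", "Value": "true"}]},
        )


# Lifecycle rule that matches only the tagged objects and moves them to Glacier.
s3.put_bucket_lifecycle_configuration(
    Bucket="nix-cache",  # placeholder name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-garbage-to-glacier",
                "Filter": {"Tag": {"Key": "garbage", "Value": "true"}},
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```

The intermediate "make them inaccessible first" step would be a separate access policy rather than a lifecycle rule, so a mistakenly tagged object could be untagged and served again without a Glacier restore.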

Mic92 (Member, Author) commented Oct 16, 2024

@edolstra That sounds easier, actually. I thought it might also be expensive to run API requests on all objects, but it looks like it's still doable:

Let's say we need 1-2 requests per object; that would be ({1,2} * 889,587,727) / 1e6 ≈ 889.6 to 1,779.2 dollars.
What would be the best way to find all the objects to keep?
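As a quick check of the arithmetic above, a small sketch assuming a flat rate of about $1 per million API requests (the rate implied by the figures; actual S3 pricing differs by request type):

```python
# Back-of-the-envelope request-cost estimate for touching every object once or twice.
objects = 889_587_727
price_per_million_requests = 1.0  # USD, assumed flat rate

for requests_per_object in (1, 2):
    cost = requests_per_object * objects / 1e6 * price_per_million_requests
    print(f"{requests_per_object} request(s) per object -> ${cost:,.1f}")
# 1 request(s) per object -> $889.6
# 2 request(s) per object -> $1,779.2
```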
