delta-only repositories #729
Comments
I think this sounds good, as long as it properly falls back to the "from empty" delta if we're pulling from "not the next-to-latest" local version.
(But we need some unit test coverage, and there are various enhancements one could make on top of this, like being able to fall back to a separate archive repo for e.g. downgrades.)
Also, one thing occurs to me: we'd at least need to maintain the commit objects in the repo, otherwise prune would prune the deltas.
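To make the fallback rule above concrete, here is a minimal shell sketch. It assumes the server publishes only two deltas per ref: one from the parent of the target commit and one "from empty". The function name `pick_delta` and its arguments are made up for illustration, and the printed names use a FROM-TO form (bare TO for a from-empty delta) purely as a labelling convention here, not as a description of how ostree resolves deltas internally.

```shell
# pick_delta LOCAL_REV PARENT_REV TARGET_REV   (hypothetical helper)
#   LOCAL_REV:  commit the client currently has ("" if none)
#   PARENT_REV: parent of the commit being pulled
#   TARGET_REV: commit being pulled
pick_delta() {
    local_rev=$1
    parent_rev=$2
    target_rev=$3
    if [ -n "$local_rev" ] && [ "$local_rev" = "$parent_rev" ]; then
        # Client is on the next-to-latest version: use the small delta.
        echo "$parent_rev-$target_rev"
    else
        # Anything else (older version, fresh install): use "from empty".
        echo "$target_rev"
    fi
}
```

A client on any commit other than the parent ends up downloading the full "from empty" delta, which is the bandwidth trade-off the rest of this thread argues about.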
Does this issue cover the creation of unit tests for static-delta-only repos, or do we need another ticket for that?
Are we talking about the static-delta-only repo? Wouldn't that defeat the point of not having a bunch of small files in the repo? If we have a master repo where the small files and the static deltas live, and then create static-delta-only repos by copying content out of that repo, then we don't need to worry about this, correct?
@cgwalters I'm kind of confused by this. What about a filesystem makes it unsuitable for storing/hosting an ostree repo? Is there a more effective backend in which you can store an ostree repo and serve it over HTTP? Or do mirror operators simply dislike having lots of files around?
So, I recently chatted with someone who was running an "app store" about how they implement authorized downloads. Basically, they serve the app files on a CDN like CloudFront and use a feature like CloudFront signed URLs, as documented here: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html. They generate the final URL on their server, where they know that the logged-in user is allowed to download a particular app. The secure URL has a lifetime of 30 seconds and is signed on the server, so the client doesn't have to care and can just download the thing. In the context of ostree we could do the same thing if we had a delta-only repo on a CDN:
CloudFront allows you to use cookies for this, but it seems some other CDNs only support HTTP query parameters, so maybe ostree should have a feature similar to --http-header that adds an HTTP query parameter to all URLs.
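For illustration only, the URL rewriting such a feature would need is small. The sketch below is not an existing ostree option; `add_param`, the `token=...` parameter and the example host are hypothetical names used to show the idea of appending one query parameter to every request URL.

```shell
# add_param URL PARAM   (hypothetical helper)
# Append PARAM (already in key=value form) to URL, using "?" if the URL
# has no query string yet and "&" otherwise.
add_param() {
    url=$1
    param=$2
    case $url in
        *\?*) printf '%s&%s\n' "$url" "$param" ;;
        *)    printf '%s?%s\n' "$url" "$param" ;;
    esac
}
```

For example, `add_param https://cdn.example.invalid/repo/summary token=abc` prints the summary URL with `?token=abc` appended. How the token is signed and how long it lives would be up to the CDN and the server issuing it.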
@jlebon, @cgwalters, @sinnykumari and I were discussing 'delta-only' repos today. One thing @jlebon brought up was:
I've been discussing this a lot with @alexlarsson in the context of Flathub. At one point, the Flathub stats showed that each download (whether an upgrade or a new pull) averaged 1GB of data transferred. That was during a period when ostree, if it didn't see a matching delta, would pull the scratch delta instead of doing an object pull (madness, later resolved). A delta-only repo basically reinstates this.
Mirrors are great and everything, but they are a far less relevant way of distributing files than modern caching/proxying CDNs. BunnyCDN (for Endless) and Fastly (for Flathub) work a-OK for ostree repos: you can easily tune the caching to keep the immutable objects around for ~ever and have short timeouts / explicit purges. It's pretty easy to cache ostree repos in CDNs, and the hit rate is superb (>97% in both cases I have access to, likely the two largest production ostree repos at present).
So: what problem is really being solved here? When you look at your CDN bill, or the time and data it costs at the client to have a very limited version of things on the server, I'm really not convinced that, unless we make deltas heaps smarter, a delta-only repo is a benefit for clients. It makes mirroring easier, yes, because you have maybe one or a couple of delta folders per ref, but most people don't have a mirror network, so I think it represents a net loss for the bandwidth efficiency of the client, unless we:
Right: #1709
Oh yeah! What I said back then. tl;dr: deltas are an amazing technical advantage of ostree, and (modulo bringing any repo server to its knees when generating them on large files) incredibly smart and bandwidth efficient, but they totally fail to deliver on that promise due to how they are currently deployed and managed. Let's make the repo management tools (ostree/flatpak/repo-manager) smarter before we force that ineffectual deployment cost onto our downstream mirrors and every end user by flipping a delta-only bit and not solving the real problem. :)
We (FCOS) are discussing this in the context of this issue, which links to this MirrorManager one. A concern some people have is tying ourselves solely to a CDN.
This is the answer you get if you ask mirror operators, of course. :) Provide an OCI image which just opens a caching front-end, and you can deploy your own grass-roots CDN with a geoIP or round-robin frontend. Setting low TTLs or issuing PURGE is pretty easy after a summary update. I think if you "solve" this problem (making life easier for mirror operators) it will make things worse for users and undo e.g. the work on delta RPMs etc.
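As a rough sketch of the "issue PURGE after a summary update" step: the snippet below assumes a caching front-end that honours the HTTP PURGE method (a convention of some caches and CDNs, not something ostree provides) and assumes `summary` and `summary.sig` are the mutable files worth purging. `purge_summary`, `DRY_RUN` and the example URL are hypothetical names.

```shell
# purge_summary BASE_URL   (hypothetical helper)
# Ask the cache in front of BASE_URL to drop the mutable summary files.
# With DRY_RUN=1 the curl commands are printed instead of executed.
purge_summary() {
    base_url=$1
    for f in summary summary.sig; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "curl -fsS -X PURGE $base_url/$f"
        else
            curl -fsS -X PURGE "$base_url/$f"
        fi
    done
}
```

Running it with `DRY_RUN=1` first shows which requests would be sent; whether PURGE is accepted, and from which clients, depends entirely on the cache's configuration.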
For me, I'm not as concerned with tying ourselves to a CDN. We've been using a CDN for our ostree repo for a little while now and people still complain about slow download speeds and timeouts all the time. So we either have things configured badly or things are getting cycled out of the cache too fast. See also #1541, where we were discussing one optimization (i.e. the many redirects might be what is slowing down the downloads). If we can get a good CDN "answer" then I'd be fine with that too.
Oh! Yeah, redirects absolutely rinse the performance of whatever pipelining ostree is doing. At least I've definitely seen that at some point early in Flathub's life, and that's why we set up dl.flathub.org as a separate hostname for repo access only. You have to point the origin in ostree to the hostname and path served by the CDN; you could probably finesse that with a mirrorlist of one in ostree. I am almost certain that any Flathub issues are all due to load on the origin server rather than any problem with the CDN. Debian, for instance, has two CDNs (CloudFront and Fastly) and pays for neither. For Flathub we got Fastly basically by me tweeting, and it wasn't the only offer we received, just one of the best CDNs, so I didn't spend much time with the others.
https://gist.github.com/ramcq/a3991b5834767c6da73eec1af08b52ab is how the origin is configured on Flathub, fwiw.
In the Fedora/CentOS case where by default we rely on e.g. university-owned mirrors that might be some random ext4 server and not a proper object store, we can hit performance issues with the archive format.
It should be quite possible to make it easier for server operators to manage a "delta-only" repository. See also: #701
So it's delta-only + single "from empty" delta for the latest.
I think it'd be possible to cobble this together today via `ostree static-delta generate --min-fallback-size 100000` for each delta you want, then `ostree summary -u`, then sync the `summary` and `deltas/` content to the "delta repo".