Caching mirrors #304
That might, if I could find where it's set. I've looked in both my site and host configuration and don't see that option.
It is an admin-only option.
Admin for the [private] mirror? I should be admin for my own private mirror. I can change many other aspects of my site and host.
Ah, private mirror. Right. No, it is a mirrormanager instance admin-only option; for private mirrors you cannot set it yourself. It doesn't sound problematic for private mirrors, but currently it is not implemented.
So, back to the original problem description. :-)
You could fake it by creating the directory tree with empty directories.
Indeed. But what should all be in such an empty directory tree? And it does change from release to release (i.e. the introduction of …). This issue is all about resolving this problem given those constraints.
I suppose one could do something like the command sketched below to mimic the directory structure. It just sucks that this needs to be such a manual operation on every new release of a distro.
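A minimal sketch of that kind of command, assuming an rsync-reachable upstream; the host, module and local path are placeholders rather than the values actually used:

```sh
# Recreate only the upstream directory tree locally: the include pattern keeps
# every directory, the exclude pattern drops every file, so the result is an
# empty skeleton of the release layout.
rsync -a --include='*/' --exclude='*' \
    rsync://mirror.example.org/fedora/linux/releases/ \
    /srv/mirror/fedora/linux/releases/
```

Re-running it after a new release appears would add the newly introduced directories.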
Mirror servers that are implemented as caching HTTP proxy servers would require proper HTTP caching metadata in the headers from upstream servers, or manual rules, to be effective. So Fedora would be required to have its mirror servers serve their files with proper caching metadata, for example, …
I'm not sure what you are implying or trying to say, @ott, but I can tell you, from lots of experience -- both professional (i.e. …)
@brianjmurrell I have also worked with HTTP caching semantics. I can assure you by experience, and with reference to the HTTP specifications, that what you are claiming is generally speaking not true. HTTP caching semantics are specified in RFC 9111. A list of conditions for caching a response is found in section 3. There might be exceptions, but the mirrors that I have seen use the default configuration of their webserver and do not serve files with headers that are relevant for HTTP caching, except for the Last-Modified header for GET requests. As a result, section 4.2.2 applies. I can assure you that you never want an HTTP cache to use heuristic freshness. It leads to caching anomalies in web browsers and can break websites. While dnf does not seem to apply heuristic freshness or any sophisticated caching, HTTP reverse proxy servers might. So you have to configure an HTTP reverse proxy server to always validate responses if not told otherwise by the origin server.

So a caching HTTP reverse proxy server would have to check, with the help of the ETag and Last-Modified headers, whether the requested resource has changed on the origin server. In most cases this will not be the case and the origin server will return a response with HTTP status code 304. However, this does not relieve the origin server in the same way as a second origin server with the same data would: it still has to access the filesystem, and given that most packages are quite small, the file contents could have been read in the same filesystem access, especially with hard disks. It does reduce the data transfer volume, however. For large files this might be a worthwhile trade-off between simplicity, freshness and resource savings. For many small files it is not such an easy decision.

As most files are never changed on a mirror, it would be possible to use different HTTP caching semantics, for example, the Cache-Control header. Unfortunately, a package that is declared to be immutable or has a long expiration time cannot be easily revoked or retracted. So if a package needed to be retracted, for example, if it was considered malware or otherwise illegal or would violate the distribution's statutes, it would be difficult or impossible to do so, and certainly a tedious manual process. So I don't think that it would be a good idea.

It might be possible to periodically clean the cache and to cache repository metadata immediately, or after the cache has been cleared, to make a best effort to mimic the behaviour of mirror servers that are periodically synchronized with rsync. Another possibility would be to validate the packages based on repository metadata and to remove invalid packages from the cache. However, both possibilities are specific to package managers and distributions, and I can imagine that there might be tricky corner cases.
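For concreteness, a small sketch of the validation round-trip described above, assuming the origin sends a Last-Modified header; the URL is a placeholder:

```sh
URL=https://mirror.example.org/fedora/linux/releases/40/Everything/x86_64/os/repodata/repomd.xml

# First request: see which validators the origin returns (often only
# Last-Modified when the webserver runs its default configuration).
curl -sI "$URL" | grep -iE '^(last-modified|etag|cache-control):'

# Conditional request: a revalidating cache sends the stored validator back.
# An unchanged file yields a "304 Not Modified" status and no body, which
# saves transfer volume but still costs the origin a filesystem access.
LM=$(curl -sI "$URL" | tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified: //p')
curl -sI -H "If-Modified-Since: $LM" "$URL" | head -n 1
```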
@ott I don't doubt your experience. But I also don't doubt my literally decades of experience with running RPM/DEB mirror caching proxies with great success. You can continue to tell me that it won't work, but again, I have decades of experience with them telling me it does work. To be perfectly clear, I am not referring to using generic HTTP proxies like Squid Cache but purpose-built RPM/DEB (and potentially more package formats) proxy caches like Nexus Repository Manager, Artifactory and even the much more lightweight pkg-cacher and/or AptProxy.
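As an illustration of how clients consume such a purpose-built cache, a hypothetical dnf repo definition pointing at a local proxy instead of the public mirror list; the hostname, port and repository path are placeholders that depend entirely on which proxy product is used and how it is configured:

```sh
# Point dnf at the local caching proxy; metadata and packages are then fetched
# through the proxy, which caches them on first request.
cat > /etc/yum.repos.d/fedora-local-proxy.repo <<'EOF'
[fedora-local-proxy]
name=Fedora $releasever - $basearch (local caching proxy)
baseurl=http://proxy.example.lan:8081/repository/fedora/releases/$releasever/Everything/$basearch/os/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
EOF
```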
As I said, specialized HTTP caches that can use repository metadata might work, but you did not mention such software in your statement. I can't comment on whether the mentioned software works correctly, though. My experience is limited to apt-cacher-ng, and I'm not even sure that it works correctly. Moreover, software that is written in an interpreted language and that has to serve every request might not be suitable for high-volume mirrors.
I didn't need to. I didn't come here to report problems with my caching mirror software, so it was not relevant. The discussion of it only came about because you started positing that it doesn't/won't/can't work -- quite OT for this issue, I might add.
I never said I was operating a high-volume mirror with an interpreted-language solution. Quite the opposite, in fact: the solution that I operate with an interpreted-language tool is a private mirror for a small network, so it works sufficiently well. While I appreciate your perspective on proxy/caching RPM repos, given that I am doing what I am doing with proxy/caching repos quite successfully, and have for many, many years, you are not going to convince me that it can't work and that I should stop doing it.
It was not my intention to tell you how you should operate your private mirror servers. I also did not mean to criticize you. Your first comment in this issue did not mention that your feature request is limited to your private mirror servers, so I was trying to highlight some problems that could result from people running public mirror servers that are actually caching HTTP reverse proxy servers. My only goal was to prevent problems for Fedora that could result from giving people the means to operate caching HTTP reverse proxy servers as mirror servers without understanding why this is commonly not done (at least for public mirror servers) and is not as easy as it might seem. However, if the scope of this issue is now limited to private mirror servers only, I can also unsubscribe from it and not interfere or bother here. I think I have also said more or less what can be said about this topic.
There are tools that allow one to mirror on demand as a caching proxy mirror. Such mirrors are of course virtually complete, but in reality only as complete as the set of files/packages that their clients have requested from them.

The problem with this is that `report_mirror` may report only a partial listing, while the reality is that the mirror is complete from the perspective of its clients. It would be useful if mirrormanager and/or `report_mirror` could be informed of this so that partial `report_mirror` runs don't invalidate the mirror. In the short term, `report_mirror` could be modified to fake the output of a full mirror, for example.
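A rough way to visualize the gap being described here, i.e. how "partial" an on-demand cache can look next to the upstream tree it fronts; the host and paths are placeholders:

```sh
# Count non-directory entries upstream versus files actually present in the
# cache; an on-demand caching mirror will normally show far fewer cached files
# even though it can serve everything its clients ask for.
upstream=$(rsync -r --list-only rsync://mirror.example.org/fedora/linux/releases/40/ \
           | grep -cv '^d')
cached=$(find /srv/cache/fedora/linux/releases/40/ -type f | wc -l)
echo "cache holds $cached of $upstream upstream files"
```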