HTTP/S buildfetch streams in chunks #3598
Conversation
Hi @mtalexan. Thanks for your PR. I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Seems reasonable to me. @jlebon @ravanelli - WDYT?
```python
# If the HTTP headers have encoded the file transfer as chunks already, respect those instead
# of our hardcoded max size.
if 'chunked' in r.headers.get('transfer-encoding', list()):
```
Is this something you copied from elsewhere? Is it best practice?
Does any of this really matter for us versus just calling iter_content(chunk_size=None)?
The code was inspired by this StackOverflow answer and its comments: https://stackoverflow.com/a/16696317/388082
Reading some of the requests documentation though, the code block on StackOverflow is all but an example straight from the documentation.
According to the documentation, if you set stream=True in the get request, it allows you to use iter_content(). The chunk_size parameter is technically optional, but setting it to None means to use whatever the server said for the chunk sizes. If the server didn't decide to do chunk encoding, it will give you the same result as not using iter_content() at all (the very problem I'm trying to solve).
So the logic here is setting a variable to a sane default chunk size, then if the encoding of the reply from the server indicates it is using chunked encoding, we reset to None
so the server-side chunk sizes are used instead.
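To make that concrete, here is a minimal sketch of the pattern being described, assuming a requests call with stream=True and a 30 MiB default; the function name, URL handling, and destination argument are illustrative assumptions, not the actual buildfetch code:

```python
import requests

# Illustrative sketch of the chunked-download logic discussed above; the
# function name and 30 MiB default are assumptions for the example.
def stream_download(url, dest, default_chunk_size=30 * 1024 * 1024):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        chunk_size = default_chunk_size
        # If the server already declared chunked transfer encoding, let
        # iter_content() follow the server's chunk sizes instead of ours.
        if 'chunked' in r.headers.get('transfer-encoding', list()):
            chunk_size = None
        # Write each chunk to disk as it arrives instead of buffering the
        # whole response body in RAM.
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```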
"Best practice" would seem to be to use urllib
instead of the lower level requests
module. While I can't personally confirm one way or another, the urllib.urlretrieve()
function is designed to do exactly what is being done here in an optimized way. I don't know if it has an equivalent of the exists_impl
functionality though, which might be why requests
was used here instead.
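For comparison, the urllib route mentioned above would look roughly like this (in Python 3 the function lives in urllib.request; the URL and filename below are placeholders):

```python
from urllib.request import urlretrieve

# urlretrieve() streams the response to a file for you, but the Python docs
# describe it as a legacy interface, and it has no obvious equivalent of the
# exists_impl check mentioned above.
urlretrieve("https://example.com/builds/some.ociarchive", "some.ociarchive")
```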
This makes a lot of sense to me. We've actually had OOM crashes from cosa buildfetch too in the past in the pipeline (though there we use the S3 APIs, which when I last dug into it I think did do chunking, but I'll have to recheck). Though we do also use the HTTPFetcher in some places, which might be helped by this. Thanks for the contribution!
Minor nit: can you squash the flake8 fixup into the main commit?
Force-pushed the "…ks, or declared chunk encoding to reduce RAM usage" commit from 6e1d1c7 to 2d7e8c5 (Compare).
Done.
/ok-to-test
LGTM
@mtalexan: The following test failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I'm pretty sure rhcos CI doesn't cover this case. Merging...
A minor procedural note that I didn't notice before merge; you have text content in the PR but not in the commit message. In most of our repositories we try hard to follow modern Git conventions. Instead of:
I would have written e.g.:
Notice the "topic prefix". I also dropped the "in 30 MiB chunks" from the title as it's an unimportant implementation detail. And I repeat the term
Keep RAM usage down when using buildfetch. Existing code was downloading the entire file into RAM before writing it to disk, but now that ociarchive files are included that are usually 4-8 GiB, that's no longer a good idea. Lower-RAM systems choke on having too little RAM available to continue operating normally.

This instead sets a somewhat arbitrary 30 MiB chunk limit. If the server itself is already chunk encoding, it will use whatever that says instead, but in a lot of (most?) cases the server probably isn't.
It's confirmed to fix a problem with downloading 4 GiB ociarchive files as part of a buildfetch on a desktop system with only 8 GB of RAM. The system was crashing every time during the download because the OOM killer terminated it for eating too much of the available RAM. We ideally want the system to be able to do builds with ociarchives of up to 8 GiB.

Effect on download performance for existing uses should be relatively negligible. It might slightly impact download time if you have a really huge network pipe, lots of RAM, slow disk I/O, and a small disk I/O cache, but otherwise it won't affect much.
In one timing test on a system with a 1 Gbps network pipe, an NVMe disk, 64 GB of RAM, and a 4 GB ociarchive file, the timing results were:
So the chunking pretty clearly doesn't have an impact since the unchunked (original) code actually took longer in the case where it should have been optimal, indicating the difference is within the margin of testing error.
/closes #3597