
HTTP/S buildfetch streams in chunks #3598

Merged: 1 commit merged into coreos:main from buildfetch-streamed-http on Sep 7, 2023

Conversation

mtalexan (Contributor) commented Sep 5, 2023

Keep RAM usage down when using buildfetch. The existing code downloaded the entire file into RAM before writing it to disk; now that ociarchive files, which are usually 4-8 GiB, are included, that's no longer a good idea. Lower-RAM systems choke when too little RAM is left to continue operating normally.

This instead sets a somewhat arbitrary 30 MiB chunk limit. If the server is already using chunked transfer encoding, whatever chunk sizes it declares are used instead, but in many (most?) cases the server probably isn't.
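
For illustration, the shape of the change is roughly the following (a simplified sketch, not the verbatim PR code; the URL and destination path are placeholders):

    import requests

    CHUNK_SIZE = 30 * 1024 * 1024  # cap on how much of the body is held in RAM at once

    url = "https://example.com/builds/build.ociarchive"  # placeholder
    dest = "build.ociarchive"  # placeholder

    # stream=True defers downloading the body; iter_content() then yields it
    # piecewise instead of buffering the entire file in memory first.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
                f.write(chunk)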


It's confirmed to fix a problem downloading 4 GiB ociarchive files as part of a buildfetch on a desktop system with only 8 GB of RAM. The system was crashing every time during the download because the OOM killer terminated it for consuming too much of the available RAM. Ideally we want the system to be able to do builds with ociarchives of up to 8 GiB.


Effect on download performance for existing uses should be negligible. Download time might suffer slightly if you have a very large network pipe, lots of RAM, slow disk I/O, and a small disk I/O cache, but otherwise it won't change much.

In one timing test on a system with a 1 Gbps network pipe, an NVMe disk, and 64 GB of RAM, downloading a 4 GB ociarchive file, the timing results were:

chunked:
    0.58s user 0.47s system 0% cpu 4:05.30 total
unchunked/original:
    0.57s user 0.41s system 0% cpu 4:24.63 total

So the chunking pretty clearly has no meaningful impact: the unchunked (original) code actually took longer in the case where it should have been optimal, which indicates the difference is within the margin of testing error.

Closes #3597


openshift-ci bot commented Sep 5, 2023

Hi @mtalexan. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dustymabe (Member) left a comment


Seems reasonable to me. @jlebon @ravanelli - WDYT?

In src/cmd-buildfetch:

# If the HTTP headers have encoded the file transfer as chunks already, respect those instead
# of our hardcoded max size.
if 'chunked' in r.headers.get('transfer-encoding', list()):
Member commented:

Is this something you copied from elsewhere? Is it best practice?

Does any of this really matter for us versus just calling iter_content(chunk_size=None)?

mtalexan (Contributor, Author) replied Sep 6, 2023

The code was inspired by this StackOverflow answer and its comments: https://stackoverflow.com/a/16696317/388082. Reading some of the requests documentation, though, the code block on StackOverflow is all but an example straight from the documentation.


According to the documentation, setting stream=True in the get request allows you to use iter_content(). The chunk_size parameter is technically optional, but setting it to None means using whatever the server said for the chunk sizes. If the server didn't do chunked encoding, that gives the same result as not using iter_content() at all (the very problem I'm trying to solve).

So the logic here sets a variable to a sane default chunk size, and then, if the reply from the server indicates chunked encoding, resets it to None so the server-side chunk sizes are used instead.
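
Concretely, that amounts to something like this (a sketch, not the verbatim PR code; it continues from a requests.get(..., stream=True) response r and an open output file f):

    CHUNK_SIZE = 30 * 1024 * 1024  # sane default: 30 MiB

    chunk_size = CHUNK_SIZE
    # If the server already declared chunked transfer encoding, pass None so
    # iter_content() yields the server-declared chunks instead of our default.
    if 'chunked' in r.headers.get('transfer-encoding', list()):
        chunk_size = None
    for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)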


"Best practice" would seem to be to use urllib instead of the lower level requests module. While I can't personally confirm one way or another, the urllib.urlretrieve() function is designed to do exactly what is being done here in an optimized way. I don't know if it has an equivalent of the exists_impl functionality though, which might be why requests was used here instead.

jlebon (Member) left a comment

This makes a lot of sense to me. We've actually had OOM crashes from cosa buildfetch in the pipeline too in the past (though there we use the S3 APIs, which I think did do chunking when I last dug into it, but I'll have to recheck). We also use the HTTPFetcher in some places, which might be helped by this. Thanks for the contribution!

Minor nit: can you squash the flake8 fixup into the main commit?

mtalexan added a commit: Stream GET request for HttpFetcher download, and write in 30 MiB chunks, or declared chunk encoding to reduce RAM usage
mtalexan (Contributor, Author) commented Sep 6, 2023

Minor nit: can you squash the flake8 fixup into the main commit?

Done.

jlebon enabled auto-merge (rebase) September 6, 2023 20:26

cgwalters (Member) commented:
/ok-to-test

dustymabe (Member) left a comment

LGTM


openshift-ci bot commented Sep 7, 2023

@mtalexan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/rhcos
Commit: 2d7e8c5
Required: true
Rerun command: /test rhcos


dustymabe (Member) commented:

I'm pretty sure rhcos CI doesn't cover this case. Merging...

dustymabe merged commit 52774c1 into coreos:main on Sep 7, 2023 (2 checks passed)
cgwalters (Member) commented:

A minor procedural note that I didn't notice before the merge: you have text content in the PR description, but not in the commit message. In most of our repositories we try hard to follow modern Git conventions.

Instead of:

Stream GET request for HttpFetcher download, and write in 30 MiB chunks, or declared chunk encoding to reduce RAM usage

I would have written e.g.:

buildfetch: Stream downloads instead of buffering entirely in memory

This means that cosa buildfetch which can download a large .ociarchive
can be more reliable on low memory systems.

Notice the "topic prefix". I also dropped "in 30 MiB chunks" from the title, as it's an unimportant implementation detail.

And I repeat the term buildfetch in the body, too, because it's really the focus.

mtalexan deleted the buildfetch-streamed-http branch September 14, 2023 18:42

Successfully merging this pull request may close these issues:

Support lower-memory systems with buildfetch (#3597)