Should images be stored in Git LFS? #184
Comments
So IIRC, @nathanchance created the images. Looking in driver.sh, seems like there's a mix of cpio and ext4 image use. If the cpio's are significantly smaller, can we drop the ext4 images? Do we need to append To your point about binaries in git; these images are essentially never changing, or changing infrequently enough that they'd take up significant space, so it's not problematic yet, IMO. But I don't feel strongly about it; mostly I don't know anything about |
I don't know for sure about platform support, but I'm pretty sure all of them can use cpio images as an initrd, so the ext4 images could then be dropped. In short, with LFS, selected files are not stored in the repository itself; they are uploaded to a file server, and a link to each file is stored in the repository instead. A better explanation can be found in the official docs. |
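To make the "link stored in the repository" part concrete: Git LFS commits a small text pointer in place of the real file. Here is a rough sketch that prints what such a pointer looks like for a given file. The `lfs_pointer` helper is hypothetical and only for illustration (the real pointer is generated by `git lfs` itself), and it assumes GNU `sha256sum` is available.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a Git LFS style pointer for a file.
# The three fields (version, oid, size) follow the LFS pointer format;
# git-lfs normally writes this for you when a tracked file is added.
lfs_pointer() {
    local file=$1 oid size
    # SHA-256 of the file contents identifies the object on the LFS server
    oid=$(sha256sum "${file}" | cut -d' ' -f1)
    # Size of the real file in bytes
    size=$(wc -c < "${file}" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "${oid}" "${size}"
}

# Example usage (file name is an assumption):
#   lfs_pointer rootfs.cpio
```

So a 20 MB image would cost the repository only a few lines of text per revision, with the actual bytes living on the LFS server.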
@nathanchance recently published updated images: 02fa71e The repo is getting bigger with every update. |
I suppose I am fine with dropping the ext4 images (although I am fairly certain we have caught bugs that way because we deal with an actual file system rather than just the ramdisk code). The cpio image sizes are rather benign, do we actually need to use LFS? I'm just not exactly a huge fan of adding another binary to my workflow if I don't need to but if you all feel it's really beneficial, then sure.
|
Given that these images are not updated frequently and are pretty slim, it should be fine to proceed without LFS. How about gzipping them? |
Sure, why don't you see what makes them smallest; gzip, bzip, lz4 etc? |
Here is a compression test of the current rootfs.cpio for every arch:
LZMA is showing the best compression. |
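For anyone who wants to repeat this kind of size comparison locally, a minimal sketch follows. It is not the command used for the numbers above; `compare_sizes` is a hypothetical helper, the tool list and compression levels are assumptions, and any tool that is not installed is simply skipped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the compressed size (in bytes) of a file for
# each available compressor, without leaving any compressed files behind.
compare_sizes() {
    local file=$1 tool level size
    size=$(wc -c < "${file}" | tr -d ' ')
    printf '%-8s %s\n' "none" "${size}"
    for tool in bzip2 gzip lz4 lzma lzop xz zstd; do
        # Skip compressors that are not installed
        command -v "${tool}" >/dev/null 2>&1 || continue
        # Use the highest "normal" level: -19 for zstd, -9 for the others
        if [[ ${tool} == zstd ]]; then
            level=-19
        else
            level=-9
        fi
        # -c writes to stdout, so only the byte count is kept
        size=$("${tool}" "${level}" -c "${file}" 2>/dev/null | wc -c | tr -d ' ')
        printf '%-8s %s\n' "${tool}" "${size}"
    done
}

# Example usage (path is an assumption):
#   compare_sizes images/arm64/rootfs.cpio
```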
The tradeoff for more efficient compression may be a longer decompression time, but if the tests aren't timing out, then I'm OK with crushing the images to be as small as possible. |
I should write a |
Okay, I got around to running some benchmarks :)
Results (one compression table and one decompression table per architecture; the tables themselves are missing here): arm, arm64, mips, mipsel, ppc32, ppc64, ppc64le, x86_64
TL;DR: I believe we should use Now... the reason I did this now is that I am writing a set of regression/unit tests (basically what this repo does on a bit more of a grand scale) that will live in the
Script
#!/usr/bin/env bash

# Directory containing this script; images are expected in ${BASE}/images/<arch>/
BASE=$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)
CPIO_IMG=rootfs.cpio

# Compression commands, kept as arrays so they can be benchmarked (as strings)
# and also run directly
NINE_K_IMG=( -9 -k "${CPIO_IMG}" )
BZ2_COMP=( bzip2 "${NINE_K_IMG[@]}" )
GZ_COMP=( gzip "${NINE_K_IMG[@]}" )
LZ4_COMP=( lz4 -9 "${CPIO_IMG}" "${CPIO_IMG}".lz4 )
LZMA_COMP=( lzma "${NINE_K_IMG[@]}" )
LZOP_COMP=( lzop -9 "${CPIO_IMG}" )
XZ_COMP=( xz "${NINE_K_IMG[@]}" )
ZSTD_COMP=( zstd -19 "${CPIO_IMG}" )

# Create results folder
RESULTS=${BASE}/results
mkdir -p "${RESULTS}"

# Benchmark compression
for FOLDER in "${BASE}"/images/*; do
    COMP_RESULTS=${RESULTS}/${FOLDER##*/}-comp-results.md

    # --prepare deletes earlier outputs before every run, so only the last
    # command's output (.zst) is left afterwards; remove it and regenerate
    # every compressed image once for the size and decompression steps
    cd "${FOLDER}" && \
    hyperfine --export-markdown "${COMP_RESULTS}" \
              --prepare "rm -vrf ${CPIO_IMG}.*" \
              --warmup 1 \
              --runs 25 \
              "${BZ2_COMP[*]}" \
              "${GZ_COMP[*]}" \
              "${LZ4_COMP[*]}" \
              "${LZMA_COMP[*]}" \
              "${LZOP_COMP[*]}" \
              "${XZ_COMP[*]}" \
              "${ZSTD_COMP[*]}" && \
    rm -vrf "${CPIO_IMG}".zst && \
    "${BZ2_COMP[@]}" && \
    "${GZ_COMP[@]}" && \
    "${LZ4_COMP[@]}" && \
    "${LZMA_COMP[@]}" && \
    "${LZOP_COMP[@]}" && \
    "${XZ_COMP[@]}" && \
    "${ZSTD_COMP[@]}" && \
    echo >> "${COMP_RESULTS}"

    # Append each compressed file's size, as a percentage of the original,
    # to its table row. Data rows start at line 3, and the glob order
    # (bz2, gz, lz4, lzma, lzo, xz, zst) matches the command order above.
    LINE=3
    for COMP_FILE in "${CPIO_IMG}".*; do
        PER_DIFF=$(echo "scale=2;$(stat --format=%s "${COMP_FILE}")/$(stat --format=%s "${COMP_FILE%.*}")*100" | bc | sed 's/\..*$/\%/') && \
        sed -i "${LINE}s/$/ ${PER_DIFF} |/" "${COMP_RESULTS}" && \
        LINE=$(( LINE + 1 ))
    done

    # Benchmark decompression; --prepare removes the decompressed image first
    hyperfine --export-markdown "${RESULTS}/${FOLDER##*/}-decomp-results.md" \
              --prepare "rm -vrf ${CPIO_IMG}" \
              --warmup 1 \
              --runs 25 \
              "bzip2 -d -k ${CPIO_IMG}.bz2" \
              "gzip -d -k ${CPIO_IMG}.gz" \
              "lz4 -d ${CPIO_IMG}.lz4" \
              "lzma -d -k ${CPIO_IMG}.lzma" \
              "lzop -d ${CPIO_IMG}.lzo" \
              "xz -d -k ${CPIO_IMG}.xz" \
              "zstd -d ${CPIO_IMG}.zst"
done

# Format files: add the "Relative size" column header and shorten command names
for COMP_RESULT in "${RESULTS}"/*-comp-results*; do
    sed -i -e 's/Relative |/Relative speed | Relative size |/' -e '2s/$/---:|/' -e 's/9 .*r.*`/9`/' -e 's/9 .* -k.*`/9`/' "${COMP_RESULT}"
done
for DECOMP_RESULT in "${RESULTS}"/*-decomp-results*; do
    sed -i 's/ -d.*` / -d` /' "${DECOMP_RESULT}"
done
|
I don't think we should really care about compression speed: it's done so rarely that the cost is negligible. Ideally we can move |
My benchmarks show that zstd has the second best decompression ratio/speed, whereas the best decompression ratio algorithm is not the same as the best decompression speed algorithm. Compression speed is not awful either. Check that the tool is installed and bail out if it is not. Link: ClangBuiltLinux/continuous-integration#184 (comment) Signed-off-by: Nathan Chancellor <[email protected]>
No preference on choice of compression. As long as we don't have to modify defconfigs to decompress, I'm happy. Just pick one; not multiple, if possible.
SGTM. Two repos, or one repo? I'm fine with either, but it's unclear to me as stated.
I'm indifferent on this point.
That's a fair point. Orthogonally, I've been finding it frustrating to:
I think it would be helpful for us to provide:
I don't know what's a good way to organize these, since our CI probably doesn't care about 2-5, but I am generally a fan of git submodules (as long as the README clearly states the correct way to recursively clone, because I always forget to do that). Some can likely even just be fetched lazily (i.e. Debian images). In particular, I currently find it difficult to do more userspace testing of clang built kernels, and I think the above would go a long way towards helping. And that doesn't even touch upon packaging kernel modules from a clang built kernel build. |
I am planning one repo: https://github.com/ClangBuiltLinux/boot-utils nathanchance/boot-utils@1740f54 I will PR when ready for review.
I cannot preserve I do not think preserving
Agreed. The balance of decompression speed with regards to ratio is more important.
Agreed. Maybe we move the QEMU commands into a separate script within the rootfs repo above (and rename it Something like:
and maybe an argument like EDIT: Renamed the
Our images can do both of these. The second one is just done by passing
I do not know that we have the bandwidth for this.
See above.
Submodules do not play well with Travis unfortunately but cloning the repos and managing them through
I'll whip something up. |
My benchmarks show that zstd has the second best decompression ratio/speed, whereas the best decompression ratio algorithm is not the same as the best decompression speed algorithm. Compression speed is not awful either but that is not as much of a concern because compression only happens during image build time, where it is a very small fraction of the overall build process. Check that the zstd tool is installed and bail out if it is not. Link: ClangBuiltLinux/continuous-integration#184 (comment) Signed-off-by: Nathan Chancellor <[email protected]>
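The "check that the tool is installed and bail out" step from the commit message can be sketched roughly as below. This is only an illustration, not the code from the commit; `require_tool` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail with a clear message when a tool is missing.
require_tool() {
    local tool=$1
    if ! command -v "${tool}" >/dev/null 2>&1; then
        echo "error: ${tool} is required but was not found in PATH" >&2
        return 1
    fi
}

# Example usage: stop early instead of failing later during decompression
#   require_tool zstd || exit 1
```

Returning a status instead of calling `exit` inside the helper leaves it up to the caller whether a missing tool is fatal.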
When was zstd decompression support added to the kernel? |
It hasn’t been yet: https://lore.kernel.org/lkml/[email protected]/ |
So I guess |
We can still use That is what I am doing here: https://github.com/nathanchance/boot-utils/blob/updates/boot-qemu.sh#L82-L83 Please review the PR if you have any other concerns, I'll leave it open until Monday evening then merge it: ClangBuiltLinux/boot-utils#1 |
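For readers following the zstd question: since the kernel could not unpack a zstd-compressed initramfs at this point, one option is to decompress the image in userspace before booting. A minimal sketch of that idea, assuming the `zstd` CLI is installed; `decompress_rootfs` and the file names are hypothetical, and this is not necessarily what the linked boot-qemu.sh lines do.

```shell
#!/usr/bin/env bash

# Hypothetical helper: decompress <image>.zst next to itself and print the
# path of the decompressed image.
decompress_rootfs() {
    local compressed=$1
    local output=${compressed%.zst}
    # -d: decompress, -f: overwrite a stale copy, -q: no progress output
    zstd -q -d -f "${compressed}" -o "${output}" && printf '%s\n' "${output}"
}

# Example usage (image path is an assumption); the resulting plain cpio can
# then be handed to QEMU via -initrd:
#   ROOTFS=$(decompress_rootfs images/arm64/rootfs.cpio.zst) || exit 1
```

With this approach the kernel only ever sees an uncompressed cpio, so no defconfig changes are needed to support the compression format.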
Images are binary files, and they make up a significant part of the repository size. Each rootfs is 20 MB and each cpio is ~3 MB. Supporting more architectures also means adding more corresponding images. Given that we have 1 GB of free LFS storage, maybe we can make use of it.