High-level squashfuse optimization opportunities #73

Open
haampie opened this issue Jun 21, 2022 · 5 comments

@haampie
Contributor

haampie commented Jun 21, 2022

I've documented some performance issues with squashfuse here: https://github.com/haampie/squashfs-mount/. I'm observing about a 1.5x increase in compile time of LLVM when mounting compilers from a squashfs file using squashfuse compared to just using (lib)mount.

Is this overhead expected?

@vasi
Owner

vasi commented Jun 21, 2022

You would definitely expect overhead from FUSE over an in-kernel driver, yes.

@haampie
Contributor Author

haampie commented Jun 22, 2022

Okay, then I'll stick to the kernel version for now. Thanks!

@haampie
Contributor Author

haampie commented Sep 16, 2022

FWIW: when using squashfuse_ll instead of squashfuse I get a 10x speedup for du -sh mountpoint/.

Perf tells me squashfuse spends the vast majority of time decompressing whereas squashfuse_ll spends like 5% of the time there.

Is there any reason to keep the high-level version if the low-level version performs so much better?

haampie reopened this Sep 16, 2022

@vasi
Owner

vasi commented Sep 17, 2022

The high-level version uses a simpler FUSE API, which has wider availability. A number of platforms (Minix, NetBSD) only support the high-level API. Unfortunately, the high-level API doesn't map one-to-one(-ish) onto kernel VFS operations. Instead, it talks to a library that manages things like inode allocation, which makes it inherently slower. Something that hits many different inodes, like du, should be particularly bad.

If squashfuse_ll works better for you, then I recommend sticking with it! But I'd like to keep squashfuse available on other platforms, and there's no harm in leaving the high-level version around.

Let me rename this ticket to something about optimizing high-level squashfuse, since that seems to be where this has landed. Please go ahead and explain how you did your testing, and share what results you got. Then we can use this as an opportunity for anybody who wants to spend time optimizing high-level squashfuse.
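
To make the path-based versus inode-based difference described above concrete, here is a toy model in Python. It is only a sketch: it counts directory scans in an in-memory tree and makes no FUSE calls. It assumes a naive path-based handler that re-resolves every path from the root with no cache, which is an illustration and not a description of squashfuse's actual code. The names `build_tree`, `walk`, `path_based_cost` and `inode_based_cost` are hypothetical.

```python
# Toy model only: counts directory scans, not real FUSE requests.
# All names here are hypothetical and chosen for illustration.

def build_tree(depth, fanout):
    """Nested dicts: a dict is a directory, None is a regular file."""
    if depth == 0:
        return None
    return {f"n{i}": build_tree(depth - 1, fanout) for i in range(fanout)}

def walk(node, path=()):
    """Yield the path (tuple of names) of every entry below node, du-style."""
    for name, child in node.items():
        entry = path + (name,)
        yield entry
        if child is not None:
            yield from walk(child, entry)

def path_based_cost(root):
    """Path-based dispatch: each stat receives a full path and, in this
    naive model, scans one directory per path component from the root."""
    return sum(len(entry) for entry in walk(root))

def inode_based_cost(root):
    """Inode-based dispatch: each stat receives (parent inode, name),
    so it scans exactly one directory per entry."""
    return sum(1 for _ in walk(root))

root = build_tree(depth=4, fanout=3)
print("path-based directory scans: ", path_based_cost(root))
print("inode-based directory scans:", inode_based_cost(root))
```

In this model the gap grows with directory depth, which is consistent with a workload like du being hit hardest. Whether this is the dominant cost in squashfuse itself would need to be confirmed by profiling.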

vasi changed the title from "Bad performance compared to mount" to "High-level squashfuse optimization opportunities" Sep 17, 2022

@haampie
Contributor Author

haampie commented Sep 17, 2022

Using squashfuse, the timing is consistent across runs:

$ time du -sh /x
43G	/x

real	0m12.548s
user	0m0.040s
sys	0m0.592s

$ time du -sh /x
43G	/x

real	0m12.450s
user	0m0.024s
sys	0m0.569s

$ time du -sh /x
43G	/x

real	0m12.397s
user	0m0.059s
sys	0m0.526s

There are no caching effects: repeated runs take the same time.

squashfuse_ll is about 14x faster on the first run and about 45x faster on the second and later runs:

$ squashfuse_ll file.squashfs /x

$ time du -sh /x
42G	/x

real	0m0.902s
user	0m0.040s
sys	0m0.405s

$ time du -sh /x
42G	/x

real	0m0.275s
user	0m0.018s
sys	0m0.167s

$ time du -sh /x
42G	/x

real	0m0.269s
user	0m0.005s
sys	0m0.176s

mount is best:

$ time du -sh /x
42G	/x

real	0m0.527s
user	0m0.020s
sys	0m0.497s

$ time du -sh /x
42G	/x

real	0m0.108s
user	0m0.032s
sys	0m0.075s

$ time du -sh /x
42G	/x

real	0m0.109s
user	0m0.028s
sys	0m0.080s
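
The speedup factors can be recomputed from the wall-clock ("real") times listed above. The short Python sketch below does only that arithmetic. The dictionary layout and the `speedup` helper are illustrative names, and the numbers are copied from the first two runs of each transcript.

```python
# Wall-clock ("real") seconds copied from the du -sh runs above.
timings = {
    "squashfuse": [12.548, 12.450, 12.397],
    "squashfuse_ll": [0.902, 0.275, 0.269],
    "mount": [0.527, 0.108, 0.109],
}

def speedup(baseline, candidate):
    """Return how many times faster candidate is than baseline."""
    return baseline / candidate

for name in ("squashfuse_ll", "mount"):
    first = speedup(timings["squashfuse"][0], timings[name][0])
    second = speedup(timings["squashfuse"][1], timings[name][1])
    print(f"{name}: {first:.1f}x faster on run 1, {second:.1f}x faster on run 2")
```

Relative to squashfuse, this gives roughly 14x and 45x for squashfuse_ll, and roughly 24x and 115x for the kernel mount.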

Perf shows squashfuse spends most of its time decompressing (about 60% in the two hottest zstd symbols):

# Children      Self  Command     Shared Object       Symbol                                                   
# ........  ........  ..........  ..................  .........................................................
#
    44.07%    44.06%  squashfuse  libzstd.so.1.5.2    [.] ZSTD_decompressBlock_internal.part.13
            |          
             --44.04%--ZSTD_decompressBlock_internal.part.13

    16.41%    16.41%  squashfuse  libzstd.so.1.5.2    [.] _HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop
            |          
             --16.40%--_HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop
