Support for dwarfs-like -o offset=auto? #138

Open
xplshn opened this issue Sep 16, 2024 · 5 comments

@xplshn

xplshn commented Sep 16, 2024

Hi! Thanks for this awesome project. I've been using it for my AppBundle format at https://github.com/xplshn/pelf. Currently, mounting sqfs.AppBundles is 300ms slower than mounting AppImages, because calculating the offset in shell takes too much time. I was wondering if a built-in option to do this could be considered. Or perhaps a custom marker that the user specifies, such as:

I may append:

__SQFS_MARKER__
1238192381931938123Asdjia^^!#
123123 .... More SQFS data here...
13213838193123123

This way, I could provide -o offset="__SQFS_MARKER__" and it'd find the SQFS image just fine.

@vasi
Owner

vasi commented Sep 17, 2024

Oh this is fascinating. So my understanding is that dwarfs or pelf work by searching for a magic string? That's clever but a bit scary for two reasons:

  1. If someone tries this on a really large squashfs filesystem that doesn't actually contain the magic string, it'll appear to hang while searching.
  2. Someone could make a sketchy filesystem that contains the magic string + malicious-other-filesystem. Now it looks safe, but if mounted with "offset=auto" it's actually dangerous! I'm not sure I take this problem too seriously, there are probably easier ways to abuse squashfuse.
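
As a rough illustration of the first concern, a marker search can be capped so that it gives up after a fixed number of bytes instead of scanning a whole large image. This is only a sketch: `find_marker_bounded` is a hypothetical helper, not something squashfuse, dwarfs, or pelf provides, and it assumes GNU grep because `-a`, `-b`, and `-o` are not POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of squashfuse): print the byte offset of the
# first occurrence of MARKER within the first LIMIT bytes of FILE.
# Usage: find_marker_bounded FILE MARKER LIMIT
# Assumes GNU grep; -a, -b and -o are extensions, not POSIX.
find_marker_bounded() {
    # dd hands grep at most LIMIT bytes, so the scan cannot run past that point.
    dd if="$1" bs="$3" count=1 2>/dev/null |
        grep -a -b -o -F -- "$2" |
        head -n 1 |
        cut -d: -f1
}

demo=$(mktemp)
printf 'header\n__SQFS_MARKER__\npayload' > "$demo"

found=$(find_marker_bounded "$demo" __SQFS_MARKER__ 64)   # marker lies within the limit
missed=$(find_marker_bounded "$demo" __SQFS_MARKER__ 4)   # limit is reached before the marker
echo "found=${found:-none} missed=${missed:-none}"
rm -f "$demo"
```

A cap like this bounds the search time, but it does nothing about the second concern: a crafted file can still place the marker wherever it likes.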

Would it make sense instead for pelf to calculate the offset at creation time? E.g.:

sqfs_offset_magic=__SQFS_OFFSET_MAGIC__

# Unquoted EOS: $sqfs_offset_magic expands now; \$ defers expansion to runtime
cat > outfile <<EOS
SQFS_OFFSET=$sqfs_offset_magic

# lots of script stuff goes here

squashfuse -o offset=\$SQFS_OFFSET ...

# more script stuff
EOS

cat static_tools.tgz >> outfile

# Size of outfile, padded to be the same size as __SQFS_OFFSET_MAGIC__
real_sqfs_offset=$(printf "%-*s" ${#sqfs_offset_magic} "$(stat -c %s outfile)")
sed --in-place -e "s/$sqfs_offset_magic/$real_sqfs_offset/" outfile

cat archive.sqfs >> outfile

This way, you not only avoid calculating the offset in bash at startup time, you avoid any time spent searching at runtime. Running is likely much more common than creating, so overall this should be a big benefit.
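
The idea above can be tried end to end with dummy payloads. The sketch below is not pelf's actual build script: the file contents are tiny hypothetical stand-ins, and it assumes GNU `stat -c %s` and GNU `sed --in-place`. It checks two things: that patching the placeholder leaves the file size unchanged, and that the recorded offset points at the first byte of the appended image.

```shell
#!/bin/sh
# Sketch only: tiny dummy files stand in for static_tools.tgz and archive.sqfs.
# Assumes GNU stat (-c %s) and GNU sed (--in-place).
work=$(mktemp -d) && cd "$work" || exit 1

magic=__SQFS_OFFSET_MAGIC__                # fixed-width placeholder
printf 'fake tools\n' > static_tools.tgz   # hypothetical stand-in
printf 'hsqs-fake-image' > archive.sqfs    # hypothetical stand-in

# Loader script with the placeholder where the offset will go.
# "exit 0" keeps the shell from reading into the appended data.
cat > outfile <<EOS
#!/bin/sh
SQFS_OFFSET=$magic
echo "would run: squashfuse -o offset=\$SQFS_OFFSET ..."
exit 0
EOS
cat static_tools.tgz >> outfile

# Pad the size to the placeholder's width so patching keeps the file length.
size_before=$(stat -c %s outfile)
padded=$(printf '%-*s' "${#magic}" "$size_before")
sed --in-place -e "s/$magic/$padded/" outfile
size_after=$(stat -c %s outfile)

cat archive.sqfs >> outfile

# The bytes at the recorded offset should be the start of the image.
image_start=$(dd if=outfile bs=1 skip="$size_before" count=4 2>/dev/null)
echo "offset=$size_before size_after_patch=$size_after image_start=$image_start"

cd / && rm -rf "$work"
```

The padding leaves trailing spaces after the number on the `SQFS_OFFSET=` line, which the shell ignores when the loader runs.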

@xplshn
Author

xplshn commented Sep 17, 2024

It kind of makes sense to replace the sqfs offset using sed after the file's been created, so thanks for the idea. But I don't think it'd be unsafe to have support for this feature built in. dwarfs has it, and it does not hang, at least on this laptop, when I run it on a damaged .AppBundle.

I replaced the ARCHIVE_MARKER part and the "DWARFS" header that any dwarfs archive starts with, and it did not hang; it just couldn't continue.

@xplshn
Author

xplshn commented Sep 17, 2024

I considered writing a patch to add this feature myself, but then it wouldn't be available on other systems or through package managers. I hope you'll consider it, and maybe add a note/warning saying that it may be unsafe.

@kevin-vigor
Collaborator

I agree with @vasi above that this would be best done by precomputing the offset at build time. That's what XAR does.

But in the short term, you could probably get a significant performance boost on large images by stopping the search for your archive marker at the first hit instead of searching the whole archive, and by asking grep to give you the byte offset:

_VAR_ARCHIVE_MARKER=$(grep -a -n "^ARCHIVE_MARKER" "$0" | cut -d: -f1)

->

_VAR_ARCHIVE_MARKER=$(grep -a -b -m1 "^ARCHIVE_MARKER" "$0" | cut -d: -f1)
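
To make the difference between the two invocations concrete, here is a small sketch on a throwaway file. It assumes GNU grep (`-a`, `-b`, and `-m` are extensions, not POSIX) and assumes the marker sits alone on its line, which may not match pelf's real layout. Note that `-b` reports where the marker line starts, so the payload offset still needs the marker's length plus one for the newline.

```shell
#!/bin/sh
# Assumes GNU grep: -a, -b and -m are extensions, not POSIX.
marker=ARCHIVE_MARKER
demo=$(mktemp)
printf 'line one\nline two\nARCHIVE_MARKER\npayload\nARCHIVE_MARKER again\n' > "$demo"

# -n reports line numbers and keeps scanning, so every matching line is listed.
line_numbers=$(grep -a -n "^$marker" "$demo" | cut -d: -f1)

# -b reports the byte offset of the matching line; -m1 stops at the first hit.
marker_offset=$(grep -a -b -m1 "^$marker" "$demo" | cut -d: -f1)

# The payload starts after the marker line and its trailing newline.
payload_offset=$((marker_offset + ${#marker} + 1))
payload_start=$(dd if="$demo" bs=1 skip="$payload_offset" count=7 2>/dev/null)

echo "lines:" $line_numbers
echo "marker_offset=$marker_offset payload_offset=$payload_offset payload_start=$payload_start"
rm -f "$demo"
```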

@xplshn
Author

xplshn commented Sep 17, 2024

I'm limited to POSIX options... Calculating at runtime is too slow, and doing it during the build/generation step would be best. However, this is shell, and I'm trying to keep things manageable.
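
For what it's worth, the build-time approach can stay fairly small using only POSIX utilities. The sketch below is an assumption about how that might look, not pelf's code: `build_bundle` and all file names are hypothetical. Because the placeholder has a fixed width, the final offset is known before patching, so `sed` only ever touches the text loader and never the binary parts (no `sed -i`, no `stat -c`).

```shell
#!/bin/sh
# Sketch using only POSIX utilities for the build step (cat, sed, wc, printf).
# build_bundle LOADER TOOLS IMAGE OUT
# LOADER must contain the placeholder exactly once; sets $offset as a side effect.
build_bundle() {
    magic=__SQFS_OFFSET_MAGIC__
    # The placeholder has a fixed width, so the final offset is already known.
    offset=$(( $(cat "$1" "$2" | wc -c) ))
    # Left-justify the number in a field as wide as the placeholder.
    padded=$(printf "%-${#magic}s" "$offset")
    # Patch only the text loader, then append the binary parts untouched.
    { sed "s/$magic/$padded/" "$1"; cat "$2" "$3"; } > "$4"
}

# Demo with hypothetical stand-in files (mktemp is not POSIX; demo only).
work=$(mktemp -d)
printf '#!/bin/sh\nSQFS_OFFSET=__SQFS_OFFSET_MAGIC__\nexit 0\n' > "$work/loader.sh"
printf 'tools' > "$work/tools.bin"
printf 'hsqs-fake' > "$work/image.sqfs"

build_bundle "$work/loader.sh" "$work/tools.bin" "$work/image.sqfs" "$work/bundle"

# The bytes at the recorded offset should be the start of the image.
image_start=$(dd if="$work/bundle" bs=1 skip="$offset" count=4 2>/dev/null)
echo "offset=$offset image_start=$image_start"
rm -rf "$work"
```

At runtime the loader then only reads its own `SQFS_OFFSET=` line, so no search or size calculation is needed on startup.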
