mke2fs: the -d option can now handle tarball input #118
Use dlopen instead of linking against libarchive to keep runtime dependencies minimal, as requested in #107 (comment).
Sorry for the delay in looking at this pull request. Things have been pretty busy. So some comments about this change. First of all, when I tried doing a trial merge, I got test failures for the m_rootgnutar test. There are a lot of failures of the form:
Secondly, please test to make sure that e2fsprogs builds both with and without libarchive installed. One of the really nasty things about configure scripts is that people are very often careless about them: the autoconf scripts slow down the build process (since you have to run the configure script, which takes time), but that doesn't buy you anything in terms of portability if the build crashes and burns whenever the build environment does not exactly match the developer's. If it's going to be non-portable, why waste time with the configure script?
Thirdly, after fixing the configure script, we then find that "make check" fails all three of the new tests, since m_rootgnutar, m_rootpaxtar, and m_roottar are written assuming that the libarchive functionality is present.
While I do appreciate adding the dlopen() support, one question that comes to mind is how stable the libarchive ABI is. Given that the SOVERSION is up to 13, it causes me to wonder whether it's like openssl, where the ABI is so badly designed that a SOBUMP is needed at essentially every single release.
Also, the man page needs to warn that a particular bit of functionality might not be present: either at compile time, because the archive.h header file isn't there, in which case the functionality will never be present, or at run time, because the shared library for libarchive isn't present. In the latter case we should presumably get a nice warning message saying that a file was passed to the -d option but we can't handle it because libarchive isn't available, and not a confusing error message such as:
8980abb to 701d783
No problem, same over here. :)
Thank you for your review!
This should be fixed now.
I verified that e2fsprogs builds with and without libarchive installed.
Fixed now as well.
The last SOVERSION bump happened 10 years ago. I asked about the ABI stability here but got no reply yet: libarchive/libarchive#1854
I added this information to the man page.
I was unable to trigger this error. Do you remember what you did to get it? In any case, I changed some of the code paths surrounding |
I've submitted my patch to the linux-ext4 list to get more feedback: https://lore.kernel.org/linux-ext4/[email protected]/
In general I think dlopen()ing a specific shared library via its full SONAME from an external project where there is no tight coordination going on is the wrong approach. It crosses an ABI boundary with no checks in place, it requires manual patching when the SONAME gets bumped even if the symbols used have not seen any API or ABI change, and the code will not automatically pick up ABI changes that are API compatible. I think a better solution is to create a shared object "module" or "plugin" that exposes the functions you need and links against the libarchive library as usual. That way the external project can always be used safely, or else the compile will just barf, and this module, living within the project boundaries, can be safely dlopen()ed if present. When packaging, this module can then be split into its own package and installed if desired, for example, which also makes it possible to pick up the required dependencies automatically without having to hardcode those as well.
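For readers following the dlopen() debate, the run-time loading pattern under discussion looks roughly like the sketch below. It is not the code from this pull request: `probe_libarchive` and the wording of the messages are made up for illustration, while `libarchive.so.13` is the SONAME mentioned in this thread and `archive_version_number()` is a public libarchive function. On older glibc versions the program has to be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): try to load libarchive at run
 * time and print its version. Returns 1 if the library is usable, 0 if not. */
int probe_libarchive(void)
{
    /* The SONAME is hardcoded here; a SONAME bump would require patching. */
    void *handle = dlopen("libarchive.so.13", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "tarball input unavailable, cannot load libarchive: %s\n",
                dlerror());
        return 0;
    }

    /* POSIX-recommended way to store a dlsym() result in a function pointer. */
    int (*version_number)(void);
    *(void **)(&version_number) = dlsym(handle, "archive_version_number");
    if (!version_number) {
        fprintf(stderr, "libarchive loaded, but a required symbol is missing\n");
        dlclose(handle);
        return 0;
    }

    printf("libarchive version number: %d\n", version_number());
    dlclose(handle);
    return 1;
}
```

A caller would check the return value once and fall back to directory-only input, with a readable message, when it is 0. Nothing in this pattern verifies that the loaded library still matches the prototypes the caller assumes, which is the concern raised in the comment above.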
17fb226 to e1a8915
The libarchive ABI has been fairly stable for nearly a decade now and I would be in favour of seeing e2fsprogs make use of it.
As mentioned inline, I would suggest sticking to dlopen, in part because forcing extra deps isn't fun, and in part because the cmake libarchive build produces a different soname - see this bug.
The cmake build does not produce a pkg-config file, which is in part why I would suggest using the PKG_CHECK_MODULES macro.
HTH o/
ARCHIVE_LIB=$DLOPEN_LIB
fi
AC_SUBST(ARCHIVE_LIB)
dnl
I would swap the above check for a simple PKG_CHECK_MODULES(ARCHIVE, [libarchive >= the-version-whose-api-is-used]).
Since this is an optional feature, I would only have the dlopen path. Using linking and forcing a bunch of runtime dependencies onto people who won't use the feature seems moot IMHO.
/* 64KiB is the minimum blksize to best minimize system call overhead. */
//#define COPY_FILE_BUFLEN 65536
//#define COPY_FILE_BUFLEN 1048576
#define COPY_FILE_BUFLEN 16777216
Out of curiosity: where does the 16MiB come from? It's a few orders of magnitude larger than the minimum 64KiB.
From Andreas Dilger, see https://lore.kernel.org/linux-ext4/[email protected]/
Might be worth adding an inline comment/reference IMHO.
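To make the buffer-size discussion concrete, here is a minimal sketch of a copy loop that uses a buffer of this size. `copy_fd` is a hypothetical helper written for illustration, not the function in the patch; only the `COPY_FILE_BUFLEN` value is taken from the diff above. The one design point it shows with certainty is that a 16 MiB buffer belongs on the heap rather than the stack.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define COPY_FILE_BUFLEN 16777216 /* 16 MiB, the value from the diff above */

/* Hypothetical helper: copy everything from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on error. */
long long copy_fd(int in_fd, int out_fd)
{
    long long total = 0;
    /* Far too large for the stack, so allocate on the heap. */
    char *buf = malloc(COPY_FILE_BUFLEN);
    if (!buf)
        return -1;

    for (;;) {
        ssize_t got = read(in_fd, buf, COPY_FILE_BUFLEN);
        if (got == 0)
            break;                      /* end of input */
        if (got < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            total = -1;
            break;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < got) {
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                total = -1;
                goto out;
            }
            off += put;
        }
        total += got;
    }
out:
    free(buf);
    return total;
}
```

With a larger buffer, each read() and write() can move more data, so a big file needs fewer system calls. Whether 16 MiB is the right trade-off is a matter for the linked mailing-list thread, not something this sketch demonstrates.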
FWIW, I really would love to see this feature merged.
Hello and happy new year! 🙂 I just rebased this branch on top of master. If there is anything I can do to move this forward, @tytso, please do tell. Thank you!
The github workflow seems to fail for macos and android but the failures do not seem to be due to the changes of this merge request. In fact on android, the relevant tests are successfully skipped:
hey, any update on this? being able to create reproducible ext3/4 filesystems would be really great.
Apologies for the delay in reviewing it. Things have just been really crazy. I am in the middle of reviewing the patch right now. One thing which I've noticed is that the test m_rootgnutar assumes that "tar" is GNU tar. This might not be true on non-Linux platforms for which e2fsprogs is built (e.g., *BSD, MacOS, etc.). So it might be that we need a mkgnutar.pl à la the m_roottar and m_rootpaxtar tests.
Thanks a lot for reviewing my code! 🥳 I'm currently in the middle of switching jobs, so I cannot promise you until when I can cook up a EDIT: Never mind. Replicating what GNU tar does and creating bit-by-bit identical tarballs with a Perl script was easy enough. @tytso do you want me to throw out GNU tar completely, or do you think that there is value in having two tests: one which is only run if GNU tar is available and one which does the same thing but with a Perl fake tar?
I now replaced all calls to |
If archive.h is available during compilation, enable mke2fs to read a tarball as input. Since libarchive.so.13 is opened with dlopen, libarchive is not a hard library dependency of the resulting binary.
In comparison with feeding a directory tree to mke2fs via -d this has the following advantages:
- no superuser privileges, nor fakeroot, nor unshared user namespaces are needed to create filesystems with arbitrary ownership information and special files like device nodes which otherwise require being root
- by reading a tarball from standard input, no temporary files need to be written out first as mke2fs can be used as part of a shell pipeline which reduces disk usage and makes the conversion independent of the underlying file system
A round-trip from tarball to ext4 to tarball yields bit-by-bit identical results
Signed-off-by: Johannes Schauer Marin Rodrigues <[email protected]>
OK, things look good. I've done a trial merge with the next branch, and the CI passes clean: https://github.com/tytso/e2fsprogs/actions/runs/8743318976 Do you have any other changes you want to make? If not, I can just take the merge at https://github.com/tytso/e2fsprogs/tree/josch-libarchive
If there are more things that I want to change/fix, you will probably receive pull requests after the first e2fsprogs version with this feature lands in Debian unstable and I start using this functionality in all the places I plan to use it (like If there are any bugs related to this functionality, please do not hesitate to ping me about it either via github or via [email protected]. Thank you!! ❤️
I've merged this on the next branch, and have noted a potential portability problem with the m_root*tar test scripts when I tried doing a test build on FreeBSD 14. Namely, the -c option to stat is something which is only in the GNU coreutils implementation. It is not in the version of stat found in the *BSD userspace, which is also going to be the version found in MacOS/Darwin (because it is based on FreeBSD's userspace). Fortunately, MacOS apparently doesn't ship with libarchive, so it's not breaking the CI test on github. Since the test scripts are using perl, probably what needs to happen is to implement "stat -c %Y" in perl.
stat: illegal option -- c
If you have time to fix this, that would be great. If not, I'm sure that once I do an e2fsprogs release, the FreeBSD ports maintainer for e2fsprogs will send a patch fairly quickly. :-)
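As a point of reference, the value that GNU `stat -c %Y` prints is the file's modification time in seconds since the Epoch, which is available on every POSIX system through the stat() family of calls. The test scripts would do this in Perl with its built-in stat/lstat; the C sketch below shows the same underlying call. `file_mtime` is a hypothetical name, and lstat() is used on the assumption that the scripts rely on GNU stat's default of not following symlinks.

```c
#include <sys/stat.h>

/* Hypothetical helper: return the modification time of `path` in seconds
 * since the Epoch (what GNU `stat -c %Y` prints), or -1 on failure. */
long long file_mtime(const char *path)
{
    struct stat st;

    /* lstat() reports on a symlink itself rather than its target. */
    if (lstat(path, &st) != 0)
        return -1;
    return (long long)st.st_mtime;
}
```

Because this relies only on POSIX, it behaves the same on GNU/Linux, the BSDs, and macOS, which is the portability property the command-line option lacks.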
I can also send you a full patch but only later as it is currently Sunday morning over here. Thank you for notifying me about this and thank you for merging! :) EDIT: sent you a patch by mail |
This is an independent implementation that does the same thing as #107. Unfortunately I wasn't aware that e2fsprogs had a github repository (http://e2fsprogs.sourceforge.net/ only mentions the kernel.org repo). Differences of this approach compared to #107 are:
#ifdefs
-d option instead of adding a new option
Just as @russdill I'm looking for comments on this. Thanks!