diff --git a/public/blog/index.html b/public/blog/index.html index 38170ae..a0551c9 100644 --- a/public/blog/index.html +++ b/public/blog/index.html @@ -319,11 +319,11 @@
Context As you probably already know by now, a backdoor/security vulnerability has been recently discovered in xz. In this article, I won’t cover the technical aspect or the potential consequences of it (it has already been done by people way more knowledgeable than am I on that matter) but I will expose what I think we should learn from it from a packaging point of view.
[...]
- Read More + Read MoreAs you probably already know by now, a backdoor/security vulnerability has been recently discovered in xz. In this article, I won’t cover the technical aspect or the potential consequences of it (it has already been done by people way more knowledgeable than am I on that matter) but I will expose what I think we should learn from it from a packaging point of view.
-To sum up the situation briefly (as a context for the rest of this article), the xz (co-)maintainer himself introduced a backdoor in the code, obfuscated by the fact that the malicious code (or at least parts of it) was only present in the custom made tarball (basically an archive) of the sources (but not present in the git repo itself). The said tarball/archive was signed with his GPG key and manually uploaded as an asset/artifact of the releases, which packagers used a source for their package.
+To sum up the situation briefly (as a context for the rest of this article), the xz (co-)maintainer himself introduced a backdoor in the code, obfuscated by the fact that the malicious code (or at least parts of it) was only present in the custom made tarball (basically an archive) of the sources (but not present in the git repo itself). The said tarball/archive was signed with his GPG key and manually uploaded as an asset/artifact of the releases, which package maintainers used a source for their package.
Firstly, depending on how a software is developed and what it relies/depends on, the raw sources might not be usable as is (sometimes at all). It happens that some software’s sources require additional steps/pre-configuration in order to be properly compiled and packaged (for instance projects that relies on autotools or git submodules to configure sources or pull required third party dependencies). As a matter of convenience, it is quite common that upstream developers provide custom made tarballs of the sources of their software with those pre-required additional steps already done, allowing packagers to get sources usable as is.
-The other reason why packagers often rely on such custom made tarballs over “raw” sources is because it’s quite common for upstream developers to not bother integrating integrate GPG signing in their development and release workflows (basically sign their commits and tags) but only sign these custom made tarballs instead (probably as a way to “promote” it over raw sources regarding the previous point about convenience). A GPG signed source tarball represents an additional level of trust for packagers as it constitutes a reliable way to proves the identity of the person that generated/made the said tarball, but also that it hasn’t been modified in any way after being signed.
While these custom made tarballs are often more convenient for us packagers (as explained above), they are also less transparent by nature. Indeed, it’s way more involved and difficult to have a tarball being reliably reviewed and audited compared to a publicly and easily accessible git repo hosted online. And identifying differences in the content of the tarball compared to the content of the “raw” sources on its own is not enough as this is often actually expected (since these custom made tarballs are expected to include the prerequisites needed to properly compile/package the software compared to the “raw” sources).
+Firstly, depending on how a software is developed and what it relies/depends on, the raw sources might not be usable as is (sometimes at all). It happens that some software’s sources require additional steps/pre-configuration in order to be properly compiled and packaged (for instance projects that relies on autotools or git submodules to configure sources or pull required third party dependencies). As a matter of convenience, it is quite common that upstream developers provide custom made tarballs of the sources of their software with those pre-required additional steps already done, allowing package maintainers to get sources usable as is.
+The other reason why package maintainers often rely on such custom made tarballs over “raw” sources is because it’s quite common for upstream developers to not bother integrating integrate GPG signing in their development and release workflows (basically sign their commits and tags) but only sign these custom made tarballs instead (probably as a way to “promote” it over raw sources regarding the previous point about convenience). A GPG signed source tarball represents an additional level of trust for package maintainers as it constitutes a reliable way to proves the identity of the person that generated/made the said tarball, but also that it hasn’t been modified in any way after being signed.
While these custom made tarballs are often more convenient for us package maintainers (as explained above), they are also less transparent by nature. Indeed, it’s way more involved and difficult to have a tarball being reliably reviewed and audited compared to a publicly and easily accessible git repo hosted online. And identifying differences in the content of the tarball compared to the content of the “raw” sources on its own is not enough as this is often actually expected (since these custom made tarballs are expected to include the prerequisites needed to properly compile/package the software compared to the “raw” sources).
It’s undeniable that if the xz malicious code was present in the “raw” sources themselves, it would have been detected way faster and more easily (which is why it was only injected in the custom made tarball as the way to obfuscate it). My personal take on that is, when possible, we should prefer using a more transparent source (over a more convenient one) that is, by cascade, more audited and thus eventually more trustworthy.
As said previously in this article, it’s unfortunately quite common that upstream developers only sign such custom made tarballs and not commits/tags on the git sources themselves (which is one of the reason packagers choose to base their packages on them).
+As said previously in this article, it’s unfortunately quite common that upstream developers only sign such custom made tarballs and not commits/tags on the git sources themselves (which is one of the reason package maintainers choose to base their packages on them).
However, as said in the first chapter of this article, it’s important to remind that the infected xz tarball was signed with the GPG of the (co-)maintainer of the project himself (who also introduced the backdoor). That raise an important question: To which degree we want to trust a GPG signature over anything else (in this particular case, over a more transparent source)?
While the signature on its own indeed proved the identity of the person that created the tarball (and that it has not been altered since it got signed), it does not guarantee anything about the content of the archive in the first place (so not that nothing malicious or not trustworthy has been injected in it when created, as the xz situation demonstrated)! When the people themselves are compromised, the signatures worth nothing.
-My personal opinion is that, while a GPG signature is an important and valuable addition to the “trust path” upstream developers are building towards users/packagers, a transparent source (that do not require any additional effort to be proven reliable/trustworthy when compared to “raw” sources) outweighs it.
+My personal opinion is that, while a GPG signature is an important and valuable addition to the “trust path” upstream developers are building towards users/package maintainers, a transparent source (that do not require any additional effort to be proven reliable/trustworthy when compared to “raw” sources) outweighs it.
I already started the effort of switching the source of the packages I maintain on Arch Linux side that are currently based on such custom made tarballs to either the auto generated tarball made by the git platform itself the code is hosted on (which is a snapshot of the exact state of the repo itself at the precise moment the tag was created) or to the git sources themselves directly (by cloning the repo against the specific tag I want to package).
With the recent xz situation, I do think that basing our packages on a more transparent source, at the potential cost of more complex packages to maintain (as the eventual pre-required steps needed for the source to be usable in the first place would now be on our side), is worth the price.
Together with David Runge (an Arch Linux Developer), we also decided to jointly write two RFCs (namely about “how to deal with sources in our packages?” and “how to deal with signed sources and trust path?”) to try to establish general guidelines on those matters on Arch Linux side (I’ll update this part with the links to those RFCs once they have been written and published).
While packagers can make changes on their sides to bring more transparency into the packages and mitigate eventual issues, it’s in my opinion very important that all of this turns into a common effort in which upstream developers adapt their development and release workflow to provide transparent sources/artifacts without skimping on the “trust path” they provide with it.
+While package maintainers can make changes on their sides to bring more transparency into the packages and mitigate eventual issues, it’s in my opinion very important that all of this turns into a common effort in which upstream developers adapt their development and release workflow to provide transparent sources/artifacts without skimping on the “trust path” they provide with it.
Ideally, such questions as “should I prefer a signed source or a transparent one?” shouldn’t be a thing anymore. Realistically, only auto-generated source tarballs + GPG signatures or signed tags (+ eventual addition trust path) are viable.
So, for eventual upstream developers passing by, if you don’t already, please (🥺) consider starting to sign your tags/auto-generated source tarballs and provide information about additional steps that may be required to make your sources usable (if needed and if possible, obviously).
(I’ll start doing it myself for the few projects I’m developing). 😀