Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use requirements.txt as lockfile #30

Merged
merged 13 commits into from
May 15, 2023
Merged

Use requirements.txt as lockfile #30

merged 13 commits into from
May 15, 2023

Conversation

wuan
Copy link
Contributor

@wuan wuan commented May 11, 2023

The basic requirement is contained in requirements.in. requirements.txt can be updated by using pip-compile from the pip-tools package.

@@ -0,0 +1 @@
magic-wormhole-mailbox-server -e git+https://github.com/magic-wormhole/magic-wormhole-mailbox-server/tarball/4b358859ba80de37c3dc0a5f67ec36909fd48234#egg=magic-wormhole-mailbox-server
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these github hashes (of magically-produced tarballs) are not stable...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That hash is the commit hash from the repo:

magic-wormhole/magic-wormhole-mailbox-server@4b35885

Pretty stable, you can also use a tag (but this is less stable).

Please note that the source URL did not change with this PR, we already used it before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's a different thing, but there was some issue with using GitHub links (e.g. from the "Download" button, approximately). Maybe it was only for ZIPs?

Copy link
Contributor Author

@wuan wuan May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a chat with @hacklschorsch yesterday and he mentioned that the hashes from the archive have been differing in some cases. Here we use a Git hash to fix the source and we do not care about different archive hashes as long as they contain exactly the files which are from that Git hash we use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, okay that was it -- that the bits can change (so if you have a workflow that checks those, it'll fail -- but only "somtimes" e.g. when GitHub nukes a cache or something?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use the git hash to access the archive (w/o relying on git)?
Are you suggesting there could be some differences between sources cloned using Git CLI and those "mirrored" with plain HTTP(s)?

I was under the impression that tar and zip archives were part of Git itself (e.g.: git archive)!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems to work as expected with https only. At least for our purpose.
Pip freeze gives me the same result in both cases:

diff <(docker run --rm -t --entrypoint pip fab306454663 freeze) <(docker run --rm -t --entrypoint pip 64a7dd3f2103 freeze)
13c13
< magic-wormhole @ https://github.com/magic-wormhole/magic-wormhole/tarball/f66e759b2494d6713238d86eb0e8001f297ef553
---
> magic-wormhole @ git+https://github.com/magic-wormhole/magic-wormhole@f66e759b2494d6713238d86eb0e8001f297ef553

Only the second image is now 83MB bigger (not much, but still 50% increase!) and present a number of extra packages that could introduce more vulnerabilities:

19a20,21                                                                                                                                                                                                           
> git-man/now 1:2.30.2-1+deb11u2 all [installed,local]                                                                                                                                                             
> git/now 1:2.30.2-1+deb11u2 amd64 [installed,local]                                                                                                                                                               
24a27                                                                                                    
> less/now 551-2 amd64 [installed,local]                                                                 
30a34,35                                                                                                                                                                                                           
> libbrotli1/now 1.0.9-2+b2 amd64 [installed,local]                                                                                                                                                                
> libbsd0/now 0.11.3-1 amd64 [installed,local]
34a40                      
> libcbor0/now 0.5.0+dfsg-2 amd64 [installed,local]
36a43                     
> libcurl3-gnutls/now 7.74.0-1.3+deb11u7 amd64 [installed,local]                                         
38a46,47                   
> libedit2/now 3.1-20191231-2+b1 amd64 [installed,local]                                                 
> liberror-perl/now 0.17029-1 all [installed,local]
41a51                     
> libfido2-1/now 1.6.0-2 amd64 [installed,local]
43a54                    
> libgdbm-compat4/now 1.19-2 amd64 [installed,local]
54a66,67                  
> libldap-2.4-2/now 2.4.57+dfsg-3+deb11u1 amd64 [installed,local]                                        
> libldap-common/now 2.4.57+dfsg-3+deb11u1 all [installed,local]                                         
56a70                                 
> libmd0/now 1.0.3-3 amd64 [installed,local]
59a74                    
> libnghttp2-14/now 1.43.0-1 amd64 [installed,local]
67a83,84                
> libperl5.32/now 5.32.1-4+deb11u2 amd64 [installed,local]                                               
> libpsl5/now 0.21.0-1.2 amd64 [installed,local]
68a86,89                  
> librtmp1/now 2.4+20151223.gitfa8646d.1-2+b2 amd64 [installed,local]                                    
> libsasl2-2/now 2.1.27+dfsg-2.1+deb11u1 amd64 [installed,local]                                         
> libsasl2-modules-db/now 2.1.27+dfsg-2.1+deb11u1 amd64 [installed,local]                                
> libsasl2-modules/now 2.1.27+dfsg-2.1+deb11u1 amd64 [installed,local]                                   
76a98
> libssh2-1/now 1.9.0-2 amd64 [installed,local]
86a109,115
> libx11-6/now 2:1.7.2-1 amd64 [installed,local]
> libx11-data/now 2:1.7.2-1 all [installed,local]
> libxau6/now 1:1.0.9-1 amd64 [installed,local]
> libxcb1/now 1.14-3 amd64 [installed,local]
> libxdmcp6/now 1:1.1.2-3 amd64 [installed,local]
> libxext6/now 2:1.3.3-1.1 amd64 [installed,local]
> libxmuu1/now 2:1.1.2-2+b3 amd64 [installed,local]
96a126
> openssh-client/now 1:8.4p1-5+deb11u1 amd64 [installed,local]
98a129
> patch/now 2.7.6-7 amd64 [installed,local]
99a131,133

Is this worth it? (I'm interested to know more about the risk you guys are talking about - also with @hacklschorsch)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that tar and zip archives were part of Git itself (e.g.: git archive)!

I believe that's correct, but the output isn't stable. So if you have a process that checks hashes, it'll fail sometimes. (Certainly there are ways besides "use the github API" to get the sources of course, I was just highlighting that that particular method has a potential problem).

I believe this discusses the problem: https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that pip freeze incantation is doing anything to check the contents...? (e.g. like --require-hahes in pip might)

@meejah
Copy link
Collaborator

meejah commented May 11, 2023

It seems to me that overall pinned requirements would best come from upstream, as is common in lots of other Python command-line application (e.g. magic-folder) and ideally come with hashes as well.

@wuan
Copy link
Contributor Author

wuan commented May 11, 2023

It seems to me that overall pinned requirements would best come from upstream, as is common in lots of other Python command-line application (e.g. magic-folder) and ideally come with hashes as well.

Projects usually allow ranges of versions in order to work together with other dependencies, for example see magic-wormhole-mailbox-server/setup.py Line 33:

      install_requires=[
          "attrs >= 16.3.0", # 16.3.0 adds __attrs_post_init__
          "twisted[tls] >= 17.5.0",
          "autobahn[twisted] >= 0.14.1",
      ],

That is the reason why we lock the dependency versions here in order to have a better reproducibility.

@meejah
Copy link
Collaborator

meejah commented May 11, 2023

The recommended approach is for libraries to be open-ended, but for programs to be way more specific (ideally, locking to precise versions). The latter is what e.g. magic-folder does: https://github.com/LeastAuthority/magic-folder/blob/main/requirements/base.txt

@meejah
Copy link
Collaborator

meejah commented May 11, 2023

(I suspect the setup.py stuff is basically just "leftover" from before the 3-way split...? Then, it did make sense to be open-ended because part of the repo was 'a library')

@meejah
Copy link
Collaborator

meejah commented May 11, 2023

That is the reason why we lock the dependency versions here in order to have a better reproducibility.

Yeah, what I'm suggesting is that it could be better to put that effort upstream, since it makes sense to lock them there too. (That is, because it's "a program" and not "a library").

@wuan
Copy link
Contributor Author

wuan commented May 11, 2023

Yeah, what I'm suggesting is that it could be better to put that effort upstream, since it makes sense to lock them there too. (That is, because it's "a program" and not "a library").

Then just open an issue upstream, this topic needs to be discussed there and reference this repo.

@meejah
Copy link
Collaborator

meejah commented May 11, 2023

Sure, please open a ticket -- I don't know your requirements nor what discussion you believe needs to happen?

@wuan
Copy link
Contributor Author

wuan commented May 12, 2023

Sure, please open a ticket -- I don't know your requirements nor what discussion you believe needs to happen?

You had the idea to do this, there are no requirements outside of what we were talking about. So just go ahead.

Copy link
Contributor

@btlogy btlogy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seem to improve the status quo: now the dependencies are pinned and can be easily parsed (hopefully by dependabot and/or other scan tools).

Thank you for fixing this.

@btlogy
Copy link
Contributor

btlogy commented May 15, 2023

As for the upstream dependencies not (yet) been pinned, it could be improved (I can open an issue if that would help).
But this change makes our image more reproducible regardless.
If upstream would pin those deps later, we might be able to skip the pip-compile I guess.

@meejah
Copy link
Collaborator

meejah commented May 15, 2023

The requirements.txt could still benefit from containing hashes, allowing one to use --require-hashes upon install.

(Also, as I said above, if this effort was directed upstream in such a way that releases were done like this, more people would benefit and the actual upstream authors would hopefully take a look at the dependencies themselves)

@wuan
Copy link
Contributor Author

wuan commented May 15, 2023

Used hashes with the Git repo which led to errors like

ERROR: Can't verify hashes for these requirements because we don't have a way to hash version control repositories:

We can try again, as we are back to the tarball now.

@wuan
Copy link
Contributor Author

wuan commented May 15, 2023

Seems to work. Now we might get errors with archives having a different hash in the future, but that should be easy to fix.

@wuan wuan merged commit 64a8739 into main May 15, 2023
@wuan wuan deleted the pin_requirements branch May 15, 2023 16:07
@btlogy
Copy link
Contributor

btlogy commented May 15, 2023

Seems to work. Now we might get errors with archives having a different hash in the future, but that should be easy to fix.

Thank you.

At least, this should bring us a step closer to have an accurate view over the known vulnerabilities in our images.
I'll try to see how we can get there in:

btlogy added a commit that referenced this pull request May 26, 2023
* avoid git package - unused since #30
* re-order and document steps
* use environment variables as parameter
* expose ports

Signed-off-by: Benoit Donneaux <[email protected]>
@btlogy btlogy mentioned this pull request May 26, 2023
btlogy added a commit that referenced this pull request May 26, 2023
* avoid git package - unused since #30
* re-order and document steps
* use environment variables as parameter
* expose ports

Signed-off-by: Benoit Donneaux <[email protected]>
btlogy added a commit that referenced this pull request May 26, 2023
* avoid git package - unused since #30
* re-order and document steps
* use environment variables as parameter
* expose ports

Signed-off-by: Benoit Donneaux <[email protected]>
btlogy added a commit that referenced this pull request Jun 1, 2023
* Improve Dockerfiles

* avoid git package - unused since #30
* re-order and document steps
* use environment variables as parameter
* expose ports

* Remove useless apt step

---------

Signed-off-by: Benoit Donneaux <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants