release CI: Run on all pushes and PRs, only publish on tag #1479
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main    #1479   +/-   ##
=======================================
  Coverage   79.12%   79.12%
=======================================
  Files          15       15
  Lines         915      915
  Branches      194      194
=======================================
  Hits          724      724
  Misses        168      168
  Partials       23       23

☔ View full report in Codecov by Sentry. |
I don't have any issue with this, if all the people who can modify the API tokens have their GitHub accounts 2FA enabled. Whether Mesa-Geo uses this approach would depend on whether Mesa uses it. |
I'm fine with this PR.
Once again, best practice is to release using 2FA. I would classify this automation as a tradeoff for convenience instead. |
I'm not sure about this one. This is way too often. Black, for instance, only does it during a release: https://github.com/psf/black/blob/main/.github/workflows/pypi_upload.yml. |
Thank you both for reviewing.
I’m not sure what you want to achieve by this. Please read this again:
This is fundamental to the way GitHub secrets work. Once Jackie (or anyone else) uploads a secret, no one can access it any more, not even Jackie. You can delete it or overwrite it with a new value, but no current or future maintainer can ever see the uploaded secret. So there is zero additional chance of the API key leaking. I would even argue that the current practice, where Jackie has to look up the key for every release, carries a far higher risk of API key leakage.
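For illustration, a minimal sketch of how a workflow step could reference such a secret; the secret name PYPI_API_TOKEN is an assumed example, and the value is only ever injected at run time, never readable in the UI:

```yaml
# Hypothetical publish step; the secret name is an assumption for illustration.
- name: Publish package to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    password: ${{ secrets.PYPI_API_TOKEN }}  # injected by GitHub at run time, not readable afterwards
```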
A best practice is not always what's most secure. It is the trade-off industry experts find between many of their shared objectives, in this case a mix of security, convenience, maintainability and scalability. After all, the most secure option would be to not have a Mesa project at all.
It’s a very short job, taking less than a minute, that ensures wheel building keeps working even as dependencies get updated and such. It’s nice to know during development that wheels will build fine, instead of discovering a breakage during a release. |
I have confirmed this. I can no longer access a secret's content after I have specified it.
Debatable. Then why do PyPI devs bother to develop the OIDC token feature, as per this doc? |
Are you implying people who release their packages with 2FA to PyPI are not following best practice?
I find this to be extraneous. |
You are conflating 2 things. Creating an API key for automated uploads is perfectly secure, unless GitHub is hacked. 2FA is equally important to protect user accounts. So no one can create new API keys and simply use those. But this is unrelated to automatic uploads. |
If it is perfectly secure, why does PyPI require that critical projects must have 2FA enabled to publish/update/modify packages (see https://twitter.com/pypi/status/1545455297388584960; they are even handing out hardware security keys)? |
Again, the maintainer accounts must have 2FA enabled, possibly with hardware keys. This is unrelated to publishing with API keys. |
What I meant is publishing with an API key vs publishing after one has gone through the hoops of 2FA every single time. |
As I'm not a cybersecurity person, let me see if I can recap the two choices concisely and hopefully accurately.

Option 1: Keep the current approach (although we still need to get rid of setup.py). This requires @jackiekazil to upload her token and the secret PyPI token, so every release is confirmed by 2FA in addition to the 2FA for getting on GitHub. Result: slower releases with a higher "bus factor" quotient.

Option 2: Remove the token upload. Any maintainer (which is a very small number) can create a tag, which then does a build and upload to PyPI. In addition, every maintainer has to log in via 2FA, so I believe the argument is that 2FA is implicit in every tag creation. Result: faster releases with a lower "bus factor" quotient.

Please advise if this is an accurate description. If it is, and as @rht said he is ok with it, based on current Mesa development dynamics I agree to the PR but....
|
Sorry about the confusion. Let me clarify myself - maintainers should have their account 2FA enabled, so that attackers cannot delete or modify existing keys. I also think that having 2FA-enabled maintainer accounts is related to uploading to PyPI. Previously only Jackie could publish. With this PR, any maintainer who can commit and tag can trigger a release (i.e., publish). Hence, maintainers should have their GitHub accounts 2FA enabled. I'm not very familiar with GitHub secrets, so please correct me if I'm wrong. |
@tpike3 Thanks for the write-up, I think it’s correct, at least on the major points. |
That's a real concern. I do have 2FA enabled, and hope the other maintainers do. The original goal of the release CI was so that maintainers other than Jackie can also execute the release to PyPI. I'd say it is safer to restrict the trigger to only when there is a package release on GH. This is how it is done in Black. Otherwise, with tags, one could accidentally trigger a release from the Git CLI. Random observation: Requests doesn't use any GH Actions for auto-publishing to PyPI. It would be fatal if Requests were compromised. |
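For reference, a rough sketch of what restricting the trigger to a GitHub Release could look like; this mirrors the Black-style setup mentioned above as an illustration, not the actual Black workflow:

```yaml
# Run the publish workflow only when a GitHub Release is published,
# not on every tag pushed from the CLI.
on:
  release:
    types: [published]
```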
I am not sure if this is the case, but to me it sounds like you are still not convinced that publishing via API keys is perfectly safe? Just out of curiosity, why do you think that, or where did you get the idea that manual publishing with 2FA is safer? Keep in mind that the 2 use cases are slightly different. Anyway, I think we all agree by now that publishing via a GitHub action shifts the security question from how the upload is done to who is allowed to trigger releases (and how the accounts with that right are secured). In that sense tags give us slightly more access control, because we can create protected tags that only maintainers and admins can use, whereas anyone with write access can create releases. I think currently all members of the repo are either maintainers or admins, but we could downgrade some to write access if we want to limit the number of people with the indirect ability to publish to PyPI. |
I love this conversation, but I feel like it is very fractured in many comments. I am wondering if it would be easier to pull together in a doc of some sort with pluses and minuses and then discuss during our monthly? |
Isn't the Requests case already a negative example, which doesn't use GH Actions with an API key? Would you recommend Requests dev to publish using GH Actions if it is perfectly safe? Your security assertion on using the API key hinges on the fact that the key needs to be stored in a 3rd party server where you can no longer read its content afterward (GH Actions + GH secret). |
How is requests an example for anything? I couldn't find any discussions about the release process of requests, so I would assume that they are just happy with the way they do releases!? Whereas we want to move away from a single person responsible for managing the releases. That's why I said there are different use cases. I can't find any resources about requests actively deciding against releasing with API keys, but if they wanted an automated release pipeline, sure, I would recommend them to use API keys, securely stored on GitHub. But if they are happy with the status quo, why should they change? It's not like either process is better than the other, they are just different. You can do both in a secure and an insecure way. |
I specifically mention Requests as an example because they are the most popular project of the Python Software Foundation's GitHub organization: see https://github.com/psf. It's not Kenneth Reitz's pet project anymore. Being one of the core projects, they are surely aware of the security risks of releasing a package. There have been 4 CVEs associated with Requests to date. |
Sure, but where is the indication that they actively decided against GitHub Actions, as opposed to just not being interested in using it? |
Where is the indication they are not? Given the stake, it is much more likely they have considered it. See, Black uses the GH Actions CI for publishing. |
Lol what? 😂 There is no indication they are not, but you were the one that used that example, not me. I was just wondering and asking for clarification on how that example is useful when there is no sign of dismissing API keys as insecure. Right now your argument is based on pure guessing. Anyway we should cut that discussion, it doesn't seem to go anywhere. |
Did you not see that the most popular PSF project is not using the GH Actions CI? They have enough eyeballs looking at the project to consider everything. Why is this so hard to accept? |
There is no problem accepting that, but we have no insight into why it was decided not to use GH Actions. I just want to point out that there are multiple possible reasons for such a decision. Let me ask you a simple question: if requests were using a GitHub Actions release pipeline, would you consider it safe to do so? |
Yes. You could try recommending this to them. |
Great! Now I won't recommend them using GH Actions, because, as you said, they probably already discussed it and decided against using it for unknown reasons. However, I want to point out that requests is built on top of urllib3, which is the second most downloaded Python package according to pypistats. Also, it doesn't seem that Kenneth Reitz (the creator of requests) is generally opposed to publishing that way. He is also the creator of Pipenv, and that too uses GH Actions to publish new versions. I hope this convinces you a bit more. Although honestly, all this doesn't mean much. Security should not be about copying what other people are doing. Security is hard and always consists of trade-offs. It seems like you are conflating several aspects, so I would recommend you to do some more reading in that direction. Specifically, 2FA and API keys solve different problems, so it really is not a question of one or the other. @EwoutH provided some good links in the original post. |
Why hesitate? It's a FOSS project, so why do you think the reason is hidden from the community?
Fair point for urllib3. Though they have extra security measures as seen in urllib3/urllib3#2666. It's not your vanilla workflow used by Pipenv.
I'm not sure if you have read my points properly before trying to lecture? |
After reviewing, I would like to propose the following. There are many ways we can protect the main branch. From a threat perspective, I see and hear the following concerns:
Solution options:
Solution 1: Leave as is, just Jackie (@tpike3 backup)
Solution 2: Leave as is - Manual deploy, empower multiple people to deploy
Solution 3: Setup auto deploy with safety checks
^^ I think we should test drive Solution 3, with the option to undo it if we don't like it. |
Small nitpick: It can't be triggered by merges, only by creation of a tag. I think my view on this issue is clear, and I also think our time and effort can be spent much better on other issues and features. I just want to reiterate this is the recommended way of doing things according to official documentation (see the sketch after this list). There are also a lot of other packages that follow this best practice (just a few I quickly found):
Then, not using this exact action but also using API keys: NumPy, SciPy and Pandas. |
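As a hedged illustration of that recommended pattern (adapted from general PyPA packaging guidance, not copied from this PR's actual workflow file), the publish job is typically gated so it only runs for tag pushes, while building runs on every push and PR; the secret name below is an assumption:

```yaml
# Build on every push and PR; only the gated publish job uploads, and only for tags.
jobs:
  publish:
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      - name: Build sdist and wheel
        run: |
          python -m pip install build
          python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}  # assumed secret name
```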
@EwoutH I updated my comment to reflect the tag and what I intended. |
I really like the approach outlined by @jackiekazil. May I summarize the security threats we are currently facing and how in my opinion best to protect against them with this approach? (ordered from highest to lowest risk, but this is purely subjective).
I hope this is an accurate summary of our threats. I think all in all the threat level is very manageable (and let's be honest - even if someone managed to somehow distribute malware with mesa, the impact would be very small. But this should not stop us from using best practices). |
What's the plan to move this forward? Anything expected from me (on this PR)? |
From my end: the package build needs to be limited to only when a release is published. We haven't reached common ground regarding the automatic publishing, so I'm fine with the status quo of publishing manually, because it still works. |
Does this still have a chance of getting used in some way, or shall I close this? |
I would like to try out this workflow for mesa-interactive. Maybe you want to open a similar PR there? I used hatchling for the builds so far |
Certainly, opened a PR: Corvince/mesa-interactive#4 |
The slow releases have become a bottleneck in delivering the bugfix releases, given that the Solara frontend is still buggy. So I'd say let's go for it and revive this PR. We never push directly to the main branch, so running on pushes is redundant. |
Great, I will update it. Is the API token uploaded? The problem with only running on PRs is that it’s very cumbersome to backtrack what the last commit on the main branch was that actually passed. |
Every commit has a PR number attached to it, and even if a PR is merged with multiple commits, you should be able to tell which PR it comes from, and, as a constraint, it has to pass the release build in order to be merged to main. |
Currently the release CI only runs when a GitHub Release is created. This PR modifies it so that it runs on each PR and push, and uploads. It uses the official action from PyPI: https://github.com/pypa/gh-action-pypi-publish It also now uses build to build the wheel, instead of calling setup.py directly, which is deprecated.
Rebased and cleaned up. Ready to review/merge!
The problem is that on the Actions tab you can't get an overview of which versions on the main branch are passing and failing. It's also only a small amount of redundancy; it's a quick workflow anyway. |
OK at least it should be consistent with the current build_lint.yml workflow: mesa/.github/workflows/build_lint.yml Lines 4 to 7 in 3b580a3
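As a purely hypothetical illustration of the kind of trigger block being referenced (the actual lines 4 to 7 of build_lint.yml at that commit may differ):

```yaml
# Hypothetical example of such a trigger configuration, for illustration only.
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
```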
|
Same configuration as in build_lint.yml
Agreed, done! |
Merged, thank you. |
Thanks for merging! Can someone confirm that the PyPI API token was added as a secret to GitHub? |
No secrets are currently set for the repo. I think only @jackiekazil and maybe @tpike3 have the ability to generate the token on PyPI. |
I just saw that PyPI now supports trusted publishing for GitHub. We might want to update the workflow. |
That means we don't need an API key on GitHub, but @jackiekazil or @tpike3 still need to configure it on PyPI. |
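For context, a minimal sketch of what a trusted-publishing (OIDC) job could look like once the publisher is configured on the PyPI side; names and structure here are assumptions, not the actual Mesa workflow:

```yaml
# Trusted publishing: GitHub mints a short-lived OIDC token, so no API key secret is stored.
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # required for PyPI trusted publishing
    steps:
      - uses: actions/checkout@v3
      - name: Build sdist and wheel
        run: |
          python -m pip install build
          python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1  # no password needed with a trusted publisher
```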
I need a day or two - currently working on multiple deadlines. |
Currently the release CI only runs when a GitHub Release is created. This PR modifies it so that it runs on each PR and push (to test that wheel building works), and uploads the dist and wheel to PyPI when a tag is created (instead of only on a GitHub Release). |
It uses the official action from PyPI: https://github.com/pypa/gh-action-pypi-publish
@jackiekazil, you need to upload the PyPI API token once as a secret to GitHub. After that, you don't need to be involved in a release any more. See the official documentation below: |
After uploading the API token, any maintainer can just create a new tag, after which this action will upload the wheel and dist to PyPI.
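For example, a release could then be cut with nothing more than a tag push (the version number below is just a placeholder):

```
# Placeholder version number; any maintainer with push access could do this.
git tag v1.2.3
git push origin v1.2.3
```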
It also now uses build to build the wheel, instead of calling setup.py directly, which is deprecated. This closes #165 and closes #1252.
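Concretely, the deprecated invocation is replaced by the standard build frontend; a sketch of the commands involved (the exact workflow step may look slightly different):

```
# Old, deprecated invocation:
python setup.py sdist bdist_wheel

# New approach: the 'build' frontend produces the sdist and wheel in dist/
python -m pip install build
python -m build
```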
@tpike3 and @rht please review. @wang-boyu, if you like we can also implement this for Mesa-Geo.
There was some discussion in #1252 about security and potential two-factor authentication, but I can assure you this is the best practice for deploying to PyPI. It uses an API key stored in GitHub that no one can access after uploading (so it can't even leak to maintainers). It's the official way PyPI recommends doing it (the action is developed by PyPI after all) and