Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend GHA integration to support pull_request_target events #331

Closed
2 tasks done
maxrake opened this issue Oct 16, 2023 · 4 comments · Fixed by #341
Closed
2 tasks done

Extend GHA integration to support pull_request_target events #331

maxrake opened this issue Oct 16, 2023 · 4 comments · Fixed by #341
Assignees
Labels
enhancement New feature or request needs triage Used to indicate that an issue hasn't been reviewed

Comments

@maxrake
Copy link
Contributor

maxrake commented Oct 16, 2023

Overview

Is your feature request related to a problem? Please describe.

The current GitHub Action (GHA) integration is limited to pull_request events only. It does not support pull requests from forked repos (e.g., pull_request_target events). This was considered acceptable as the Phylum GitHub App was promoted as the primary option for GitHub users. However, the action now supports manifests in the form of lockfile generation while the app does not. That means more users are going to turn to the action and will want it to work in more environments than simply internal pull requests.

Describe the solution you'd like

Add support to the GHA for PRs, regardless of where they originate.

Describe alternatives you've considered

N/A

Additional context

The pull_request_target event appears to have the same webhook event payload as the pull_request event. This should allow for simple inclusion, but further investigation and testing is needed to uncover any differences.

Acceptance criteria

  • pull_request_target events are supported
  • Documentation is updated
@maxrake maxrake added enhancement New feature or request needs triage Used to indicate that an issue hasn't been reviewed labels Oct 16, 2023
@maxrake maxrake self-assigned this Oct 16, 2023
@marvin-hansen
Copy link

I fully support this feature because it allows to scan new pull requests from forked repositories before merge thus preventing sneaking in malware through poisoned dependencies.

This will be a very meaningful contribution to the overall security and credibility of open source projects.

Once this has been completed and tested, I recommend presenting this at appropriate conferences to promote the best practice of supply chain security analysis of all incoming pull requests.

@kylewillmon
Copy link
Contributor

  • Branch pipelines are enabled with push events
  • pull_request_target events are supported

These are two very different use cases. I fully support allowing pull_request_target events. But I am completely opposed to allowing push events. Can we split this into two separate issues to avoid cluttering the discussion?

@maxrake maxrake changed the title Extend GHA integration to support more events Extend GHA integration to supportpull_request_target events Oct 17, 2023
@maxrake maxrake changed the title Extend GHA integration to supportpull_request_target events Extend GHA integration to support pull_request_target events Oct 17, 2023
@maxrake
Copy link
Contributor Author

maxrake commented Oct 17, 2023

Can we split this into two separate issues to avoid cluttering the discussion?

Done. This issue has been updated and #332 has been created to split the two.

@marvin-hansen
Copy link

@kylewillmon @maxrake

I want to elaborate the underlying case for PR protection in more detail.

First of all, I do OSS as a side gig and my real job is building up my company. However, since OSS supply chain security bugs me already for three months, I want to take the time to write down the exact problem I struggle with and why PR protection matters so much not only to me, but the wider OSS community.

As an OSS maintainer, I have limited time, therefore I automate the crap out of it with GH actions. I think only very few small OSS projects have a full-time maintainer therefore time-to-value is really important. For every one GH action I use, there are at least 3 to 5 that were deleted because something didn’t work and when it takes too much time to fix, then it’s not worth the hassle. When there is a better working alternative, I take it immediately and never look back,

After my project got accepted into the Linux Foundation, I had to implement all the OSS security best practices, which btw, is a lot and took me a few weeks to complete. Among others, I had to deploy a lot of preventive security measures, and again, GH action automation to the rescue. That way, I already scan every incoming pull request for known security vulnerabilities, accidentally leaked API or SSH keys, and statically check for unused or outdated dependencies.

For supply chain security, there is a very real ongoing threat with even Microsoft falling victim to it. And if it that isn’t bad enough, there is already the first known OSS supply chain attack against the banking sector. Since a lot of OSS projects aren’t super well managed by volunteers (who have a full-time job, family, and a life), it’s not too hard to imagine that a bad actor submits a PR with malicious code hidden in modified (typo-squatted) dependencies. A code reviewer would almost certainly spot clear exploit code, but probably never even consider checking an innocent looking API call to a generic looking network lib dependency.

Likewise, any normal developer already has enough on his/her plate to do any given day, so he/she may accidentally grabs a typosquatted lib, but when the resulting PR to origin is submitted, this clearly need to bounce to prevent harm before in can happen.

The problem, though, the more advanced the security requirements, the less OSS friendly players are around. Sure enough, for big companies, big money buys you virtually anything, but somehow the security companies don’t like OSS too much so there is no real free alternative available.

However, there is real opportunity. Take the PyTorch project as an example, today (Oct/18/2023) they have over 800 open PR's. Full security scan needs to happen on GitHub for every incoming PR so that lemons get bounced & closed immediately. One less PR to worry about is already something if you have another 800 in the queue. I would suggest offering the PyTorch project free help to setup PR protection in exchange for a blogpost since PyTorch is by far the most widely used ML framework out there so the entire Ai industry may notice you.

In line with the famous Milkshake Marketing, I am not here for a GH action, to the contrary. My job is to protect my repo from increasingly sophisticated supply chain attacks.

My job to be done is simple:

Help me to protect my repo by scanning every incoming PR.

All I am looking for is peace of mind so I can focus on my work.

In terms of messaging, I would strongly prefer something that connects the value of your solution to my actual problem. For example:

Phylum protects your codebase against increasingly sophisticated attacks, so you feel peace of mind while focusing on your work.

Src: How to Write a Great Value Proposition

The moment I read something like this in a Readme, in a repo, a blog, an email, or somewhere else, I am sold before even having made a decision because the message tells me that finally somebody understand the problem I have and therefore, using the solution is a complete no brainer with all the other alternatives already ruled out. It is a known fact that decisions are made long before you even know it.

I wish software security companies would understand this and delivery products I can deploy to my OSS repo in five to ten minutes and when things go well, look at their commercial offering to see if there is something in for my company.

I wrote this down today to give you the clearest possible picture of how this single feature affects OSS maintainers life, and by implication the wider tech industry that relies so heavily on OSS projects in their commercial offerings.

Also, I clearly see the progress, and I want to thank everyone involved to make PR protection the new gold standard for supply chain security.

maxrake added a commit that referenced this issue Oct 24, 2023
This change makes it possible to use the GitHub Action (GHA) integration
for PRs that originate from forks, by enabling workflows to run on the
`pull_request_target` event. This change is simple enough, but comes
with some heavy security considerations. In order for the GHA to run, a
workflow using it will need to checkout the PRs code. This requires very
careful attention so as not to allow so-called "pwn requests":

* securitylab.github.com/research/github-actions-preventing-pwn-requests
* blog.gitguardian.com/github-actions-security-cheat-sheet

To that end, the documentation updates that go with this change will be
explicit in their warnings about how to properly use the Phylum GHA in
the most secure way possible. This effectively means the workflow should
be limited to executing the bare minimum required for Phylum analysis.

Further, the Phylum analysis that does run will need to do so in a way
that does not allow for arbitrary code execution. Therefore, this PR is
intended to be held back until the Phylum CLI adds some additional
protections around lockfile generation.

Closes #331
maxrake added a commit to phylum-dev/phylum-analyze-pr-action that referenced this issue Oct 25, 2023
This is the documentation update that goes with phylum-dev/phylum-ci#331

Changes made include:

* Update the action tagline and marketplace description
  * Provide a better value proposition
* Add information about supporting pull requests from repository forks
  * Pre-requisite event for workflow triggers
  * How to check out the PR repository
* Add warnings about the security implications of supporting PR forks
  * Provide links explaining the risks
  * Provide guidance on how to use the Phylum GHA in a secure manner
* Change more instances of `lock.file` to `dependency.file`
maxrake added a commit that referenced this issue Nov 30, 2023
This change makes it possible to use the GitHub Action (GHA) integration
for PRs that originate from forks, by enabling workflows to run on the
`pull_request_target` event.

This change comes with some heavy security considerations. In order for
the GHA to run, a workflow using it will need to checkout the PRs code.
This requires very careful attention so as not to allow so-called
"pwn requests":

* https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
* https://blog.gitguardian.com/github-actions-security-cheat-sheet/

To that end, the documentation updates that go with this change will be
explicit in their warnings about how to properly use the Phylum GHA in
the most secure way possible. This effectively means the workflow should
be limited to executing the bare minimum required for Phylum analysis.

Further, the Phylum analysis that does run will need to do so in a way
that does not allow for arbitrary code execution. Therefore, this change
also makes use of the Phylum CLI feature to disable lockfile generation
for untrusted contexts like the `pull_request_target` event trigger in
the GitHub Actions platform that this PR is looking to support.

Unfortunately, this approach does not allow for repositories with only
manifests (e.g., libraries). That is, manifests without a corresponding
lockfile. Those will fail to parse, with a new return code unique to
that situation. The only recourse, which is given in a log message and
will be spelled out more explicitly in the corresponding documentation
updates, is to reccomend adding a lockfile instead of or along with the
manifest.

Additional changes made include:

* Add predicate for if Phylum sandbox will work in current environment
* Create a `CLIExitCode` enumeration to track Phylum CLI exit codes
* Update `ReturnCode` enumeration for tracking `phylum-ci` return codes
  * Add descriptions
  * Make a distinction between _policy_ failures and general failures
  * Add new code for when no dependency files were provided or detected
  * Add new code for when a manifest is attempted to be parsed but
    lockfile generation has been disabled
* Ensure lockfiles provided as arguments are only filtered once
* Use `phylum project status` instead of `phylum project list` for
  setting the repository URL
* Update unit tests
* Format and refactor throughout
  * Show the final return code, _after_ any modifications

Closes #331

BREAKING CHANGE: Phylum CLI installs before v5.9.0-rc2 are no longer
supported. A version with support for disabling lockfile generation and
skipping sandbox usage is required.

BREAKING CHANGE: The `phylum-ci` return code for a policy violation that
results from a Phylum analysis has been changed from 1 to 2 in order to
make it distinct from the default failure code that is generated for all
raised `SystemExit` exceptions with a message instead of a code.
maxrake added a commit that referenced this issue Dec 5, 2023
…#341)

This change makes it possible to use the GitHub Action (GHA) integration
for PRs that originate from forks, by enabling workflows to run on the
`pull_request_target` event.

This change comes with some heavy security considerations. In order for
the GHA to run, a workflow using it will need to checkout the PRs code.
This requires very careful attention so as not to allow so-called
"pwn requests":

* https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
* https://blog.gitguardian.com/github-actions-security-cheat-sheet/

To that end, the documentation updates that go with this change will be
explicit in their warnings about how to properly use the Phylum GHA in
the most secure way possible. This effectively means the workflow should
be limited to executing the bare minimum required for Phylum analysis.

Further, the Phylum analysis that does run will need to do so in a way
that does not allow for arbitrary code execution. Therefore, this change
also makes use of the Phylum CLI feature to disable lockfile generation
for untrusted contexts like the `pull_request_target` event trigger in
the GitHub Actions platform that this PR is looking to support.

Unfortunately, this approach does not allow for repositories with only
manifests (e.g., libraries). That is, manifests without a corresponding
lockfile. Those will fail to parse, with a new return code unique to
that situation. The only recourse, which is given in a log message and
will be spelled out more explicitly in the corresponding documentation
updates, is to reccomend adding a lockfile instead of or along with the
manifest.

Additional changes made include:

* Add predicate for if Phylum sandbox will work in current environment
* Create a `CLIExitCode` enumeration to track Phylum CLI exit codes
* Update `ReturnCode` enumeration for tracking `phylum-ci` return codes
  * Add descriptions
  * Make a distinction between _policy_ failures and general failures
  * Add new code for when no dependency files were provided or detected
  * Add new code for when a manifest is attempted to be parsed but
    lockfile generation has been disabled
* Ensure lockfiles provided as arguments are only filtered once
* Use `phylum project status` instead of `phylum project list` for
  setting the repository URL
* Update unit tests
* Format and refactor throughout
  * Show the final return code, _after_ any modifications

Closes #331

BREAKING CHANGE: Phylum CLI installs before v5.9.0-rc2 are no longer
supported. A version with support for disabling lockfile generation and
skipping sandbox usage is required.

BREAKING CHANGE: The `phylum-ci` return code for a policy violation that
results from a Phylum analysis has been changed from 1 to 2 in order to
make it distinct from the default failure code that is generated for all
raised `SystemExit` exceptions with a message instead of a code.
maxrake added a commit to phylum-dev/phylum-analyze-pr-action that referenced this issue Dec 5, 2023
This is the documentation update that goes with phylum-dev/phylum-ci#331

Changes made include:

* Update the action tagline and marketplace description
  * Provide a better value proposition
* Add information about supporting pull requests from repository forks
  * Pre-requisite event for workflow triggers
  * How to check out the PR repository
* Add warnings about the security implications of supporting PR forks
  * Provide links explaining the risks
  * Provide guidance on how to use the Phylum GHA in a secure manner
* Add a FAQs section to the README
  * Explain why there can be both a failing status check and a success
    comment
  * Explain why analysis can fail for PRs from forked repositories
* Change more instances of `lock.file` to `dependency.file`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs triage Used to indicate that an issue hasn't been reviewed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants