-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify the code by using GitHub CLI and the GitHub REST API #6
Conversation
The output in the earlier version was designed to be useful for a human to read and act on. I don't mind having a succinct pipeline-friendly mode as an option, like many CLI utilities do (or check
Each The rationale is that in a given repository, you would normally only need to take one action per affected package, rather than for each alert if there are multiple vulnerabilities in the same package or multiple alerts for the same issue that affect different lockfiles.
This is useful, although it would be nice if this information could be added automatically by the thing that posts to Slack.
This is fine, though I do use plenty of tools which have decided to show something when nothing needs doing and instead supply a silent/quiet flag to suppress these (cf black, MyPy). The main need for having a message when there are no alerts is in the case when reporting to Slack, to serve as a confirmation that the check is happening. |
e3a7753
to
16778a8
Compare
|
||
```terminal | ||
$ dependabot-alerts <your_github_user_or_organization> | ||
$ dependabot-alerts <your_github_organization> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only works for organizations, not users. (Not sure if the code on main
actually works for user either?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the code on main
does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...)
for user(login: ...)
at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.
src/dependabot_alerts/cli.py
Outdated
print( | ||
f"There are no Dependabot security alerts in the {organization} GitHub organization." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running? Alternatively we could print nothing here and then we'd have to change the workflow to not try to post into Slack if the command outputs nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?
I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.
src/dependabot_alerts/core.py
Outdated
@dataclass(frozen=True) | ||
class Alert: | ||
repo_full_name: str | None | ||
repo_html_url: str | None = field(compare=False, repr=False) | ||
ghsa_id: str | None | ||
package: str | None = field(compare=False) | ||
manifest_path: str | None = field(compare=False) | ||
summary: str | None = field(compare=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frozen=True
makes Alert
's hashable so that they can be used as dict keys (or put into sets etc).
Any two Alert
's with the same repo_full_name
(e.g. hypothesis/viahtml
) and ghsa_id
are considered equal, even if they may differ in other fields (e.g. the manifest_path
: requirements.txt
, tests.txt
. etc). This definition of equality is how we group/deduplicate alerts. What it ends up meaning in practice is that alerts with the same repo and GHSA ID but for different requirements files end up getting deduplicated/grouped.
src/dependabot_alerts/core.py
Outdated
def make(cls, alert_dict): | ||
return cls( | ||
repo_full_name=alert_dict["repository"]["full_name"], | ||
repo_html_url=alert_dict["repository"]["html_url"], | ||
ghsa_id=alert_dict["security_advisory"]["ghsa_id"], | ||
package=alert_dict["dependency"]["package"]["name"], | ||
manifest_path=alert_dict["dependency"]["manifest_path"], | ||
summary=alert_dict["security_advisory"]["summary"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash if the GitHub API's responses change, but we have Slack notifications for if the workflow fails and failing-fast here is simpler than coding defensively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.
if alert in alerts: | ||
alerts[alert].duplicates.append(alert) | ||
else: | ||
alerts[alert] = alert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the deduplication here
I've added each alert's URL, severity and summary to the output (both Slack and non-Slack), and tweaked the formatting a bit. It's still one-line-per-alert but should be fairly readable and actionable I think.
I've taken a slightly different approach to deduplication on this branch: it groups together alerts that belong to the same repo and have the same GHSA ID. So you won't get multiple line of output if the same alert is repeated for different lockfiles, but you will if there are actually two different alerts for the same package. After all they are different alerts, with different severities and summary descriptions, so I wouldn't want to omit any of them.
I think it'd be nice to add a reusable
I've added this "There are no alerts" confirmation message when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this overhaul and the tests. Let's get this landed and we can discuss what to do about the current open alerts in ViaHTML.
src/dependabot_alerts/core.py
Outdated
) | ||
except CalledProcessError as err: # pragma: no cover | ||
print(err.stdout) | ||
print(err.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this to print to stderr with context about where the stdout/stderr is coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to print any stdout from GitHub CLI to stdout and any stderr from GitHub CLI to stderr, and to prefix each line with GitHub CLI stdout>
or GitHub CLI stderr>
:
dependabot-alerts/src/dependabot_alerts/core.py
Lines 50 to 55 in bb0d807
except CalledProcessError as err: # pragma: no cover | |
for line in err.stdout.splitlines(): | |
print(f"GitHub CLI stdout> {line}") | |
for line in err.stderr.splitlines(): | |
print(f"GitHub CLI stderr> {line}", file=sys.stderr) | |
raise |
Example (trying to get alerts for a user rather than an organization fails):
$ tox -qqe dev --run-command 'dependabot-alerts seanh'
GitHub CLI stdout> {"message":"Not Found","documentation_url":"https://docs.github.com/rest/dependabot/alerts#list-dependabot-alerts-for-an-organization"}
GitHub CLI stderr> gh: Not Found (HTTP 404)
Traceback (most recent call last):
File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/bin/dependabot-alerts", line 8, in <module>
sys.exit(cli())
^^^^^
File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/cli.py", line 22, in cli
repos = github.alerts(organization)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/core.py", line 43, in alerts
result = self._run(
^^^^^^^^^^
File "/home/seanh/.pyenv/versions/3.12.0/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['gh', 'api', '--paginate', '/orgs/seanh/dependabot/alerts?state=open']' returned non-zero exit status 1.
|
||
```terminal | ||
$ dependabot-alerts <your_github_user_or_organization> | ||
$ dependabot-alerts <your_github_organization> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the code on main
does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...)
for user(login: ...)
at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.
src/dependabot_alerts/cli.py
Outdated
print( | ||
f"There are no Dependabot security alerts in the {organization} GitHub organization." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?
I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.
src/dependabot_alerts/core.py
Outdated
def make(cls, alert_dict): | ||
return cls( | ||
repo_full_name=alert_dict["repository"]["full_name"], | ||
repo_html_url=alert_dict["repository"]["html_url"], | ||
ghsa_id=alert_dict["security_advisory"]["ghsa_id"], | ||
package=alert_dict["dependency"]["package"]["name"], | ||
manifest_path=alert_dict["dependency"]["manifest_path"], | ||
summary=alert_dict["security_advisory"]["summary"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.
src/dependabot_alerts/cli.py
Outdated
for alert in alerts: | ||
print( | ||
f'{alert.repo_full_name} {alert.ghsa_id} alerts:{len(alert.duplicates) + 1} severity:{alert.severity} {alert.package} "{alert.summary}" {alert.html_url}' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think a --pretty
or --human
output that is multi-line would be nice to have, but lets get this merged first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a refactoring that separates each formatting option into a separate format_*()
function in format.py
and changes the --slack
option to --format slack
. This should make it easy to add more formats like --format pretty
or --format json
etc without the code quality deteriorating.
dependabot-alerts/src/dependabot_alerts/format.py
Lines 1 to 47 in 468bea0
"""Functions for serializing lists of Alerts into different string formats.""" from dependabot_alerts.core import Alert def format_text(alerts: list[Alert], _organization: str) -> str: return "\n".join( [ " ".join( [ alert.repo_full_name, alert.ghsa_id, f"alerts:{len(alert.duplicates) + 1}", f"severity:{alert.severity}", alert.package, alert.html_url, alert.summary, ] ) for alert in alerts ] ) def format_slack(alerts: list[Alert], organization: str) -> str: if not alerts: return f"There are no Dependabot security alerts in the {organization} GitHub organization." return "\n".join( [ f"*There are Dependabot security alerts in the {organization} GitHub organization:*", "", *[ " ".join( [ f"<{alert.html_url}|{alert.repo_full_name} {alert.ghsa_id}>", f"alerts:{len(alert.duplicates) + 1}", f"severity:{alert.severity}", f"`{alert.package}`", alert.summary, ] ) for alert in alerts ], "", "Message generated by the `alerts.yml` workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>", ] ) dependabot-alerts/src/dependabot_alerts/cli.py
Lines 22 to 29 in 468bea0
formatters = { "text": format_text, "slack": format_slack, } formatter = formatters[args.format] output = formatter(alerts, args.organization)
8046836
to
468bea0
Compare
Simplify the code by using GitHub CLI and the GitHub REST API. Also add tests. Fixes #5
...also add tests.
This results in significantly less than half of the original lines of code (if you don't count the new tests) by:
Removing the Python Slack API client in favour of the
slackapi/slack-github-action
.This change has already been done on
main
.Removing the GitHub API client in favour of shelling out to GitHub CLI.
Calling GitHub's REST API instead of their GraphQL API.
The output format has also been simplified.
I think we should probably prefer the REST API over GraphQL whenever possible: it's both simpler and better documented. If this breaks because GitHub change their API, I'd much rather have to update a script's REST API usage than try to figure out what broke with a GraphQL query.
The tests may seem a bit OTT but they will be helpful if we want to make changes to this in future. If they become a hindrance we can always just delete them.
Testing manually
This will use your GitHub CLI authentication (set up via
gh auth login
) or you can setGITHUB_TOKEN
to use a different authentication token.The output format is simple: one alert per line, with no preamble or footer lines or indentation etc. This is good UNIX command design, for example it makes the output easy to parse in case you want to pipe it to another command, for example to count the number of alerts:
dependabot-alerts hypothesis | wc -l
.There's also:
Deduplication
We get the same alert multiple times in the same repo when the same package is mentioned in multiple Python requirements files. When this happens only one line will be output but it'll note the number of duplicate copies:
Dependabot security alerts have a GHSA ID field that is a required, non-nullable unique ID. There are other IDs in there as well, like the CVE ID, but these are optional or nullable. The code considers two alerts to be equal if they belong to the same repo and have the same GHSA ID, and will group them onto one line as above.
Slack formatting
Pass
--slack
to print output in the form of a nicely-formatted Slack post:When there are no alerts
If there are no alerts then
dependabot-alerts hypothesis
will just exit with0
and print nothing. Again: good UNIX command design. For examplels
doesn't printThere are no files
if you run it in an empty directory.With
--slack
if there are no alerts it'll printThere are no Dependabot security alerts in the hypothesis GitHub organization
and the GitHub Actions workflow will actually post that message into Slack. This might be useful, as a reminder that the scheduled action is running? Alternatively we could make it print nothing in this case and then make the workflow not try to post to Slack if there is no output.Error handling
If something goes wrong you'll get an error message from GitHub CLI and it'll exit with
1
. It's not super friendly but it's fine. For example if you're not logged in: