Simplify the code by using GitHub CLI and the GitHub REST API #6

seanh · 2023-10-25T20:41:42Z

...also add tests.

This results in significantly less than half of the original lines of code (if you don't count the new tests) by:

Removing the Python Slack API client in favour of the slackapi/slack-github-action.
This change has already been done on main.
Removing the GitHub API client in favour of shelling out to GitHub CLI.
Calling GitHub's REST API instead of their GraphQL API.
The output format has also been simplified.

I think we should probably prefer the REST API over GraphQL whenever possible: it's both simpler and better documented. If this breaks because GitHub change their API, I'd much rather have to update a script's REST API usage than try to figure out what broke with a GraphQL query.

The tests may seem a bit OTT but they will be helpful if we want to make changes to this in future. If they become a hindrance we can always just delete them.

Testing manually

$ tox -qqe dev --run-command 'dependabot-alerts hypothesis'

This will use your GitHub CLI authentication (set up via gh auth login) or you can set GITHUB_TOKEN to use a different authentication token.

The output format is simple: one alert per line, with no preamble or footer lines or indentation etc. This is good UNIX command design, for example it makes the output easy to parse in case you want to pipe it to another command, for example to count the number of alerts: dependabot-alerts hypothesis | wc -l.

There's also:

$ dependabot-alerts --help
$ dependabot-alerts --version

Deduplication

We get the same alert multiple times in the same repo when the same package is mentioned in multiple Python requirements files. When this happens only one line will be output but it'll note the number of duplicate copies:

hypothesis/somerepo GHSA-****-****-**** alerts:7 severity:medium <package_name> "<security_advistory_summary>" https://github.com/hypothesis/somerepo/security/dependabot/123

Dependabot security alerts have a GHSA ID field that is a required, non-nullable unique ID. There are other IDs in there as well, like the CVE ID, but these are optional or nullable. The code considers two alerts to be equal if they belong to the same repo and have the same GHSA ID, and will group them onto one line as above.

Slack formatting

Pass --slack to print output in the form of a nicely-formatted Slack post:

$ tox -qqe dev --run-command 'dependabot-alerts --slack hypothesis'

When there are no alerts

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

With --slack if there are no alerts it'll print There are no Dependabot security alerts in the hypothesis GitHub organization and the GitHub Actions workflow will actually post that message into Slack. This might be useful, as a reminder that the scheduled action is running? Alternatively we could make it print nothing in this case and then make the workflow not try to post to Slack if there is no output.

Error handling

If something goes wrong you'll get an error message from GitHub CLI and it'll exit with 1. It's not super friendly but it's fine. For example if you're not logged in:

$ tox -qqe dev --run-command 'dependabot-alerts hypothesis'

To get started with GitHub CLI, please run:  gh auth login
Alternatively, populate the GH_TOKEN environment variable with a GitHub API authentication token.

Traceback (most recent call last):
...

robertknight · 2023-10-27T12:00:17Z

The output format is simple: one repo/package per line, with no preamble or footer lines or indentation etc. This is good UNIX command design, for example it makes the output easy to parse in case you want to pipe it to another command, for example to count the number of vulnerable packages: dependabot-alerts hypothesis | wc -l.

The output in the earlier version was designed to be useful for a human to read and act on. I don't mind having a succinct pipeline-friendly mode as an option, like many CLI utilities do (or check isatty), but it is annoying not to have this information in the output:

Convenient link to alert
Severity
Short summary of what the issue is

The code on main seems to have its own concept of "vulnerabilities" that doesn't seem to map directly to what the Dependabot alerts REST API returns, and it seems to only report two vulnerabilities in Via HTML. I'm not sure I understand what the code on main is doing there.

Each Vulnerability object on main corresponds to one alert from Dependabot. The name comes from the RepositoryVulnerabilityAlert schema in GraphQL), but with a limit that only one is reported per (package, repository).

The rationale is that in a given repository, you would normally only need to take one action per affected package, rather than for each alert if there are multiple vulnerabilities in the same package or multiple alerts for the same issue that affect different lockfiles.

Message generated by the alerts.yml workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>

This is useful, although it would be nice if this information could be added automatically by the thing that posts to Slack.

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

This is fine, though I do use plenty of tools which have decided to show something when nothing needs doing and instead supply a silent/quiet flag to suppress these (cf black, MyPy). The main need for having a message when there are no alerts is in the case when reporting to Slack, to serve as a confirmation that the check is happening.

seanh · 2023-10-29T14:49:35Z

.cookiecutter/includes/README/head.md


 ```terminal
-$ dependabot-alerts <your_github_user_or_organization>
+$ dependabot-alerts <your_github_organization>


It only works for organizations, not users. (Not sure if the code on main actually works for user either?)

Indeed the code on main does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...) for user(login: ...) at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.

seanh · 2023-10-29T14:51:56Z

src/dependabot_alerts/cli.py

+        print(
+            f"There are no Dependabot security alerts in the {organization} GitHub organization."


This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running? Alternatively we could print nothing here and then we'd have to change the workflow to not try to post into Slack if the command outputs nothing.

This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?

I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.

seanh · 2023-10-29T14:55:26Z

src/dependabot_alerts/core.py

+@dataclass(frozen=True)
+class Alert:
+    repo_full_name: str | None
+    repo_html_url: str | None = field(compare=False, repr=False)
+    ghsa_id: str | None
+    package: str | None = field(compare=False)
+    manifest_path: str | None = field(compare=False)
+    summary: str | None = field(compare=False)


The frozen=True makes Alert's hashable so that they can be used as dict keys (or put into sets etc).

Any two Alert's with the same repo_full_name (e.g. hypothesis/viahtml) and ghsa_id are considered equal, even if they may differ in other fields (e.g. the manifest_path: requirements.txt, tests.txt. etc). This definition of equality is how we group/deduplicate alerts. What it ends up meaning in practice is that alerts with the same repo and GHSA ID but for different requirements files end up getting deduplicated/grouped.

seanh · 2023-10-29T14:59:52Z

src/dependabot_alerts/core.py

+    def make(cls, alert_dict):
+        return cls(
+            repo_full_name=alert_dict["repository"]["full_name"],
+            repo_html_url=alert_dict["repository"]["html_url"],
+            ghsa_id=alert_dict["security_advisory"]["ghsa_id"],
+            package=alert_dict["dependency"]["package"]["name"],
+            manifest_path=alert_dict["dependency"]["manifest_path"],
+            summary=alert_dict["security_advisory"]["summary"],


This will crash if the GitHub API's responses change, but we have Slack notifications for if the workflow fails and failing-fast here is simpler than coding defensively.

I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.

seanh · 2023-10-29T15:29:29Z

src/dependabot_alerts/core.py

+            if alert in alerts:
+                alerts[alert].duplicates.append(alert)
+            else:
+                alerts[alert] = alert


This is the deduplication here

seanh · 2023-10-29T16:23:00Z

The output in the earlier version was designed to be useful for a human to read and act on. I don't mind having a succinct pipeline-friendly mode as an option, like many CLI utilities do (or check isatty), but it is annoying not to have this information in the output:
* Convenient link to alert

* Severity

* Short summary of what the issue is

I've added each alert's URL, severity and summary to the output (both Slack and non-Slack), and tweaked the formatting a bit. It's still one-line-per-alert but should be fairly readable and actionable I think.

Each Vulnerability object on main corresponds to one alert from Dependabot. The name comes from the RepositoryVulnerabilityAlert schema in GraphQL), but with a limit that only one is reported per (package, repository).

The rationale is that in a given repository, you would normally only need to take one action per affected package, rather than for each alert if there are multiple vulnerabilities in the same package or multiple alerts for the same issue that affect different lockfiles.

I've taken a slightly different approach to deduplication on this branch: it groups together alerts that belong to the same repo and have the same GHSA ID. So you won't get multiple line of output if the same alert is repeated for different lockfiles, but you will if there are actually two different alerts for the same package. After all they are different alerts, with different severities and summary descriptions, so I wouldn't want to omit any of them.

Message generated by the alerts.yml workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>

This is useful, although it would be nice if this information could be added automatically by the thing that posts to Slack.

I think it'd be nice to add a reusable slack.yml workflow to https://github.com/hypothesis/workflows that's a simple wrapper for the Slack action. That would have other benefits as well (easier to have Dependabot keep the version number of the Slack action up to date) and the reusable workflow could add this "generated by..." (referencing the caller workflow), then all our Slack alerts would get it. For another day perhaps.

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

This is fine, though I do use plenty of tools which have decided to show something when nothing needs doing and instead supply a silent/quiet flag to suppress these (cf black, MyPy). The main need for having a message when there are no alerts is in the case when reporting to Slack, to serve as a confirmation that the check is happening.

I've added this "There are no alerts" confirmation message when --slack is used.

robertknight

Thanks for this overhaul and the tests. Let's get this landed and we can discuss what to do about the current open alerts in ViaHTML.

robertknight · 2023-10-27T10:52:38Z

src/dependabot_alerts/core.py

+            )
+        except CalledProcessError as err:  # pragma: no cover
+            print(err.stdout)
+            print(err.stderr)


I'd prefer this to print to stderr with context about where the stdout/stderr is coming from.

I've changed this to print any stdout from GitHub CLI to stdout and any stderr from GitHub CLI to stderr, and to prefix each line with GitHub CLI stdout> or GitHub CLI stderr> :

dependabot-alerts/src/dependabot_alerts/core.py

Lines 50 to 55 in bb0d807

except CalledProcessError as err: # pragma: no cover

for line in err.stdout.splitlines():

print(f"GitHub CLI stdout> {line}")

for line in err.stderr.splitlines():

print(f"GitHub CLI stderr> {line}", file=sys.stderr)

raise

Example (trying to get alerts for a user rather than an organization fails):

$ tox -qqe dev --run-command 'dependabot-alerts seanh' GitHub CLI stdout> {"message":"Not Found","documentation_url":"https://docs.github.com/rest/dependabot/alerts#list-dependabot-alerts-for-an-organization"} GitHub CLI stderr> gh: Not Found (HTTP 404) Traceback (most recent call last): File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/bin/dependabot-alerts", line 8, in <module> sys.exit(cli()) ^^^^^ File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/cli.py", line 22, in cli repos = github.alerts(organization) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/core.py", line 43, in alerts result = self._run( ^^^^^^^^^^ File "/home/seanh/.pyenv/versions/3.12.0/lib/python3.12/subprocess.py", line 571, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['gh', 'api', '--paginate', '/orgs/seanh/dependabot/alerts?state=open']' returned non-zero exit status 1.

robertknight · 2023-10-30T09:30:35Z

.cookiecutter/includes/README/head.md


 ```terminal
-$ dependabot-alerts <your_github_user_or_organization>
+$ dependabot-alerts <your_github_organization>


Indeed the code on main does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...) for user(login: ...) at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.

robertknight · 2023-10-30T09:31:19Z

src/dependabot_alerts/cli.py

+        print(
+            f"There are no Dependabot security alerts in the {organization} GitHub organization."


This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?

I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.

robertknight · 2023-10-30T09:32:49Z

src/dependabot_alerts/core.py

+    def make(cls, alert_dict):
+        return cls(
+            repo_full_name=alert_dict["repository"]["full_name"],
+            repo_html_url=alert_dict["repository"]["html_url"],
+            ghsa_id=alert_dict["security_advisory"]["ghsa_id"],
+            package=alert_dict["dependency"]["package"]["name"],
+            manifest_path=alert_dict["dependency"]["manifest_path"],
+            summary=alert_dict["security_advisory"]["summary"],


I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.

robertknight · 2023-10-30T09:36:37Z

src/dependabot_alerts/cli.py

+        for alert in alerts:
+            print(
+                f'{alert.repo_full_name} {alert.ghsa_id} alerts:{len(alert.duplicates) + 1} severity:{alert.severity} {alert.package} "{alert.summary}" {alert.html_url}'
+            )


I still think a --pretty or --human output that is multi-line would be nice to have, but lets get this merged first.

I've pushed a refactoring that separates each formatting option into a separate format_*() function in format.py and changes the --slack option to --format slack. This should make it easy to add more formats like --format pretty or --format json etc without the code quality deteriorating.

dependabot-alerts/src/dependabot_alerts/format.py

Lines 1 to 47 in 468bea0

"""Functions for serializing lists of Alerts into different string formats."""

from dependabot_alerts.core import Alert

def format_text(alerts: list[Alert], _organization: str) -> str:

return "\n".join(

[

" ".join(

[

alert.repo_full_name,

alert.ghsa_id,

f"alerts:{len(alert.duplicates) + 1}",

f"severity:{alert.severity}",

alert.package,

alert.html_url,

alert.summary,

]

)

for alert in alerts

]

)

def format_slack(alerts: list[Alert], organization: str) -> str:

if not alerts:

return f"There are no Dependabot security alerts in the {organization} GitHub organization."

return "\n".join(

[

f"*There are Dependabot security alerts in the {organization} GitHub organization:*",

"",

*[

" ".join(

[

f"<{alert.html_url}|{alert.repo_full_name} {alert.ghsa_id}>",

f"alerts:{len(alert.duplicates) + 1}",

f"severity:{alert.severity}",

f"`{alert.package}`",

alert.summary,

]

)

for alert in alerts

],

"",

"Message generated by the `alerts.yml` workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>",

]

)

dependabot-alerts/src/dependabot_alerts/cli.py

Lines 22 to 29 in 468bea0

formatters = {

"text": format_text,

"slack": format_slack,

}

formatter = formatters[args.format]

output = formatter(alerts, args.organization)

Simplify the code by using GitHub CLI and the GitHub REST API. Also add tests. Fixes #5

seanh force-pushed the simplify branch from 8ca36be to 711a606 Compare October 25, 2023 22:31

seanh marked this pull request as ready for review October 25, 2023 22:35

seanh marked this pull request as draft October 25, 2023 22:35

seanh force-pushed the simplify branch from 711a606 to a290bb6 Compare October 25, 2023 22:45

seanh marked this pull request as ready for review October 25, 2023 22:50

seanh force-pushed the simplify branch from eecc0ae to acb0098 Compare October 26, 2023 13:58

seanh marked this pull request as draft October 26, 2023 14:33

seanh requested a review from robertknight October 26, 2023 14:34

seanh force-pushed the simplify branch 4 times, most recently from e3a7753 to 16778a8 Compare October 29, 2023 15:30

seanh commented Oct 29, 2023

View reviewed changes

seanh marked this pull request as ready for review October 29, 2023 15:47

robertknight approved these changes Oct 30, 2023

View reviewed changes

seanh force-pushed the simplify branch 3 times, most recently from 8046836 to 468bea0 Compare November 7, 2023 13:49

Simplify the code by using GitHub CLI

bb0d807

Simplify the code by using GitHub CLI and the GitHub REST API. Also add tests. Fixes #5

seanh force-pushed the simplify branch from 468bea0 to bb0d807 Compare November 7, 2023 13:51

seanh merged commit 601acda into main Nov 7, 2023
8 checks passed

seanh deleted the simplify branch November 7, 2023 13:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify the code by using GitHub CLI and the GitHub REST API #6

Simplify the code by using GitHub CLI and the GitHub REST API #6

seanh commented Oct 25, 2023 •

edited

Loading

robertknight commented Oct 27, 2023

seanh Oct 29, 2023 •

edited

Loading

robertknight Oct 30, 2023

seanh Oct 29, 2023

robertknight Oct 30, 2023

seanh Oct 29, 2023

seanh Oct 29, 2023

robertknight Oct 30, 2023

seanh Oct 29, 2023

seanh commented Oct 29, 2023 •

edited

Loading

robertknight left a comment

robertknight Oct 27, 2023

seanh Nov 7, 2023 •

edited

Loading

robertknight Oct 30, 2023

robertknight Oct 30, 2023

robertknight Oct 30, 2023

robertknight Oct 30, 2023

seanh Nov 7, 2023 •

edited

Loading

		print(
		f"There are no Dependabot security alerts in the {organization} GitHub organization."

	except CalledProcessError as err: # pragma: no cover
	for line in err.stdout.splitlines():
	print(f"GitHub CLI stdout> {line}")
	for line in err.stderr.splitlines():
	print(f"GitHub CLI stderr> {line}", file=sys.stderr)
	raise

	"""Functions for serializing lists of Alerts into different string formats."""
	from dependabot_alerts.core import Alert


	def format_text(alerts: list[Alert], _organization: str) -> str:
	return "\n".join(
	[
	" ".join(
	[
	alert.repo_full_name,
	alert.ghsa_id,
	f"alerts:{len(alert.duplicates) + 1}",
	f"severity:{alert.severity}",
	alert.package,
	alert.html_url,
	alert.summary,
	]
	)
	for alert in alerts
	]
	)


	def format_slack(alerts: list[Alert], organization: str) -> str:
	if not alerts:
	return f"There are no Dependabot security alerts in the {organization} GitHub organization."

	return "\n".join(
	[
	f"There are Dependabot security alerts in the {organization} GitHub organization:",
	"",
	*[
	" ".join(
	[
	f"<{alert.html_url}\|{alert.repo_full_name} {alert.ghsa_id}>",
	f"alerts:{len(alert.duplicates) + 1}",
	f"severity:{alert.severity}",
	f"`{alert.package}`",
	alert.summary,
	]
	)
	for alert in alerts
	],
	"",
	"Message generated by the `alerts.yml` workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml\|in dependabot-alerts>",
	]
	)

	formatters = {
	"text": format_text,
	"slack": format_slack,
	}

	formatter = formatters[args.format]

	output = formatter(alerts, args.organization)

Simplify the code by using GitHub CLI and the GitHub REST API #6

Simplify the code by using GitHub CLI and the GitHub REST API #6

Conversation

seanh commented Oct 25, 2023 • edited Loading

Testing manually

Deduplication

Slack formatting

When there are no alerts

Error handling

robertknight commented Oct 27, 2023

seanh Oct 29, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanh commented Oct 29, 2023 • edited Loading

robertknight left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanh Nov 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanh Nov 7, 2023 • edited Loading

Choose a reason for hiding this comment

seanh commented Oct 25, 2023 •

edited

Loading

seanh Oct 29, 2023 •

edited

Loading

seanh commented Oct 29, 2023 •

edited

Loading

seanh Nov 7, 2023 •

edited

Loading

seanh Nov 7, 2023 •

edited

Loading