Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the code by using GitHub CLI and the GitHub REST API #6

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

seanh
Copy link
Contributor

@seanh seanh commented Oct 25, 2023

...also add tests.

This results in significantly less than half of the original lines of code (if you don't count the new tests) by:

  • Removing the Python Slack API client in favour of the slackapi/slack-github-action.
    This change has already been done on main.

  • Removing the GitHub API client in favour of shelling out to GitHub CLI.

  • Calling GitHub's REST API instead of their GraphQL API.

  • The output format has also been simplified.

I think we should probably prefer the REST API over GraphQL whenever possible: it's both simpler and better documented. If this breaks because GitHub change their API, I'd much rather have to update a script's REST API usage than try to figure out what broke with a GraphQL query.

The tests may seem a bit OTT but they will be helpful if we want to make changes to this in future. If they become a hindrance we can always just delete them.

Testing manually

$ tox -qqe dev --run-command 'dependabot-alerts hypothesis'

This will use your GitHub CLI authentication (set up via gh auth login) or you can set GITHUB_TOKEN to use a different authentication token.

The output format is simple: one alert per line, with no preamble or footer lines or indentation etc. This is good UNIX command design, for example it makes the output easy to parse in case you want to pipe it to another command, for example to count the number of alerts: dependabot-alerts hypothesis | wc -l.

There's also:

$ dependabot-alerts --help
$ dependabot-alerts --version

Deduplication

We get the same alert multiple times in the same repo when the same package is mentioned in multiple Python requirements files. When this happens only one line will be output but it'll note the number of duplicate copies:

hypothesis/somerepo GHSA-****-****-**** alerts:7 severity:medium <package_name> "<security_advistory_summary>" https://github.com/hypothesis/somerepo/security/dependabot/123

Dependabot security alerts have a GHSA ID field that is a required, non-nullable unique ID. There are other IDs in there as well, like the CVE ID, but these are optional or nullable. The code considers two alerts to be equal if they belong to the same repo and have the same GHSA ID, and will group them onto one line as above.

Slack formatting

Pass --slack to print output in the form of a nicely-formatted Slack post:

$ tox -qqe dev --run-command 'dependabot-alerts --slack hypothesis'

When there are no alerts

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

With --slack if there are no alerts it'll print There are no Dependabot security alerts in the hypothesis GitHub organization and the GitHub Actions workflow will actually post that message into Slack. This might be useful, as a reminder that the scheduled action is running? Alternatively we could make it print nothing in this case and then make the workflow not try to post to Slack if there is no output.

Error handling

If something goes wrong you'll get an error message from GitHub CLI and it'll exit with 1. It's not super friendly but it's fine. For example if you're not logged in:

$ tox -qqe dev --run-command 'dependabot-alerts hypothesis'

To get started with GitHub CLI, please run:  gh auth login
Alternatively, populate the GH_TOKEN environment variable with a GitHub API authentication token.

Traceback (most recent call last):
...

@seanh seanh marked this pull request as ready for review October 25, 2023 22:35
@seanh seanh marked this pull request as draft October 25, 2023 22:35
@seanh seanh marked this pull request as ready for review October 25, 2023 22:50
@seanh seanh marked this pull request as draft October 26, 2023 14:33
@robertknight
Copy link
Member

The output format is simple: one repo/package per line, with no preamble or footer lines or indentation etc. This is good UNIX command design, for example it makes the output easy to parse in case you want to pipe it to another command, for example to count the number of vulnerable packages: dependabot-alerts hypothesis | wc -l.

The output in the earlier version was designed to be useful for a human to read and act on. I don't mind having a succinct pipeline-friendly mode as an option, like many CLI utilities do (or check isatty), but it is annoying not to have this information in the output:

  • Convenient link to alert
  • Severity
  • Short summary of what the issue is

The code on main seems to have its own concept of "vulnerabilities" that doesn't seem to map directly to what the Dependabot alerts REST API returns, and it seems to only report two vulnerabilities in Via HTML. I'm not sure I understand what the code on main is doing there.

Each Vulnerability object on main corresponds to one alert from Dependabot. The name comes from the RepositoryVulnerabilityAlert schema in GraphQL), but with a limit that only one is reported per (package, repository).

The rationale is that in a given repository, you would normally only need to take one action per affected package, rather than for each alert if there are multiple vulnerabilities in the same package or multiple alerts for the same issue that affect different lockfiles.

Message generated by the alerts.yml workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>

This is useful, although it would be nice if this information could be added automatically by the thing that posts to Slack.

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

This is fine, though I do use plenty of tools which have decided to show something when nothing needs doing and instead supply a silent/quiet flag to suppress these (cf black, MyPy). The main need for having a message when there are no alerts is in the case when reporting to Slack, to serve as a confirmation that the check is happening.

@seanh seanh force-pushed the simplify branch 4 times, most recently from e3a7753 to 16778a8 Compare October 29, 2023 15:30

```terminal
$ dependabot-alerts <your_github_user_or_organization>
$ dependabot-alerts <your_github_organization>
Copy link
Contributor Author

@seanh seanh Oct 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only works for organizations, not users. (Not sure if the code on main actually works for user either?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the code on main does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...) for user(login: ...) at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.

Comment on lines 25 to 26
print(
f"There are no Dependabot security alerts in the {organization} GitHub organization."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running? Alternatively we could print nothing here and then we'd have to change the workflow to not try to post into Slack if the command outputs nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?

I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.

Comment on lines 21 to 16
@dataclass(frozen=True)
class Alert:
repo_full_name: str | None
repo_html_url: str | None = field(compare=False, repr=False)
ghsa_id: str | None
package: str | None = field(compare=False)
manifest_path: str | None = field(compare=False)
summary: str | None = field(compare=False)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frozen=True makes Alert's hashable so that they can be used as dict keys (or put into sets etc).

Any two Alert's with the same repo_full_name (e.g. hypothesis/viahtml) and ghsa_id are considered equal, even if they may differ in other fields (e.g. the manifest_path: requirements.txt, tests.txt. etc). This definition of equality is how we group/deduplicate alerts. What it ends up meaning in practice is that alerts with the same repo and GHSA ID but for different requirements files end up getting deduplicated/grouped.

Comment on lines 17 to 28
def make(cls, alert_dict):
return cls(
repo_full_name=alert_dict["repository"]["full_name"],
repo_html_url=alert_dict["repository"]["html_url"],
ghsa_id=alert_dict["security_advisory"]["ghsa_id"],
package=alert_dict["dependency"]["package"]["name"],
manifest_path=alert_dict["dependency"]["manifest_path"],
summary=alert_dict["security_advisory"]["summary"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash if the GitHub API's responses change, but we have Slack notifications for if the workflow fails and failing-fast here is simpler than coding defensively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.

Comment on lines +58 to +66
if alert in alerts:
alerts[alert].duplicates.append(alert)
else:
alerts[alert] = alert
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the deduplication here

@seanh seanh marked this pull request as ready for review October 29, 2023 15:47
@seanh
Copy link
Contributor Author

seanh commented Oct 29, 2023

The output in the earlier version was designed to be useful for a human to read and act on. I don't mind having a succinct pipeline-friendly mode as an option, like many CLI utilities do (or check isatty), but it is annoying not to have this information in the output:

* Convenient link to alert

* Severity

* Short summary of what the issue is

I've added each alert's URL, severity and summary to the output (both Slack and non-Slack), and tweaked the formatting a bit. It's still one-line-per-alert but should be fairly readable and actionable I think.

Each Vulnerability object on main corresponds to one alert from Dependabot. The name comes from the RepositoryVulnerabilityAlert schema in GraphQL), but with a limit that only one is reported per (package, repository).

The rationale is that in a given repository, you would normally only need to take one action per affected package, rather than for each alert if there are multiple vulnerabilities in the same package or multiple alerts for the same issue that affect different lockfiles.

I've taken a slightly different approach to deduplication on this branch: it groups together alerts that belong to the same repo and have the same GHSA ID. So you won't get multiple line of output if the same alert is repeated for different lockfiles, but you will if there are actually two different alerts for the same package. After all they are different alerts, with different severities and summary descriptions, so I wouldn't want to omit any of them.

Message generated by the alerts.yml workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>

This is useful, although it would be nice if this information could be added automatically by the thing that posts to Slack.

I think it'd be nice to add a reusable slack.yml workflow to https://github.com/hypothesis/workflows that's a simple wrapper for the Slack action. That would have other benefits as well (easier to have Dependabot keep the version number of the Slack action up to date) and the reusable workflow could add this "generated by..." (referencing the caller workflow), then all our Slack alerts would get it. For another day perhaps.

If there are no alerts then dependabot-alerts hypothesis will just exit with 0 and print nothing. Again: good UNIX command design. For example ls doesn't print There are no files if you run it in an empty directory.

This is fine, though I do use plenty of tools which have decided to show something when nothing needs doing and instead supply a silent/quiet flag to suppress these (cf black, MyPy). The main need for having a message when there are no alerts is in the case when reporting to Slack, to serve as a confirmation that the check is happening.

I've added this "There are no alerts" confirmation message when --slack is used.

Copy link
Member

@robertknight robertknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this overhaul and the tests. Let's get this landed and we can discuss what to do about the current open alerts in ViaHTML.

)
except CalledProcessError as err: # pragma: no cover
print(err.stdout)
print(err.stderr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this to print to stderr with context about where the stdout/stderr is coming from.

Copy link
Contributor Author

@seanh seanh Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to print any stdout from GitHub CLI to stdout and any stderr from GitHub CLI to stderr, and to prefix each line with GitHub CLI stdout> or GitHub CLI stderr> :

except CalledProcessError as err: # pragma: no cover
for line in err.stdout.splitlines():
print(f"GitHub CLI stdout> {line}")
for line in err.stderr.splitlines():
print(f"GitHub CLI stderr> {line}", file=sys.stderr)
raise

Example (trying to get alerts for a user rather than an organization fails):

$  tox -qqe dev --run-command 'dependabot-alerts seanh'
GitHub CLI stdout> {"message":"Not Found","documentation_url":"https://docs.github.com/rest/dependabot/alerts#list-dependabot-alerts-for-an-organization"}
GitHub CLI stderr> gh: Not Found (HTTP 404)
Traceback (most recent call last):
  File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/bin/dependabot-alerts", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/cli.py", line 22, in cli
    repos = github.alerts(organization)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/seanh/GitHub/hypothesis/dependabot-alerts/.tox/dev/lib/python3.12/site-packages/dependabot_alerts/core.py", line 43, in alerts
    result = self._run(
             ^^^^^^^^^^
  File "/home/seanh/.pyenv/versions/3.12.0/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['gh', 'api', '--paginate', '/orgs/seanh/dependabot/alerts?state=open']' returned non-zero exit status 1.


```terminal
$ dependabot-alerts <your_github_user_or_organization>
$ dependabot-alerts <your_github_organization>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the code on main does not actually work for user accounts. It does with a very slight change to the GraphQL query though - swap organization(login: ...) for user(login: ...) at the top. Support for querying user accounts might be nice to have for a general-purpose tool, but isn't need for our use case.

Comment on lines 25 to 26
print(
f"There are no Dependabot security alerts in the {organization} GitHub organization."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be posted into Slack each time the scheduled workflow runs, if there are no alerts. Might be helpful to remind us that the workflow is running?

I think it is good to output something in Slack if there are no alerts. Aside from reminding us that the workflow is running, it will also make new devs aware of the tool.

Comment on lines 17 to 28
def make(cls, alert_dict):
return cls(
repo_full_name=alert_dict["repository"]["full_name"],
repo_html_url=alert_dict["repository"]["html_url"],
ghsa_id=alert_dict["security_advisory"]["ghsa_id"],
package=alert_dict["dependency"]["package"]["name"],
manifest_path=alert_dict["dependency"]["manifest_path"],
summary=alert_dict["security_advisory"]["summary"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can assume that GitHub API response changes will be rare for APIs that have been around for some time, though it is still good to be notified if it happens.

for alert in alerts:
print(
f'{alert.repo_full_name} {alert.ghsa_id} alerts:{len(alert.duplicates) + 1} severity:{alert.severity} {alert.package} "{alert.summary}" {alert.html_url}'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think a --pretty or --human output that is multi-line would be nice to have, but lets get this merged first.

Copy link
Contributor Author

@seanh seanh Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a refactoring that separates each formatting option into a separate format_*() function in format.py and changes the --slack option to --format slack. This should make it easy to add more formats like --format pretty or --format json etc without the code quality deteriorating.

  • """Functions for serializing lists of Alerts into different string formats."""
    from dependabot_alerts.core import Alert
    def format_text(alerts: list[Alert], _organization: str) -> str:
    return "\n".join(
    [
    " ".join(
    [
    alert.repo_full_name,
    alert.ghsa_id,
    f"alerts:{len(alert.duplicates) + 1}",
    f"severity:{alert.severity}",
    alert.package,
    alert.html_url,
    alert.summary,
    ]
    )
    for alert in alerts
    ]
    )
    def format_slack(alerts: list[Alert], organization: str) -> str:
    if not alerts:
    return f"There are no Dependabot security alerts in the {organization} GitHub organization."
    return "\n".join(
    [
    f"*There are Dependabot security alerts in the {organization} GitHub organization:*",
    "",
    *[
    " ".join(
    [
    f"<{alert.html_url}|{alert.repo_full_name} {alert.ghsa_id}>",
    f"alerts:{len(alert.duplicates) + 1}",
    f"severity:{alert.severity}",
    f"`{alert.package}`",
    alert.summary,
    ]
    )
    for alert in alerts
    ],
    "",
    "Message generated by the `alerts.yml` workflow <https://github.com/hypothesis/dependabot-alerts/blob/main/.github/workflows/alert.yml|in dependabot-alerts>",
    ]
    )
  • formatters = {
    "text": format_text,
    "slack": format_slack,
    }
    formatter = formatters[args.format]
    output = formatter(alerts, args.organization)

@seanh seanh force-pushed the simplify branch 3 times, most recently from 8046836 to 468bea0 Compare November 7, 2023 13:49
Simplify the code by using GitHub CLI and the GitHub REST API. Also add
tests.

Fixes #5
@seanh seanh merged commit 601acda into main Nov 7, 2023
8 checks passed
@seanh seanh deleted the simplify branch November 7, 2023 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants