Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record repo ownership in the database #141

Merged
merged 17 commits into from
Mar 12, 2024
Merged

Record repo ownership in the database #141

merged 17 commits into from
Mar 12, 2024

Conversation

benbc
Copy link
Contributor

@benbc benbc commented Mar 6, 2024

This introduces a task metrics repos, which doesn't strictly deal in metrics (it writes plain old tabular data that can be used for showing useful data on our delivery dashboard). I think that's probably okay, but worth double-checking that we are happy with this use.

(The thing I'm planning to do with this data is add a table of unowned repos. Currently I have a shell script that I use to check this monthly so that I can chase people up when they create new repos and forget to assign them to a team in GitHub.)

image

We need to add permissions to the PAT used in production before deploying this change.

metrics/github/repos.py Outdated Show resolved Hide resolved
@benbc benbc mentioned this pull request Mar 8, 2024
Copy link
Member

@iaindillingham iaindillingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really enjoyed reviewing this PR. Whilst I'm not completely up to speed on the codebase, splitting the PR into many small commits, with clear signposting, helped enormously.

You asked in the PR's description whether we're happy to write tabular data as well as metrics to the database. Unless I've misunderstood, I think this is okay: I think we're using "metrics" very loosely to mean "operational information", rather than "operational information for performance measurement".

I've made a few comments: these are for my understanding rather than suggestions for changing the codebase.

metrics/github/repos.py Outdated Show resolved Hide resolved
@pytest.fixture
def patch_query_to_return(monkeypatch):
def patch(the_repos):
monkeypatch.setattr(repos.query, "repos", lambda *_args: the_repos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this line interesting. Couple of questions:

  • repos.query is a reference to query. Why did you choose not to patch query? The test passes, either way.
  • What are you trying to communicate by using *_args rather than *args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose not to patch query?

I was copying the way previous code in these tests worked. I decided that I quite liked it because it depends only on a local implementation detail of repos (it calls a function repos() on an object query), not on which module that function is originally defined in. I don't feel strongly about that, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to communicate by using *_args rather than *args?

This is a convention that I follow and which happily the IDE I use also follows: arguments whose name starts with underscores are unused. (My IDE (PyCharm) warns me about unused parameters unless they're named like that.)

I think that the convention originates in Haskell.

check_response(response)

data = response.json()
if isinstance(data, list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested to know your thoughts on checking whether data is a list versus checking whether it is an Iterable. I think the latter is an example of goose typing (see "Goose Typing" in Fluent Python).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really think this through.

Having now thought it through (thank you), I prefer checking for list because if I've misunderstood the API (or it changes?!?!?) then the only possible alternatives I can imagine finding here are dict or str (because, NB, this is deserialized JSON) -- and in either case a) yield from would happily work and b) that definitely wouldn't be what we wanted in either case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, of course! Iterable could be a string.

benbc added 17 commits March 12, 2024 16:04
I suspect that this was included in an now-abandoned attempt to keep
code coverage at a specific level.
Previously there was a fixture for creating this dummy table, but it was
ignored in favour of using an arbitrary table from production
code. There's no reason to depend on the production tables when changes
to them will cause unnecessary churn in the tests.
This single test took 0.25s.
This test previously took about a minute.
There are two reasons for doing this:

* so we can report on unowned repos to help keep our records up-to-date
* as a first step towards getting rid of the hard-coded exclusion list
  for ebmdatalab repos

The first of these we can do with these changes. The second will have to
wait until we've got accurate repo lists for ebmdatalab, but the code
here means that should be fairly straightforward eventually.
@benbc benbc merged commit 1fc9be9 into main Mar 12, 2024
13 checks passed
@benbc benbc deleted the benbc/github-issues branch March 12, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants