Record repo ownership in the database #141
Conversation
I really enjoyed reviewing this PR. Whilst I'm not completely up to speed on the codebase, splitting the PR into many small commits, with clear signposting, helped enormously.
You asked in the PR's description whether we're happy to write tabular data as well as metrics to the database. Unless I've misunderstood, I think this is okay: I think we're using "metrics" very loosely to mean "operational information", rather than "operational information for performance measurement".
I've made a few comments: these are for my understanding rather than suggestions for changing the codebase.
tests/metrics/github/test_repos.py (outdated)
```python
@pytest.fixture
def patch_query_to_return(monkeypatch):
    def patch(the_repos):
        monkeypatch.setattr(repos.query, "repos", lambda *_args: the_repos)
```
I found this line interesting. Couple of questions:

- `repos.query` is a reference to `query`. Why did you choose not to patch `query`? The test passes either way.
- What are you trying to communicate by using `*_args` rather than `*args`?
> Why did you choose not to patch `query`?

I was copying the way previous code in these tests worked. I decided that I quite liked it because it depends only on a local implementation detail of `repos` (it calls a function `repos()` on an object `query`), not on which module that function is originally defined in. I don't feel strongly about that, though.
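The point-of-use idea can be sketched without pytest, using `types.SimpleNamespace` as a stand-in for the two modules (all names here are hypothetical, not the PR's actual code): replacing the `query` attribute that `repos` actually looks up works no matter where the patched function was originally defined.

```python
import types

# Hypothetical stand-in for a `query` module that talks to the GitHub API.
query = types.SimpleNamespace(repos=lambda org: ["real-repo"])

# Hypothetical stand-in for the `repos` module under test: it holds its own
# reference to `query` and calls query.repos() internally.
repos = types.SimpleNamespace()
repos.query = query
repos.list_repos = lambda org: list(repos.query.repos(org))

# Patch at the point of use: replace the attribute the code under test
# looks up (repos.query), leaving the original `query` object untouched.
repos.query = types.SimpleNamespace(repos=lambda *_args: ["fake-repo"])

assert repos.list_repos("opensafely") == ["fake-repo"]
assert query.repos("opensafely") == ["real-repo"]  # original unchanged
```

With `monkeypatch.setattr` the mechanics are the same, plus automatic restoration of the original attribute when the test ends.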
> What are you trying to communicate by using `*_args` rather than `*args`?

This is a convention that I follow and which, happily, the IDE I use also follows: arguments whose names start with an underscore are unused. (My IDE (PyCharm) warns me about unused parameters unless they're named like that.)
I think that the convention originates in Haskell.
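A small illustration of the convention (the callback and names here are invented for the example): a leading underscore marks a parameter that the signature requires but the body never reads, which many linters and IDE inspections then treat as intentionally unused.

```python
# A hypothetical callback protocol that passes (event, payload); this
# handler only cares about the payload. The leading underscore on
# `_event` signals "required by the signature, deliberately unused".
def on_update(_event, payload):
    return payload["repo"]

# The same idea for a test stub that must accept any arguments at all:
fake_repos = lambda *_args, **_kwargs: ["team-a/repo-1"]

assert on_update("push", {"repo": "team-a/repo-1"}) == "team-a/repo-1"
assert fake_repos("anything", ignored=True) == ["team-a/repo-1"]
```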
```python
check_response(response)

data = response.json()
if isinstance(data, list):
```
I'd be interested to know your thoughts on checking whether `data` is a `list` versus checking whether it is an `Iterable`. I think the latter is an example of goose typing (see "Goose Typing" in Fluent Python).
I didn't really think this through.
Having now thought it through (thank you), I prefer checking for `list` because if I've misunderstood the API (or it changes?!?!?) then the only alternatives I can imagine finding here are `dict` or `str` (because, NB, this is deserialized JSON). In either case, a) `yield from` would happily work and b) that definitely wouldn't be what we wanted.
Ah, yes, of course! `Iterable` could be a string.
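A quick demonstration of why the `Iterable` check would be too loose here: deserialized JSON can only ever be a `dict`, `list`, `str`, number, bool, or `None`, and a `str` passes an `Iterable` check (iterating character by character) while failing the `list` check.

```python
import json
from collections.abc import Iterable

# Suppose the API unexpectedly returned a bare JSON string.
data = json.loads('"oops, an error message"')

assert isinstance(data, Iterable)   # True: strings are iterable...
assert not isinstance(data, list)   # ...but they are not lists

# `yield from data` on a str would "happily" yield one character at a time:
assert list(iter(data))[:4] == ["o", "o", "p", "s"]
```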
I suspect that this was included in a now-abandoned attempt to keep code coverage at a specific level.
Previously there was a fixture for creating this dummy table, but it was ignored in favour of using an arbitrary table from production code. There's no reason to depend on the production tables when changes to them will cause unnecessary churn in the tests.
This single test took 0.25s.
This test previously took about a minute.
There are two reasons for doing this:

- so we can report on unowned repos to help keep our records up-to-date
- as a first step towards getting rid of the hard-coded exclusion list for ebmdatalab repos

The first of these we can do with these changes. The second will have to wait until we've got accurate repo lists for ebmdatalab, but the code here means that should be fairly straightforward eventually.
Force-pushed from 1dd64ea to 08318a5.
This introduces a task, `metrics repos`, which doesn't strictly deal in metrics: it writes plain old tabular data that can be used for showing useful information on our delivery dashboard. I think that's probably okay, but it's worth double-checking that we're happy with this use.

(The thing I'm planning to do with this data is add a table of unowned repos. Currently I have a shell script that I use to check this monthly so that I can chase people up when they create new repos and forget to assign them to a team in GitHub.)
We need to add permissions to the PAT used in production before deploying this change.