Record repo ownership in the database #141
Conversation
I really enjoyed reviewing this PR. Whilst I'm not completely up to speed on the codebase, splitting the PR into many small commits, with clear signposting, helped enormously.
You asked in the PR's description whether we're happy to write tabular data as well as metrics to the database. Unless I've misunderstood, I think this is okay: I think we're using "metrics" very loosely to mean "operational information", rather than "operational information for performance measurement".
I've made a few comments: these are for my understanding rather than suggestions for changing the codebase.
tests/metrics/github/test_repos.py (outdated)
```python
@pytest.fixture
def patch_query_to_return(monkeypatch):
    def patch(the_repos):
        monkeypatch.setattr(repos.query, "repos", lambda *_args: the_repos)
```
I found this line interesting. Couple of questions:

- `repos.query` is a reference to `query`. Why did you choose not to patch `query`? The test passes either way.
- What are you trying to communicate by using `*_args` rather than `*args`?
> Why did you choose not to patch `query`?

I was copying the way previous code in these tests worked. I decided that I quite liked it because it depends only on a local implementation detail of `repos` (it calls a function `repos()` on an object `query`), not on which module that function is originally defined in. I don't feel strongly about that, though.
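The point-of-use idea can be sketched without pytest, using `types.SimpleNamespace` as a stand-in for the two modules (all names here are hypothetical, not the PR's actual code): replacing the `query` attribute that `repos` actually looks up works no matter where the patched function was originally defined.

```python
import types

# Hypothetical stand-in for a `query` module that talks to the GitHub API.
query = types.SimpleNamespace(repos=lambda org: ["real-repo"])

# Hypothetical stand-in for the `repos` module under test: it holds its own
# reference to `query` and calls query.repos() internally.
repos = types.SimpleNamespace()
repos.query = query
repos.list_repos = lambda org: list(repos.query.repos(org))

# Patch at the point of use: replace the attribute the code under test
# looks up (repos.query), leaving the original `query` object untouched.
repos.query = types.SimpleNamespace(repos=lambda *_args: ["fake-repo"])

assert repos.list_repos("opensafely") == ["fake-repo"]
assert query.repos("opensafely") == ["real-repo"]  # original unchanged
```

With `monkeypatch.setattr` the mechanics are the same, plus automatic restoration of the original attribute when the test ends.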
> What are you trying to communicate by using `*_args` rather than `*args`?

This is a convention that I follow and which, happily, the IDE I use also follows: arguments whose names start with an underscore are unused. (My IDE (PyCharm) warns me about unused parameters unless they're named like that.)
I think that the convention originates in Haskell.
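A small illustration of the convention (the callback and names here are invented for the example): a leading underscore marks a parameter that the signature requires but the body never reads, which many linters and IDE inspections then treat as intentionally unused.

```python
# A hypothetical callback protocol that passes (event, payload); this
# handler only cares about the payload. The leading underscore on
# `_event` signals "required by the signature, deliberately unused".
def on_update(_event, payload):
    return payload["repo"]

# The same idea for a test stub that must accept any arguments at all:
fake_repos = lambda *_args, **_kwargs: ["team-a/repo-1"]

assert on_update("push", {"repo": "team-a/repo-1"}) == "team-a/repo-1"
assert fake_repos("anything", ignored=True) == ["team-a/repo-1"]
```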
```python
check_response(response)

data = response.json()
if isinstance(data, list):
```
I'd be interested to know your thoughts on checking whether `data` is a `list` versus checking whether it is an `Iterable`. I think the latter is an example of goose typing (see "Goose Typing" in Fluent Python).
I didn't really think this through.
Having now thought it through (thank you), I prefer checking for `list` because if I've misunderstood the API (or it changes?!?!?) then the only alternatives I can imagine finding here are `dict` or `str` (because, NB, this is deserialized JSON). In either case, a) `yield from` would happily work and b) that definitely wouldn't be what we wanted.
Ah, yes, of course! `Iterable` could be a string.
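A quick demonstration of why the `Iterable` check would be too loose here: deserialized JSON can only ever be a `dict`, `list`, `str`, number, bool, or `None`, and a `str` passes an `Iterable` check (iterating character by character) while failing the `list` check.

```python
import json
from collections.abc import Iterable

# Suppose the API unexpectedly returned a bare JSON string.
data = json.loads('"oops, an error message"')

assert isinstance(data, Iterable)   # True: strings are iterable...
assert not isinstance(data, list)   # ...but they are not lists

# `yield from data` on a str would "happily" yield one character at a time:
assert list(iter(data))[:4] == ["o", "o", "p", "s"]
```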
I suspect that this was included in a now-abandoned attempt to keep code coverage at a specific level.
Previously there was a fixture for creating this dummy table, but it was ignored in favour of using an arbitrary table from production code. There's no reason to depend on the production tables when changes to them will cause unnecessary churn in the tests.
This single test took 0.25s.
This test previously took about a minute.
There are two reasons for doing this:

- so we can report on unowned repos to help keep our records up-to-date
- as a first step towards getting rid of the hard-coded exclusion list for ebmdatalab repos

The first of these we can do with these changes. The second will have to wait until we've got accurate repo lists for ebmdatalab, but the code here means that should be fairly straightforward eventually.
Force-pushed from 1dd64ea to 08318a5.
This introduces a task, `metrics repos`, which doesn't strictly deal in metrics: it writes plain old tabular data that can be used for showing useful information on our delivery dashboard. I think that's probably okay, but it's worth double-checking that we're happy with this use.

(The thing I'm planning to do with this data is add a table of unowned repos. Currently I have a shell script that I use to check this monthly so that I can chase people up when they create new repos and forget to assign them to a team in GitHub.)
We need to add permissions to the PAT used in production before deploying this change.