Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use teams for ownership #142

Merged
merged 4 commits into from
Mar 13, 2024
Merged

Use teams for ownership #142

merged 4 commits into from
Mar 13, 2024

Conversation

benbc
Copy link
Contributor

@benbc benbc commented Mar 8, 2024

Built on #141.

@benbc benbc marked this pull request as draft March 8, 2024 14:33
This dummy team allows us to record the fact that a small number of
repos are jointly owned by the Tech teams.
@benbc benbc force-pushed the benbc/use-teams-for-ownership branch from a0e2a8c to 62831c8 Compare March 12, 2024 17:18
@benbc benbc marked this pull request as ready for review March 12, 2024 17:26
return [r for r in query.repos(client, org) if _is_tech_owned(r)]
tech_repo_names = list(
flatten(query.team_repos(client, org, team) for team in _TECH_TEAMS)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iaindillingham flattening nested lists is a case where I think the list comprehension alternative is pretty intolerable. Although here my case is again damaged by the Python stdlib with its iterator-maximalism and terrible naming of standard functional operations (see my alias at the end of the module, which you might reasonably object to).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding, you prefer the above to...

tech_repo_names = [
    name
    for names in (query.team_repos(client, org, team) for team in _TECH_TEAMS)
    for name in names
]

I'm uneasy about the alias, although it is a better name. No objections, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think the comprehension version is horrible. Once we start splitting list comprehensions over multiple lines I actually prefer for loops. I'd be happy to consider the for loop alternative if you prefer it to flatten(). Or I'd be happy to write flatten() as a helper function, rather than a straight alias, which would allow me to absorb the list() into it.

Copy link
Contributor Author

@benbc benbc Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def tech_repos(client, org):
    tech_repo_names = []
    for team in _TECH_TEAMS:
        tech_repo_names.extend(query.team_repos(client, org, team))

    return [r for r in _active_repos(client, org) if r.name in tech_repo_names]

or

def tech_repos(client, org):
    tech_repo_names = flatten(query.team_repos(client, org, team) for team in _TECH_TEAMS)

    return [r for r in _active_repos(client, org) if r.name in tech_repo_names]

def flatten(xss):
    return [x for xs in xss for x in xs]

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, of course, black won't stand for that and unnecessarily splits the flatten() over multiple lines:

def tech_repos(client, org):
    tech_repo_names = flatten(
        query.team_repos(client, org, team) for team in _TECH_TEAMS
    )

    return [r for r in _active_repos(client, org) if r.name in tech_repo_names]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And my for-loop version had a bug, now fixed above. I think I now prefer the for-loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts are very much "You have bad taste, Iain". To my eye, the nested list comprehension has a superficial appeal: it looks tidy. But looking tidy and being tidy are different, as the mountain of junk behind my sofa testifies. The nested list comprehension is hard to parse (start in the middle and work from right to left, go to the bottom, go to the top?).

I'm drawn to the for loop alternative. I don't think the flatten helper adds much; I'd have to read the implementation to remind myself that it only flattens lists-in-lists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I see one of those nested for list comps I have to remind my self that it's equivalent to nested for loops in the same order as the list comp. At that point I think "wouldn't this be fewer lines if it was just nested for loops?" and realize that the answer is "yes".

I can forgive a very simple one like my rewritten flatten(). But once they get split over multiple lines I usually abandon them.

I do like them in principle. Python just isn't a terse enough language to carry it off, I think.

return [r for r in query.repos(client, org) if _is_tech_owned(r)]
tech_repo_names = list(
flatten(query.team_repos(client, org, team) for team in _TECH_TEAMS)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding, you prefer the above to...

tech_repo_names = [
    name
    for names in (query.team_repos(client, org, team) for team in _TECH_TEAMS)
    for name in names
]

I'm uneasy about the alias, although it is a better name. No objections, though.

…exclusion list

This has a side-effect of slightly changing the behaviour with respect
to archived repos. We don't insist that archived repos have an owning
team recorded. Now that we only display data for repos owned by
_current_ teams, any archived repos that were owned by old teams or
which were active before we had a team structure will not be included in
the data.

This doesn't affect our main use of this data, which is to understand
how the teams are operating today. It potentially affects our long-term
historic view of the data. In practice only 4% of PRs are excluded by
this change and eye-balling the resulting charts over long timescales
shows no significant differences, so we've decided that this is okay.
Previously an unpredictable subset of archived repos were excluded based
on team ownership records (see previous commit for details). For clarity
and consistently we are now excluding them all explicitly.
@benbc benbc force-pushed the benbc/use-teams-for-ownership branch from 62831c8 to 4c6c4d5 Compare March 13, 2024 11:41
@benbc benbc merged commit 6d6f06c into main Mar 13, 2024
9 checks passed
@benbc benbc deleted the benbc/use-teams-for-ownership branch March 13, 2024 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants