Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how to block or filter out sources of code of conduct violations #77

Open
wlandau opened this issue Jul 19, 2024 · 23 comments
Open
Assignees
Labels

Comments

@wlandau
Copy link
Member

wlandau commented Jul 19, 2024

Using the GitHub API, it would be simple to check if the author of the PR is the same GitHub user name as the owner of the package source code repo. @shikokuchuo, what do you think?

@shikokuchuo
Copy link
Member

Does that work if the repo is part of an organization such as ROpenSci?

@wlandau
Copy link
Member Author

wlandau commented Jul 19, 2024

It would flag all those for manual review, which could get heavy-handed.

@shikokuchuo
Copy link
Member

shikokuchuo commented Jul 19, 2024

A variant would be to maintain a whitelist of approved contributors i.e. those which have contributed at least one package verified in this manner (or manually checked by a moderator).

@wlandau
Copy link
Member Author

wlandau commented Aug 22, 2024

I think it may be easier to track code of conduct violations (package and user names) in a confidential repo, then have the bot use the data for pull request reviews in https://github.com/r-multiverse/contributions.

@shikokuchuo
Copy link
Member

A blacklist is also fine, but this can still be public. If it's transparent then mistakes can be easily corrected.

@wlandau
Copy link
Member Author

wlandau commented Aug 27, 2024

A blacklist is also fine, but this can still be public. If it's transparent then mistakes can be easily corrected.

I worry about defacement attacks: an adversary's username could contain hate speech, and I'm not sure I would feel comfortable maintaining a list of such usernames in a public-facing repo.

@shikokuchuo
Copy link
Member

In that case, I'd prefer your original idea - and if that requires more manual review for orgs then so be it. Even if it's private, we'd still be storing such material, which doesn't strike me as a good thing.

@wlandau
Copy link
Member Author

wlandau commented Aug 29, 2024

Hmmm.... it seems like that would disrupt the contribution workflow and accumulate a large database on the off chance someone behaves maliciously. A banned list would be much shorter, and we would rarely have to update it.

Is your main objection the part about the private repo, i.e. archiving a secret store of user data? In that case, I think I would be willing to maintain a public banned list if the list entries were masked just enough to avoid offending or traumatizing casual observers. Maybe base64 encode the whole thing, formatting included, and add obvious documentation warnings to not decode it? Maybe even encode it twice so people get a second warning and an encoded string the first time they try.

@wlandau wlandau changed the title Require manual review if the pull request author is not the owner of the package? Figure out how to block or filter out sources of code of conduct violations Aug 29, 2024
@wlandau
Copy link
Member Author

wlandau commented Aug 29, 2024

Also, I really hope GitHub can help us, i.e. allowing us to report users as needed. Not sure what their policies and procedures are exactly.

@shikokuchuo
Copy link
Member

Is your main objection the part about the private repo, i.e. archiving a secret store of user data?

My concern is not about storing the data per se, but the doing it in secret part. It seems to go against the transparency that all this open infrastructure allows. It also prevents verification that due process is being observed if such non-public lists are maintained.

I think I'm onboard with the approach of obfuscation using base64, or perhaps we can think of something better later on.

@wlandau
Copy link
Member Author

wlandau commented Sep 3, 2024

Using the GitHub API, it would be simple to check if the author of the PR is the same GitHub user name as the owner of the package source code repo.

Actually, maybe this could work if the GitHub API can tell the difference between orgs and users. For users, we could check if the user is the owner. For orgs, we could check if the user is a member of the org. If this approach works, it seems preferable because

  1. It is stateless.
  2. It decreases anonymity just a bit.

@shikokuchuo
Copy link
Member

Using the GitHub API, it would be simple to check if the author of the PR is the same GitHub user name as the owner of the package source code repo.

Actually, maybe this could work if the GitHub API can tell the difference between orgs and users. For users, we could check if the user is the owner. For orgs, we could check if the user is a member of the org.

Seems possible - with the /orgs/{org}/repos and /orgs/{org}/members endpoints.

This is a nice approach, but do you think it would generate too many manual reviews for community contributed packages not from the maintainer e.g. Arrow.

@wlandau
Copy link
Member Author

wlandau commented Sep 3, 2024

This is a nice approach, but do you think it would generate too many manual reviews for community contributed packages not from the maintainer e.g. Arrow.

Maybe. But on the other hand, if a maintainer doesn't want their package in R-multiverse, then I think it's a good idea to make sure we respect their wishes.

@shikokuchuo
Copy link
Member

Sure, that's fine. The example I had in mind was rather: I want to add x package to Multiverse for it to build the binaries I need (or so it's within the Production release) - and I am not the maintainer of said package. This would be flagged for review. I'm thinking this could generate too much noise.

@wlandau
Copy link
Member Author

wlandau commented Sep 4, 2024

That's a good point. Nowadays maintainers just talk to each other when this comes up for CRAN, and we may want to encourage that, but we might still get more manual reviews. Not sure what the right call is here.

Looming in the back of my mind are also general questions about security: how do we identify contributors and prepare to hold them accountable if packages have malicious code or illegal files? Discharging that risk will help me think about this.

@eitsupi
Copy link
Member

eitsupi commented Sep 4, 2024

I would simply like to add another example. I am effectively the maintainer of https://github.com/pola-rs/r-polars, but I am not a member of the organization https://github.com/pola-rs.

@wlandau
Copy link
Member Author

wlandau commented Sep 10, 2024

Yeah, I think at this point I'd be more in favor of a publicly-maintained base64-encoded "banned" list.

@wlandau
Copy link
Member Author

wlandau commented Nov 17, 2024

I just found out about GitHub's feature of organization-level moderators. We can add org moderators at https://github.com/organizations/r-multiverse/settings/moderators, and any moderator can go to https://github.com/organizations/r-multiverse/settings/blocked_users and block users.

Now that I know about this, I have a new proposal for handling malicious activity at https://github.com/r-multiverse/contributions/pulls:

  1. Rely on https://github.com/organizations/r-multiverse/settings/blocked_users to block users if needed.
  2. Flag for manual review any package listed in https://github.com/RConsortium/r-advisory-database/tree/main/vulns. (It should be an easy check if we don't bother to search the YAML files for version numbers.)

@wlandau wlandau self-assigned this Nov 17, 2024
@shikokuchuo
Copy link
Member

👍🏻 That would be much simpler than what we were thinking about before.

@wlandau
Copy link
Member Author

wlandau commented Nov 19, 2024

I am wondering if https://github.com/RConsortium/r-advisory-database might be flexible enough to accept reports of copyright violations, etc.

c.f. RConsortium/r-advisory-database#14

@wlandau
Copy link
Member Author

wlandau commented Nov 20, 2024

It occurs to me that we still need an R-multiverse-native way to disable automatic acceptance of packages that previously were removed. Otherwise, if we remove a package due to a policy violation, the bot could just add it back again without anyone knowing.

I think we can accomplish this in a simple way using the existing package listing system at https://github.com/r-multiverse/contributions/tree/main/packages. Instead of a URL or JSON, the text file of a problematic package could contain an informative note such as "removed for x policy violation on date y". Then of course a contributor could make a PR to add it again, but since that PR edits an existing file, the bot would flag it for manual review.

I added support for this mechanism in wlandau/multiverse.internals@b1e7950. If the forthcoming PR is merged, we should mention it documentation for reviewers.

@wlandau
Copy link
Member Author

wlandau commented Nov 20, 2024

I think we can accomplish this in a simple way using the existing package listing system

Implemented in r-multiverse/multiverse.internals#39, pretty simple and unobtrusive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants