-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Figure out how to block or filter out sources of code of conduct violations #77
Comments
Does that work if the repo is part of an organization such as ROpenSci? |
It would flag all those for manual review, which could get heavy-handed. |
A variant would be to maintain a whitelist of approved contributors i.e. those which have contributed at least one package verified in this manner (or manually checked by a moderator). |
I think it may be easier to track code of conduct violations (package and user names) in a confidential repo, then have the bot use the data for pull request reviews in https://github.com/r-multiverse/contributions. |
A blacklist is also fine, but this can still be public. If it's transparent then mistakes can be easily corrected. |
I worry about defacement attacks: an adversary's username could contain hate speech, and I'm not sure I would feel comfortable maintaining a list of such usernames in a public-facing repo. |
In that case, I'd prefer your original idea - and if that requires more manual review for orgs then so be it. Even if it's private, we'd still be storing such material, which doesn't strike me as a good thing. |
Hmmm.... it seems like that would disrupt the contribution workflow and accumulate a large database on the off chance someone behaves maliciously. A banned list would be much shorter, and we would rarely have to update it. Is your main objection the part about the private repo, i.e. archiving a secret store of user data? In that case, I think I would be willing to maintain a public banned list if the list entries were masked just enough to avoid offending or traumatizing casual observers. Maybe base64 encode the whole thing, formatting included, and add obvious documentation warnings to not decode it? Maybe even encode it twice so people get a second warning and an encoded string the first time they try. |
Also, I really hope GitHub can help us, i.e. allowing us to report users as needed. Not sure what their policies and procedures are exactly. |
My concern is not about storing the data per se, but the doing it in secret part. It seems to go against the transparency that all this open infrastructure allows. It also prevents verification that due process is being observed if such non-public lists are maintained. I think I'm onboard with the approach of obfuscation using base64, or perhaps we can think of something better later on. |
Actually, maybe this could work if the GitHub API can tell the difference between orgs and users. For users, we could check if the user is the owner. For orgs, we could check if the user is a member of the org. If this approach works, it seems preferable because
|
Seems possible - with the This is a nice approach, but do you think it would generate too many manual reviews for community contributed packages not from the maintainer e.g. Arrow. |
Maybe. But on the other hand, if a maintainer doesn't want their package in R-multiverse, then I think it's a good idea to make sure we respect their wishes. |
Sure, that's fine. The example I had in mind was rather: I want to add |
That's a good point. Nowadays maintainers just talk to each other when this comes up for CRAN, and we may want to encourage that, but we might still get more manual reviews. Not sure what the right call is here. Looming in the back of my mind are also general questions about security: how do we identify contributors and prepare to hold them accountable if packages have malicious code or illegal files? Discharging that risk will help me think about this. |
I would simply like to add another example. I am effectively the maintainer of https://github.com/pola-rs/r-polars, but I am not a member of the organization https://github.com/pola-rs. |
Yeah, I think at this point I'd be more in favor of a publicly-maintained base64-encoded "banned" list. |
I just found out about GitHub's feature of organization-level moderators. We can add org moderators at https://github.com/organizations/r-multiverse/settings/moderators, and any moderator can go to https://github.com/organizations/r-multiverse/settings/blocked_users and block users. Now that I know about this, I have a new proposal for handling malicious activity at https://github.com/r-multiverse/contributions/pulls:
|
👍🏻 That would be much simpler than what we were thinking about before. |
Awesome! Added https://github.com/orgs/r-multiverse/teams/administrators and https://github.com/orgs/r-multiverse/teams/moderators to https://github.com/organizations/r-multiverse/settings/moderators. That takes care of most of this issue. It only remains to scan https://github.com/RConsortium/r-advisory-database/tree/main/vulns. |
I am wondering if https://github.com/RConsortium/r-advisory-database might be flexible enough to accept reports of copyright violations, etc. |
It occurs to me that we still need an R-multiverse-native way to disable automatic acceptance of packages that previously were removed. Otherwise, if we remove a package due to a policy violation, the bot could just add it back again without anyone knowing. I think we can accomplish this in a simple way using the existing package listing system at https://github.com/r-multiverse/contributions/tree/main/packages. Instead of a URL or JSON, the text file of a problematic package could contain an informative note such as "removed for x policy violation on date y". Then of course a contributor could make a PR to add it again, but since that PR edits an existing file, the bot would flag it for manual review. I added support for this mechanism in wlandau/multiverse.internals@b1e7950. If the forthcoming PR is merged, we should mention it documentation for reviewers. |
Implemented in r-multiverse/multiverse.internals#39, pretty simple and unobtrusive |
Using the GitHub API, it would be simple to check if the author of the PR is the same GitHub user name as the owner of the package source code repo. @shikokuchuo, what do you think?
The text was updated successfully, but these errors were encountered: