Clearing up license confusion (post-mortem) #36
Comments
I didn't "come after one of your repos"; I raised an issue that I saw. I'm sorry that this turned into a pile-on. I'm deeply unhappy with how aggressive the comments on the GitHub issues got, and I need to give a post-mortem of my own at some point, as there are some lessons for me to learn. In particular, I need to be more careful about venting steam on social media when there is already enough attention on an issue. But please don't assume everyone there was acting in bad faith. To pick up on the specific issue in https://github.com/seleniumbase/resource-files, I do know that "putting specific licenses directly into files was OK". Many of the files contain license info, but not all. You may want to review introjs, which is under the AGPL (according to https://github.com/usablica/intro.js/blob/master/license.md); this will have implications for any downstream users, which they should be made aware of. And with regard to "my https://github.com/seleniumbase org falls under the special protection of the Software Freedom Conservancy (due to being part of the Selenium umbrella of frameworks)", if SeleniumBase is part of the Selenium project, it should be listed on https://www.selenium.dev/projects/. You may want to check with SFC as to precisely what services they provide to member projects - they provide advice and can undertake license enforcement activities, but I expect that member projects still need to respond to license compliance issues raised by third parties. |
Also, for the record: I knew nothing of the 4chan thread. That is fucking abhorrent and I'm sorry that you were subject to that. |
@pbrkr SeleniumBase is listed in the Selenium ecosystem on this page: https://www.selenium.dev/ecosystem/ (They told us we get the benefit of the special protection.) As for the files in https://github.com/seleniumbase/resource-files and a few other areas, those are downloaded directly from https://www.jsdelivr.com/ (or most of them at least) and saved in case of an emergency situation where the CDN loses its files or goes down completely. The files aren't directly used at all. The repo doesn't really need to exist. When SeleniumBase loads a special resource on a webpage, it grabs the data from the CDN directly. If the License data is missing from the CDN, then it would probably be a problem there first. I haven't updated https://github.com/seleniumbase/resource-files in a few years, due to not being used at all, and probably won't need to be. Could easily be deleted if its existence proved to be a problem. |
The actual answer is "it depends", and whether or not it's legal doesn't make it the morally correct thing to do. I still urge you to use the recipe I provided to restore the original commit history, it'll make life easier in the long run when you want to track down when / why a decision was made.
Yes, but you can't change from something more restrictive to something more permissive. It depends on the terms. Your Microsoft example is explicitly permitted under the terms of the Apache license, whereas LGPL is strong copyleft and does not permit that.
And you put these companies at significant legal risk if they were using your forked code in breach of the original license. |
Unfortunately, a lot of files are published on CDNs in a non-compliant way, lacking copyright notices and license conditions. That's a frustration, but not one that you or I can fix completely. What I would advise is to keep a record, maybe in a readme file, of the upstream projects, versions and links to licenses for each of the subdirectories in https://github.com/seleniumbase/resource-files and the zip files in https://github.com/seleniumbase/SeleniumBase/tree/master/seleniumbase/extensions. Knowing which versions you're distributing also really helps downstream users in case they need to do any security or license compliance work of their own. I still disagree with "For any possible license issues that you may have with SeleniumBase, go directly to the Software Freedom Conservancy" - I consider a licensing issue to be just another bug that needs fixing. I think that project maintainers should handle these issues in the first instance, and in your case you may request assistance from SFC under any agreement you have with them if you need it. |
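The kind of record suggested above could be as small as a generated table in a readme. Here is a minimal sketch of that idea, not anything the repo actually does: the `UpstreamRecord` type, the helper names, and the `js/introjs` directory name are hypothetical, and versions are left as `UNKNOWN` because the thread does not state them. The only license detail filled in is the intro.js link cited earlier in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List

UNKNOWN = "UNKNOWN"  # placeholder for anything not yet verified upstream


@dataclass(frozen=True)
class UpstreamRecord:
    """One vendored directory and where it came from (hypothetical schema)."""
    directory: str
    project: str
    version: str
    license_name: str
    license_url: str


def render_manifest(records: Iterable[UpstreamRecord]) -> str:
    """Render the records as a Markdown table, sorted by directory."""
    header = [
        "| Directory | Upstream project | Version | License | License URL |",
        "| --- | --- | --- | --- | --- |",
    ]
    rows = [
        f"| {r.directory} | {r.project} | {r.version} "
        f"| {r.license_name} | {r.license_url} |"
        for r in sorted(records, key=lambda r: r.directory)
    ]
    return "\n".join(header + rows)


def needs_review(records: Iterable[UpstreamRecord]) -> List[str]:
    """List directories that still have an unverified field."""
    return sorted(
        r.directory
        for r in records
        if UNKNOWN in (r.version, r.license_name, r.license_url)
    )


# Illustrative entries only; directory names and UNKNOWN values are assumptions.
RECORDS = [
    UpstreamRecord(
        "js/introjs", "intro.js", UNKNOWN, "AGPL",
        "https://github.com/usablica/intro.js/blob/master/license.md",
    ),
    UpstreamRecord("js/hopscotch", "hopscotch", UNKNOWN, UNKNOWN, UNKNOWN),
]

if __name__ == "__main__":
    print(render_manifest(RECORDS))
    print("Needs review:", ", ".join(needs_review(RECORDS)))
```

Keeping the unverified entries visible (rather than omitting them) is deliberate here: it gives downstream users an honest picture of what has and has not been checked.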
This comment was marked as outdated.
If you are working with Overall you haven't cleared anything up here except that you are doubling down on sloppily rationalizing your bad actions. |
This comment was marked as abuse.
(Note that SFLC isn't the same as SFC, but the point I agree with nonetheless.) |
@mdmintz Closing and deleting comments that much here and in #16 may be seen by some as an abuse of your moderator rights. You can always use GitHub for arbitration:
Source: https://docs.github.com/en/site-policy/github-terms/github-community-guidelines |
The GitHub Community Guidelines are stated clearly: It's up to a moderator's best discretion to decide when it's appropriate to remove comments. If I remove a comment, then it's because I considered that there was sufficient reason to do so. Sometimes it's because there wasn't a good option when hiding a comment instead: There's a game called "Two Truths and a Lie", where people say two things that are true, and then one thing that isn't. Then people have to guess which one is the lie. When people apply this game to the real world, they are able to trick people into thinking that the lie is true because the truths were true. It can be quite dangerous if people don't carefully evaluate each point that was made... often times they'll read through quickly and assume that the entire message is true just because the first few things they read were true. This makes it tricky to handle some comments where I can't selectively mark one area as valid and another area as invalid. (Here's a made-up example that simplifies a similar accusation: As many of you have already seen, 5b7314a#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R7 5b7314a#diff-3fa844e028504aec4be871ebeaffa51082d8b40567cff328b609c374c6d8c44fR9 That's already more than the practice of putting attribution in a https://github.com/microsoft/playwright/blob/3694c1422d9a541776602fb870d0b8e249eff35d/NOTICE Since people are more likely to see the top of a Also, I'm fairly certain that the majority of people using Now, onto a more serious matter... As GitHub Insights had shown, multiple users came over from 4Chan (numbers have grown since the original screenshot posted earlier). There, they made incredibly offensive comments: https://boards.4chan.org/g/thread/101339536, and it gets worse the more you scroll down. To summarize that thread, there were people there who went after me because I'm Jewish. 
There was also a call-to-action on 4Chan for people to come downvote my posts on "Stay frosty", Python users! |
I didn't question your right to moderate. If you don't want to have GitHub as an intermediary, so be it. I see the fork button as a double-edged sword. On the one hand, I like the network graph and the list of forks. You can sort the list based on stars, updates, open issues and open forks. This helps with finding potential successors of an orphaned repo, while the reference to upstream in your README.md comes short in this regard. As already stated by others, it's up to you whether to make use of the fork button. On the other hand, I don't like the implications that come with the deletion of public repositories. Hopefully, people take note of such a change. I don't know whether such an action is explicitly pointed out by GitHub. The photo itself just confused me. Thus, I clicked on the confused emoji provided by GitHub: In general, I like to keep politics and religion out of coding. Bringing it up doesn't help. |
@duxsco I brought up the photo, #16 (comment), only after I saw on social media / chat forums that many people were coming after me specifically because of my religion (Judaism) and because I support Israel. All those downvotes against me (for that specific photo) helped clarify the situation to bystanders. The post-mortem also helped clear up some confusion. |
I don't think downvotes against that comment were necessarily caused by people being antisemites - it was probably a factor to those that came from places like 4chan, but I think many others simply saw the post as being randomly off-topic and might have interpreted it as, say, a really poor attempt to distract from the issue at hand. I would personally guess that the number of downvotes would not have been significantly lower if the flags in the image had been of e.g. Palestine. |
@mdmintz let's not take the 4chan bait and get distracted onto off-topic political issues. 4chan is entirely trolls, and I'm sorry you suffered abuse by them, but we don't have to follow their lead. Instead, can you explain why you created the repo the way you did? A number of people have asked about this, and you haven't given an answer. Instead of the easy path (click the Fork button), you took a more difficult path that involved copying a subset of files into a new repo. This lost git history and made it more difficult to contribute your changes back in the future. Why would you do that? More baffling to me is why you dropped the test suite in the process. Shouldn't this library have a test suite? This is an unusual way to make a fork, and looks suspicious. Perhaps you have good reasons, but you haven't explained. I know you have said that you are not legally required to keep the git history. That is true. But it's really odd to do it the way you did. Help us understand. |
@nedbat Excellent questions. As you know, I'm in the web automation space, and so I spend a lot of time studying web automation frameworks and repositories. One of the more recent repos I studied was Microsoft's Playwright. I was aware of how history was reset when Playwright was created from Google's Puppeteer. Seemed like a nice way to keep a framework lightweight for the next generation (and if history could be reset like that, then why would there be a double-standard on individual developers doing the same thing?). That is probably how I started with With the repo, (and using Python 3.11), I ran: flake8 --statistics --count This led to 1263 issues, summarized with the following categories by count:
And so began the very slow process of fixing each line, line-by-line, on my own. Unfortunately, the tests that came with the repo were not in good shape. The quick
And of course, there was a lot of major refactoring to fix the remaining 650+ flake8 issues that could be found throughout the code. Very tedious, but eventually I fixed them all! It was quite the mess getting everything organized, and some of it was probably done into the early hours of the morning on some days, (sort of like this response I'm typing right now). Therefore, I probably won't remember all the details, as Keep in mind, OK, getting late here and I need to get some sleep before work in the morning. Hopefully that answered your questions! |
I didn't know the 4chan people had crossed over before reading the post-mortem. IMHO, you played into their hands by posting the photo and added to the confusion the 4chan people were already apparently causing for bystanders like me.
I see such an attempt to allocate haters as an abuse of the comment functionality. It would have been better to get GitHub involved instead. If it had been kitten pictures, for example, I would have downvoted it straight away. But as it seemed to be about politics and religion, I opted for the confused emoji instead. |
It doesn't keep the framework lightweight, since the delivered code is the same. It just makes it hard to understand the history, and hard to contribute back.
Playwright was meant to be a new thing unrelated to the old Puppeteer. You had a different goal: you've said you wanted to contribute your fixes back to nose. That's really difficult now because of the choice you made, so it will likely never happen. Thanks for keeping nose working, but I think it's yours forever now. |
@nedbat, as you mentioned earlier: That, I can agree with. Because of this, I didn't have to take on the challenge of reviving |
Clearing up license confusion (post-mortem)
For those of you who missed the action, a large number of people recently showed up in a flash mob to discuss and/or complain about license issues in pynose (and some of my other repos). They mainly came from Reddit and 4Chan. (Source: "GitHub Insights")
Some big questions: Were the claims justified? Was I unfairly targeted? Are there other popular repos with the same issues? Let's go through the points that were brought up, using some helpful questions:
(Question) Can a repo fork/copy from another repo while removing history? (The part in question: 5b7314a, where pynose was created from a modified version of nose.)
(Answer) As it turns out, yes, that's legal: That's how Microsoft created Playwright from Google's Puppeteer: See microsoft/playwright@9ba375c, which was made from a modified copy of Puppeteer (https://github.com/puppeteer/puppeteer).
(Question) Can a repo change its license to MIT from something else?
(Discussion) The Puppeteer License is Apache: https://github.com/puppeteer/puppeteer/blob/main/LICENSE. However, when Microsoft created Playwright, they changed the original license to MIT: microsoft/playwright@794b59c. Certainly looks legal if Microsoft can do it. Turns out that maybe it wasn't OK because they later changed it back: microsoft/playwright@562e6f5. So even with code reviews and a very large legal team to double-check things, even big companies can get licensing wrong sometimes. If that's the case, then certainly smaller teams (or even individual repo maintainers) may get licensing wrong, or not know correct licensing from wrong licensing if the repos they're learning from didn't get it right either. I ended up "pulling a Microsoft" by setting a license to MIT from a non-MIT license. Got it fixed though: #30. Also fixed a secondary license issue: #34. In the process of that secondary fix, I learned that there was another repo (not mine) that also had a license issue: https://github.com/pdbpp/pdbpp. After pointing it out, someone opened a ticket for it.
(More Discussion) As it turns out, licensing issues are quite common: https://github.blog/2015-03-09-open-source-license-usage-on-github-com/ (may be an old article, but it says only 20% of repos have a license (30% for forked ones) and that "Open source simply isn’t open source without a proper license."). So although there was a licensing issue with pynose (now fixed), there was a disproportionate response focused at my GitHub. Lots of people disrespected not only me, but also one of the original nose maintainers who came to help. (They downvoted him because he thanked me for resurrecting nose. If you look through the other comments on the thread, anyone who said positive things about me got downvoted.)
For some context (as not everyone here knows), nose is (or once was) a very popular Python unit-testing framework that hasn't been maintained in over 8 years:
Major companies around the world still depend on it. Unfortunately, nose stopped working when Python 3.10 came out. Although it was easy to patch it at that point, the number of things that broke with nose increased at a rapid rate with the releases of Python 3.11 and Python 3.12. People either didn't want to fix it, or didn't know how to fix it. Although I'm quite busy with a lot of other things, I decided to fix it because I knew how to do it. (I've been using Python ever since working at ITA Software, which was acquired by Google.) So I took on that "burden" and created pynose. Major companies that were still dependent on nose began using it. Those companies include big names like Mozilla, Intel, DocuSign, Wikimedia, and SAP:
Some of my fixes for nose were shipped with Alpine Linux (meaning that they would be found on Azure, Google Cloud, AWS, and Docker instances around the world). Eg: https://github.com/alpinelinux/aports/blob/5fb0b96b79977fd89ee20f1d2bd3367762df67a1/community/py3-nose/python-nose-py312.patch
With people finding out about the popularity of pynose, they came by and then called in others from 4Chan, Reddit, Mastodon, etc. While some people did offer constructive criticism, there were many others that either came by to just rant, or to wave torches & pitchforks. The ones with the torches & pitchforks mostly came from https://boards.4chan.org/g/thread/101339536. Some of the people there made comments that were way out-of-line and very offensive. (You can read their long thread and make your own assessments.) There were lots of extremely hostile messages on 4Chan, and calls for people to come downvote my pynose posts.
Some tickets opened in pynose were more helpful than others.
Eg. This was helpful: #33 (Clear points about licensing rules so that the problems could be described in detail, and fixed accordingly.)
This earlier one was not as helpful: #28 (Fewer details and the mention of preserving history, which as I mentioned earlier with the Microsoft example here: microsoft/playwright@9ba375c shows that preserving the Git commit history of the original repo is not necessary.)
Eventually, I sorted out necessary changes from non-necessary demands by using Microsoft's Playwright repo as a case study. Both pynose and playwright made similar decisions / mistakes, as posted earlier. (The mistakes that needed to be corrected have already been fixed.)
(Question) Was pynose not giving credit to nose?
(Answer) The ReadMe clearly stated at the top that "pynose is an updated version of nose, originally made by Jason Pellerin." Credit was definitely acknowledged and given. (But for some, the ReadMe didn't count because they only cared about the license.)
One of the three official maintainers of nose spoke positively about pynose fixing nose and keeping it alive:
For reference, here are the three official nose maintainers according to PyPI:
Let's get back to the "Questions":
(Question) Can a license be slightly modified from the original to include new maintainers for a forked / copied project?
(Answer) Yes, Microsoft added their name when they modified Google's code: microsoft/playwright@9ba375c#diff-0a2cb6528fb78d67f03776f9e443ba3b811ecb8cab767af904e48604197c922b
If that's legal, then I can also add my name when modifying code. (Context: mdmintz/tabcompleter#11, where someone was trying to tell me that I can't do that after returning the original license.)
(Question) Can I put a license for a specific file directly in the file itself, rather than including it in the main LICENSE file?
(Answer) Yes, Microsoft did it: microsoft/playwright@9ba375c#diff-647cd6d72ffd0e5a5e9ba4f459fb9d36bb7b9aa621723e0eb7b221e1d9bc67bcR2
- "Copyright 2017 Google Inc., PhantomJS Authors All rights reserved." in the file itself.
- The main licenses did not include any mention of PhantomJS. (Source: https://github.com/microsoft/playwright/blob/71a668eb863ca44e269f8353bfb055d7e0d4e583/LICENSE. It also wasn't in their ThirdPartyNotices.txt file: https://github.com/microsoft/playwright/blob/71a668eb863ca44e269f8353bfb055d7e0d4e583/packages/playwright/ThirdPartyNotices.txt)
Someone came after one of my repos without knowing that putting specific licenses directly into files was OK:
The files were copied directly from their CDN links, which meant that the license would be there if it wasn't missing in the CDN. Here's an example of that:
Therefore, the license would only be missing there if the CDN link didn't include it. (Maybe a CDN issue if the license wasn't uploaded with the JS or CSS code from there.) The JS and CSS file copies would be from there, as well as any SeleniumBase Chrome extension zip files included directly in the repo. Here's another example of the license in the file: https://github.com/seleniumbase/resource-files/blob/main/js/hopscotch/hopscotch.min.js. I deleted a few of his invalid tickets for that (for him not realizing that the license can be included within the files themselves). Hence the reason you might not find the ticket I copied from the email notification I posted above. For fairness sake, I didn't delete other tickets of his when there were valid points, eg: mdmintz/tabcompleter#10. (He did complain later on social media that I deleted a few of his tickets.)
On the topic of SeleniumBase, although the https://github.com/mdmintz org falls under my responsibility, my https://github.com/seleniumbase org falls under the special protection of the Software Freedom Conservancy (due to being part of the Selenium umbrella of frameworks). This means that if anyone has a license issue or any legal issue with a repo in the SeleniumBase org, then they need to go through the Software Freedom Conservancy instead of going directly through me. For regular SeleniumBase issues (non-licensing stuff) you can go directly through me (opening a regular ticket). For any possible license issues that you may have with SeleniumBase, go directly to the Software Freedom Conservancy: https://sfconservancy.org/news/2011/feb/02/selenium-joins/ As written there:
By joining the Conservancy, Selenium obtains the benefits of a formal non-profit organizational structure while keeping the project focused on software development and documentation. Some benefits of joining the Conservancy include the ability to collect donations, hold assets on behalf of the project, and some protection of the lead developers of the project from personal liability when engaging in the activities of the project.
So specifically for SeleniumBase, they have my back.
So in summary, open source license rules can get very complicated: Even big corporations can make mistakes. If a big company does something incorrect with respect to licensing, it's easy for individual developers learning from those repos to make the same mistakes without realizing it. Sometimes, even the people coming to complain about a license issue may get some things wrong (Eg. Them thinking that history from a forked/copied repo needs to be preserved, which clearly isn't the case because this happened: microsoft/playwright@9ba375c, where Google's Puppeteer Git History was removed during the creation of Microsoft's Playwright repo.) Also, some people are more helpful than others in resolving things (by providing useful, actionable feedback). Then there are others out there who are just trying to mess with other people's reputations. The GitHub ecosystem should be a welcoming space for all developers.
For anyone skipping right to the end of this long message, all outstanding requests have been resolved, people are happy with the results, and pynose will continue to be shipped with Linux distributions around the world.
And now people know me a bit better. In particular, they know I'm the guy who fixes unmaintained Python packages that businesses still depend on. Eg. pynose, as well as others like pdbp (not to be confused with pdb or pdbpp). And they know I'm the guy who does a lot with web automation (SeleniumBase). With all the work I do, one would think that I don't get much of a chance to go outside, but I did manage to attend ballroom dance class a few evenings this week, and I recently went to a Star Trek convention where I survived for a whole three days without opening my laptop (https://www.youtube.com/watch?v=BwHc4lIS5z8). There, I partied on the set of the original Enterprise with Jonathan Frakes, and I had a fun conversation with LeVar Burton.
OK, back to work, everyone! There's lots of Python code to write!
https://github.com/mdmintz