Automating revert commits #559
Thanks for opening this issue @encukou. As a friendly side note: allow me to mention that I would have appreciated it if you had reached out to me first so we could work together, instead of proposing a counter-proposal. That would have reduced the potential frustration of continuing to work on the PR only to end up doing something different. I think this is important because, even if I know there are only good intentions everywhere, it can really impact the energy to contribute. That being said, I am also happy to discuss this here, of course.
Just to clarify: this reasoning isn't coming only from me; it was the feedback I received after my Language Summit 2019 talk and in the discussions we have had in core development. Indeed, a version of this was originally suggested by @willingc in 2019 and discussed with @ambv and other people in the buildbot watch at the core dev sprint at Microsoft, and then again at the core dev sprints at Bloomberg and Google. This was discussed as a solution yet again when we had to manually trigger 4 or 5 different reverts for subinterpreters in ~3.9.
I don't think these two solutions are orthogonal. The automation cannot detect all situations, so it's beneficial to ALSO have a way to trigger this as a human. I guess what I am failing to understand is the rationale for viewing this as "either this or that". In any case, thanks a lot @encukou for caring about this and helping with this issue ❤️ |
Update: I have closed my proposal. |
Yet a third option is enabling merge queue and moving some/all buildbot builds from post-merge to merge queue. This would prevent changes that break buildbots from being merged in the first place, without significantly impacting the usual flow in the common case. |
That has 3 challenges I can envision:
|
This was clearly a miscommunication. I don't want to assign blame. Let me explain how I see it, so I can figure out how to do better in the future. I wasn't at the 2019 summit and wasn't aware of the discussions and your work. I can only blame myself for not paying enough attention, but your PR was a genuine surprise to me. If I had known you were working on it, I would have reached out! I regret getting into this situation, but I'm having trouble apologizing because I don't know how to keep it from happening again :(
It's not that. Frustratingly, even if I had been more aware of your effort, I'm not sure I'd have caught this before the tool was implemented :( |
Thanks for the explanation @encukou! As there is some nuance, and in the interest of having a focused discussion on the path forward here, maybe we can chat on Discord about how to avoid miscommunication in the future.
I think this exposes an argument about ROI, but the fact that it triggers very rarely should not be a problem. People still invest in monitoring systems, debuggers, and metrics even if applications hit these problems only on rare occasions. From that perspective, it becomes a matter of weighing the maintenance cost, which in my opinion is quite low (although I understand that others may have different opinions). The reasons are that:
I can understand if you still think that the ROI is very low. But on the other hand, when I introduced the automatic messages to PRs when buildbots failed, the failure rate dropped dramatically, as you can see here. So I'm not sure what the right answer is here. |
@encukou @pablogsal Since I can't recall what I said in 2019 (lol), I'm trying to understand where there is consensus and where we need to converge on more alignment.
Trying to understand context as we discuss this. |
I apologize for making this look like an “either-or” situation. It's not; cherry-picker/miss-islington reverts could live alongside BuildbotHammer.
Correction: I was there, I got my PyCons mixed up. Apologies for the misinformation.
IMO, we're agreeing on everything but the ROI of a fully automated solution. Please correct me if I'm wrong.
Pablo ran the simulation: last year the tool would have triggered 4 times. IMO, a fully automatic tool can't do much better than this unless we use heuristics¹. From my experience looking at the failures, there are often multiple new commits in the breaking build, so classic automation can't pick one. But it's often easy for a human¹ to identify the breaking one -- for example, a docs-only PR probably didn't cause a crash. Hence my focus on a dashboard to show the info to humans, after which I want to work on this issue.

¹ This might be one of the rare cases where AI could be useful, but I assume we don't want a bot to revert people's work without hard evidence.
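For illustration, the kind of heuristic I mean could look like this rough Python sketch (the `Doc/`-only filter and the function itself are assumptions for the example, not part of any existing tool):

```python
import subprocess

def plausible_culprits(last_good: str, first_bad: str) -> list[str]:
    """Commits between the last green and first red build, minus any
    that only touched documentation (a docs-only change is very
    unlikely to crash the test suite)."""
    commits = subprocess.run(
        ["git", "rev-list", f"{last_good}..{first_bad}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    culprits = []
    for sha in commits:
        changed = subprocess.run(
            ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", sha],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        # Keep the commit unless every file it touched is documentation.
        if not all(path.startswith("Doc/") for path in changed):
            culprits.append(sha)
    return culprits
```
|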
Awesome!
I'm going to walk through my understanding:
I guess the path taken really boils down to time to maintain, time to implement, and the activity level of automated issues/PRs. |
As a first iteration towards something like that, BuildbotHammer is starting to sound much better! |
I apologise for the delay in answering. My last week has been quite challenging :(
Thanks a lot for chiming in @willingc! ❤️
Notice that only the commit that fails the build would be automatically reverted (not all the commits prior to the failure), because the automatic revert would only trigger if we can identify a single PR as the culprit. The other important thing to clarify: the benefit of an automated process is that it reduces the cost of doing reverts. When a bunch of commits have landed on top of the commit that breaks the build, reverting becomes much more challenging, and that process can itself introduce even more bugs. So addressing this promptly reduces the risk of the operation (if we decide to proceed with it).
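As a minimal sketch of that trigger condition (the `commits_between` helper is hypothetical, just for illustration):

```python
def single_culprit(last_good: str, first_bad: str) -> str | None:
    """Fire the automatic revert only when the culprit is unambiguous."""
    # Assumed helper: lists the commits that landed between the last
    # green build and the first red one.
    candidates = commits_between(last_good, first_bad)
    if len(candidates) == 1:
        return candidates[0]  # exactly one suspect: safe to revert
    return None  # several candidates: leave the revert to a human
```
|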
Maybe in the meantime I can modify buildbothammer to just post to a new Discord channel for high-priority problems when it detects that a builder has been failing and we have a culprit. This channel would be different from the "all buildbot failures" channel because we would know that something really is happening and needs to be addressed as soon as possible. This would allow the core devs in residence, but also other core devs, to act appropriately, knowing that everything that goes to that channel is important.
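Something along these lines, assuming a standard Discord incoming webhook (the URL and the message wording are placeholders, not buildbothammer's actual code):

```python
import requests

# Placeholder: the real channel's webhook URL would come from configuration.
HIGH_PRIORITY_WEBHOOK = "https://discord.com/api/webhooks/<id>/<token>"

def alert_high_priority(builder: str, culprit_sha: str, pr_url: str) -> None:
    """Post to the high-priority channel: a builder has been failing
    repeatedly and a single culprit commit has been identified."""
    requests.post(
        HIGH_PRIORITY_WEBHOOK,
        json={
            "content": (
                f"🚨 Builder **{builder}** has been failing and the "
                f"suspected culprit is `{culprit_sha[:10]}` ({pr_url}). "
                "Everything in this channel needs prompt attention."
            )
        },
        timeout=10,
    )
```
|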
I think this is a really good idea. (BTW, I would definitely welcome Pablo's buildbot hammer 🔨) |
Sounds great! |
I think it is conservative enough that it won't post often. As you mentioned, this has a very low rate of occurring due to the restrictions: a single culprit commit plus several failures in a row. |
I love the solutions @encukou and @pablogsal. Here's to better workflows for all 🎉 |
@pablogsal recently opened a PR for automating revert commits.
I agree with the issue but respectfully disagree with that particular proposal.
Copying the reasoning from Pablo's comment:
My own plan of action would be to allow humans to initiate the process:

- Extend `cherry-picker` to `--revert` commits (a rough sketch of the Git mechanics follows below). (Note that in Git, cherry-picks and reverts are very similar once started; see the `--continue` option for `cherry-pick` and `revert`.)
- Trigger the revert when a `needs-revert` label is added. The GitHub comment should explain that we revert because failing CI can hide other issues.

Does that sound reasonable?
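To make the first bullet concrete, here is a rough sketch of the Git mechanics a hypothetical `--revert` mode could wrap (the function and the branch naming are illustrative; this is not cherry-picker's actual implementation):

```python
import subprocess

def start_revert(commit_sha: str, branch: str) -> None:
    """Begin a revert on a fresh work branch, mirroring how
    cherry-picker backports onto maintenance branches."""
    work_branch = f"revert-{commit_sha[:7]}-{branch}"
    subprocess.run(
        ["git", "checkout", "-b", work_branch, f"upstream/{branch}"],
        check=True,
    )
    # `git revert` stops on conflicts exactly like `git cherry-pick`;
    # after resolving, both resume with their own `--continue`.
    subprocess.run(["git", "revert", "--no-edit", commit_sha], check=True)
```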
(FWIW, my current priority is putting breaking PRs, and individual test failures, on the dashboard -- the first step for a revert or fix is for the failure to show up there.)