-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
ci: run self-hosted e2e test #95636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
ci: run self-hosted e2e test #95636
Conversation
This should be here, or else we will be having a hard time tracking which commit breaks self-hosted. Like what happened today.
@hubertdeng123 hey can you take a look at this? |
we had previously decided it was not worth the time here and that it was acceptable for dev-infra to triage the failures when they happen (rarely) -- as it's only happened once in the last ~year or so I think the trade off is probably still correctly chosen (especially because this suite takes a very very long time and has historically been quite flaky leading to developer disruption) |
@asottile-sentry As far as I know, it happened twice this year, for 25.6.0 and 25.7.0. The last one is fixed by this revert commit |
still doesn't seem worth it |
Why? It doesn't block the release pipeline for SaaS, nor is it a required check for merging to the master branch. I don't see why not. |
(1) people will still wait for it |
That's kind of the point? :) The typical run time for self-hosted e2e tests is under 10 minutes for quite a while so "takes a long time" is not a valid arguent for some time now: https://github.com/getsentry/self-hosted/actions/workflows/test.yml?query=branch%3Amaster
It does not flake anymore either. You can refer to the link above. That means if it fails it has a strong signal. I don't think treating these any different from a regular test makes sense: any test being flaky is bad and they should have high SNR
Not if you make them required. And even if it gets ignored, it still makes it way faster than to identify the suspect commit rather than trying to bisect it, especially across multiple repos. |
10 minutes plus the image build would also make it the slowest job we have if there's actual things that are failing those should become tests in the sentry repo not some bespoke external test system
this is a tiny fraction of the workload that this build would be subjected to -- this PR adds it for every sentry PR run and mainline change I also see a bunch of failures on the link you provided -- master should never be red
well that kills the whole proposal of adding it not-required state? |
I'd agree with you here, but it's also quite a bit of work to get us to that point unfortunately. This is however the easiest way for us to identify breaking commits for self-hosted.
The failures recently have been actual failures that we have needed to investigate. The argument that they have been flaky in the past is reasonable, but this has since been much more stable from before.
It would be useful to those concerning themselves investigating the e2e test failures. People may ignore them, but still overall useful. Recently @aldy505, @BYK and I have spent a considerable time during releases to try and track down errors and fix them, given that we're working across timezones. That being said, something needs to change as this burden of ensuring self-hosted is healthy shouldn't just fall on us three, given that @BYK shouldn't be involved at all. Self-hosted technically belongs to dev infra. I'm not completely confident that we can ensure that self-hosted e2e tests are not flaky across the many runs that sentry goes through each day. It's magnitudes larger than snuba/relay/vroom/etc runs. If this gets introduced I'd only be comfortable if it's a non required check. |
I don't really have a horse in this race as I'm not really responsible for any failures regarding self-hosted etc. What I want here is a good and accurate discussion to come to a conclusion and that's why I feel the need to jump in like I did last time as your arguments felt like FUD or just projecting past issues to the present without checking the current state of things. And now it feels like moving the goal post to me:
That might be true (that said I'm not sure as I noticed 18m+ builds for backend tests) but we already build the images and have to build the images. We are talking about the marginal impact of adding these new tests which is less than 10 minutes per commit. Adding the image build time to that does not yield to an accurate relative assesment.
I invite you to try covering these cases as an interested and responsible party for ensuring self-hosted stability as that would indeed be more efficient. So far we have not been able to do this proactively.
If you can find a better way to ensure that self-hosted sentry works as expected with all third-party services, I'd also welcome this rather than maintaining a custom test suite to ensure its workings. In the past, I've seen people experiment with things like Playwright and give up due to stability and performance issues -- I still think that is a better way forward in terms of maintainability but you yourself emphasized the importance of performance and reliability in your earlier messages.
The workload/time we compare are very expensive and disruptive engineering time vs dirt cheap CI CPU time. Again, not an accurate measure. This is especially true if the check will not be required/blocking or if we rely on auto-merge.
Those failures on master happened precisely because other repos did not embrace these e2e tests or ignored their failures. "master should never be red" also applies to self-hosted. Self-hosted is an equal part of the sentry offering, not a "lesser" repo where it is acceptable to break it citing various productivity concerns. |
This is not valid. Creating tests solely within sentry repo doesn't guarantee everything will magically works on self-hosted. The integration tests validates whether event ingestion & event query works, and it's a combination of putting so many moving parts on Sentry together. A change that works fine on SaaS can totally break something on self-hosted, and we won't even know until it's too late. The 25.7.0 release is the one of them. Since self-hosted has different features enabled, we end up wasting a lot of time fixing these issues just to get a release out. @asottile-sentry @hubertdeng123 would you reconsider this PR? Troubleshooting failed self-hosted releases is currently a significant pain point, and adding self-hosted end-to-end tests would greatly improve this. |
@aldy505 Talked about this within ourselves and came up with an alternative approach. Here's a list of concerns we want to address:
Even if this is introduced as a non-required check, it still offers some disturbance for developers. Most developers will be confused by the check, as they're still allowed to merge it (and will likely do so). New Proposal: Core Approach:
This will help us pinpoint the exact commit that introduces a problem for self-hosted. It also introduces zero developer disruption, as no additional CI checks or workflow changes for developers are added. Yes, this is additional overhead in terms of effort, but I'm not very confident that our test won't flake when it's run hundreds of times a day. |
This is what we do already (every 8 hours), right? So the core idea here is to increase the frequency of these and then auto bisect? And for auto-bisect to be specific to repo, it needs to be on the Sentry repo? |
Yes, that is correct. We don't necessarily need to increase the frequency of those, but it'll allow us to pinpoint failures in the sentry repo. |
I'm a bit pessimistic about the auto-bisect thing. Other than that, I'm okay with changing this PR to run as a scheduled workflow with a very tight interval (every 1 hour on EU & US weekdays). |
@aldy505 Why are you pessimistic on this workflow? |
@hubertdeng123 Let's go with this one for now |
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you add the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
bump just to remove the stale bot |
@hubertdeng123 Can you revisit this? |
This should be here, or else we will be having a hard time tracking which commit breaks self-hosted. Like what happened today.