Create a BatchOnly category to exclude some batch tests from Dataflow Validates Runner Streaming #36615
Conversation
Force-pushed the branch from 89b0901 to 4325e2b
cc: @Amar3tto @tvalentyn for #36577 (comment)

cc: @kennknowles
I would prefer a semantic tag that justifies why this is OK. For example,

But the problem is that the existence of a batch vs streaming mode is actually quite under-the-hood. In fact, it would be better for the execution mode to be chosen at a finer granularity than "whole job". And if there really is a potentially different codepath in batch and streaming, we can't safely disable the test. The real solution to the issue is to combine multiple ValidatesRunner pipelines into a single Dataflow job so that we can run 100 or more at a time. This is just sort of hard to do with JUnit, or at least it requires hackery that I haven't figured out yet. If it is a hack to make just Dataflow work better, I would rather make it in the Dataflow build.gradle than tag the test, since it isn't a property of the test.
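For the build.gradle route suggested here, a minimal sketch of what excluding such a category from a streaming suite could look like. The task name and the `BatchOnly` package path are hypothetical (the thread notes the name is subject to change); `useJUnit { includeCategories / excludeCategories }` is standard Gradle JUnit support, and `org.apache.beam.sdk.testing.ValidatesRunner` is Beam's existing category.

```groovy
// Hypothetical sketch for runners/google-cloud-dataflow-java/build.gradle:
// keep the filter in the Dataflow build rather than as a property of the test.
task validatesRunnerStreamingFiltered(type: Test) {
  group = "Verification"
  useJUnit {
    includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
    // Placeholder category name; subject to change per this discussion.
    excludeCategories 'org.apache.beam.sdk.testing.BatchOnly'
  }
}
```

This keeps the Dataflow-specific exclusion out of the SDK test sources entirely, at the cost of the filter being invisible at the test-annotation level.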
  @Test
- @Category({ValidatesRunner.class, UsesSideInputs.class})
+ @Category({ValidatesRunner.class, UsesSideInputs.class, BatchOnly.class})
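For context on what such a tag is mechanically: a JUnit category is just a marker interface referenced from `@Category`, and suites include or exclude tests based on its presence. A self-contained sketch of that filtering; the `Category` annotation, interfaces, and class names below are stand-ins, not Beam's or JUnit's real classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of JUnit-style category filtering. `Category` stands in for
// org.junit.experimental.categories.Category, and `BatchOnly` mirrors the
// hypothetical marker interface this PR introduces.
public class CategoryFilterSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Category {
    Class<?>[] value();
  }

  // Marker interfaces carry no behavior; suites filter on their presence.
  interface ValidatesRunner {}
  interface UsesSideInputs {}
  interface BatchOnly {}

  static class SampleTests {
    @Category({ValidatesRunner.class, UsesSideInputs.class, BatchOnly.class})
    public void testSideInputBatchOnly() {}

    @Category({ValidatesRunner.class})
    public void testRunsInBothModes() {}
  }

  /** Selects methods tagged ValidatesRunner but not tagged with {@code excluded}. */
  static List<String> select(Class<?> testClass, Class<?> excluded) {
    List<String> picked = new ArrayList<>();
    for (Method m : testClass.getDeclaredMethods()) {
      Category c = m.getAnnotation(Category.class);
      if (c == null) {
        continue;
      }
      List<Class<?>> categories = Arrays.asList(c.value());
      if (categories.contains(ValidatesRunner.class) && !categories.contains(excluded)) {
        picked.add(m.getName());
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    // A streaming suite would exclude BatchOnly-tagged tests.
    System.out.println(select(SampleTests.class, BatchOnly.class)); // prints [testRunsInBothModes]
  }
}
```

A streaming suite configured to exclude `BatchOnly` would skip the first method and keep the second.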
Since streaming and batch have really different execution models, I would not opt this test into being batch-only.
Agreed, and the name is subject to change. Currently I just use it for manual testing.
  @Test
- @Category(ValidatesRunner.class)
+ @Category({ValidatesRunner.class, BatchOnly.class})
I think streaming and batch actually have somewhat different Create implementations.
Another question: how many Dataflow jobs run in parallel? I think at some point we single-threaded these to avoid quota issues, or some other build issue, but we could run with 10+ threads and just adjust our quota.
It's still timing out after removing 30+ tests (which should save ~1h). It is hard to check whether there were stuck tests without any logs. Currently running on #36634; if there are stuck tests, they will show in the Dataflow console as long as they are in the running state, so I'll check back 8 hours later.
It was set to the default value.

Checked in the Dataflow console: it's effectively 4 jobs running in parallel.

We'd definitely need a much larger quota if we bump this value. Currently one of the quotas most easily hit is in-use IPv4 addresses in us-central1, and IPv4 quota is not easy to get approved. I wonder if we can run ValidatesRunner tests (and supported Dataflow tests in general) on IPv6 only, or disable external networks (could be a follow-up).
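For reference, the fork-level parallelism discussed here is typically governed by `maxParallelForks` on Gradle's `Test` task (the Gradle default is 1). A hedged sketch of bumping it; the number is illustrative, not Beam's actual setting, and any real value would need a matching quota increase.

```groovy
// Sketch only: raising forked test JVMs increases concurrent Dataflow jobs,
// which in turn consumes more quota (e.g. in-use IPv4 addresses in us-central1).
tasks.withType(Test).configureEach {
  maxParallelForks = 10 // illustrative value, not Beam's actual setting
}
```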
This seems like a good idea. I'm not sure why we would need external networks; certainly not for all tests. Another possibility could be to shard the test suite for Dataflow. ParDoTest and CombineTest probably have a ton of pipelines, and if you separate a "core ValidatesRunner suite" from an "extended ValidatesRunner suite" it might reduce flakes and speed up reruns while reducing quota usage.
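Sharding along those lines could be sketched as two Gradle test tasks with class-level filters. Task names are hypothetical; `includeTestsMatching`/`excludeTestsMatching` are standard Gradle test filtering, and the two class FQNs are the real Beam test classes named in this thread.

```groovy
// Hypothetical split: heavyweight classes get their own task so reruns and
// flakes touch a smaller set of Dataflow jobs.
task validatesRunnerCore(type: Test) {
  filter {
    excludeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest'
    excludeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest'
  }
}
task validatesRunnerExtended(type: Test) {
  filter {
    includeTestsMatching 'org.apache.beam.sdk.transforms.ParDoTest'
    includeTestsMatching 'org.apache.beam.sdk.transforms.CombineTest'
  }
}
```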
I understand what is going on now. The parallelism isn't even: as of now (after 8h+), a single test class, ViewTest, is still running (presumably the other tests have finished). There isn't a stuck pipeline. That explains why this PR doesn't decrease test running time: ViewTest has 42 ValidatesRunner tests, and we need to split it as well. However, pipeline running time has definitely gone up recently. Checking an example pipeline's log, it appears pipeline teardown time has increased (it usually takes 4h to start up, then needs 2-5 min from "stopping worker pool" to pipeline finish; at "stopping worker pool" the PipelineResult is still in the running state). I seem to remember this was previously 1.5-3 min-ish. I think we're good for now in terms of validation (no SDK-side regression found).
cc @tvalentyn
Dataflow Validates Runner Streaming tests are set up by gathering all `@ValidatesRunner` tests and adding the `--streaming` pipeline option. There is also a Dataflow Validates Runner test suite exercising these same tests. Over time the number of tests has grown, and so has execution time; we have already increased the timeout several times in the past. Here I try to dedup some test cases.