The PostCommit Python Examples Flink job is flaky #32794

Closed
github-actions bot opened this issue Oct 16, 2024 · 11 comments · Fixed by #33135 or #33138

Comments

@github-actions

The PostCommit Python Examples Flink job is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Examples_Flink.yml?query=is%3Afailure+branch%3Amaster to see all failed workflow runs.
See also Grafana statistics: http://metrics.beam.apache.org/d/CTYdoxP4z/ga-post-commits-status?orgId=1&viewPanel=8&var-Workflow=PostCommit%20Python%20Examples%20Flink

@liferoad

INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 Caused by: java.io.IOException: Insufficient number of network buffers: required 16, but only 8 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalCreateBufferPool(NetworkBufferPool.java:495)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:468)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.io.network.partition.ResultPartitionFactory.lambda$createBufferPoolFactory$1(ResultPartitionFactory.java:379)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.io.network.partition.ResultPartition.setup(ResultPartition.java:158)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:969)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:658)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
INFO     apache_beam.utils.subprocess_server:subprocess_server.py:213 	at java.base/java.lang.Thread.run(Thread.java:829)
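
The error itself points at the Flink network-buffer settings. As a rough illustration (not the fix applied in this issue), one way to check whether the buffer pool is the bottleneck is to raise the keys named in the message in the cluster's flink-conf.yaml and re-submit a small pipeline through Beam's Python FlinkRunner. The buffer values and the localhost Flink master below are assumptions for the sketch, not what the Beam CI jobs use.

```python
# Minimal sketch, assuming a local Flink session cluster whose flink-conf.yaml
# has been given more network memory via the keys quoted in the error above
# (values here are illustrative only):
#
#   taskmanager.memory.network.fraction: 0.2
#   taskmanager.memory.network.min: 256mb
#   taskmanager.memory.network.max: 1gb
#
# With the cluster reconfigured, submit a tiny pipeline to confirm the
# buffer pool is now large enough.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",  # assumed local session cluster
])

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["tornado", "rain", "tornado"])
     | "Pair" >> beam.Map(lambda w: (w, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Print" >> beam.Map(print))
```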

@github-actions

Reopening since the workflow is still flaky

@liferoad

No useful logs from the failed workflows.

@liferoad

Looks like a memory issue:

The node was low on resource: memory. Threshold quantity: 100Mi, available: 74432Ki. Container runner was using 58152720Ki, request is 3Gi, has larger consumption of memory. Container docker was using 43668Ki, request is 0, has larger consumption of memory. 

@liferoad

This keeps failing now on:

2024-11-18T14:26:34.5015302Z apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py::BigqueryTornadoesIT::test_bigquery_tornadoes_it
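
For reference, a hedged local reproduction sketch for the test named above (the path is taken verbatim from the failure; whatever runner and GCP flags the CI job passes are not shown in this thread, so they are omitted rather than guessed):

```python
# Hypothetical local repro; assumes a Beam source checkout with the Python SDK
# test dependencies installed. Run from sdks/python.
import pytest

pytest.main([
    "apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py"
    "::BigqueryTornadoesIT::test_bigquery_tornadoes_it",
    "-v",
])
```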

@liferoad

Even with highmem:

The node was low on resource: memory. Threshold quantity: 100Mi, available: 3568Ki. Container docker was using 42360Ki, request is 0, has larger consumption of memory. Container runner was using 59928716Ki, request is 5Gi, has larger consumption of memory. 

@liferoad

https://github.com/apache/beam/actions/runs/11915340857/job/33205368335 looks good now after switching to the higher-memory machines.

github-actions bot reopened this Jan 12, 2025
@github-actions

Reopening since the workflow is still flaky

@Amar3tto

Reason: highmem-runner-22 has not been available since January 3.

@Amar3tto

Successful after manually deleting the node pool via the console and rerunning Terraform, which recreates it.

@Amar3tto

Job has been stable for 3 days. Closing as resolved.
