Fix ParallelIterable deadlock #11781

sopel39 · 2024-12-13T15:00:47Z

It was observed that, in high-concurrency/high-workload scenarios, the cluster deadlocks because manifest readers wait for connections from the S3 pool.

Specifically, ManifestGroup#plan creates a ManifestReader for every ParallelIterable.Task. These readers effectively hold onto an S3 connection from the pool. When the ParallelIterable queue is full, the Task is tabled for later use, and its open reader keeps holding the connection in the meantime.
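To make the tabling behavior concrete, here is a minimal, hypothetical Java model of the pre-fix Task. It is not the Iceberg implementation: `YieldingTask` is an invented name, a `java.util.concurrent.Semaphore` permit stands in for the S3 connection owned by a ManifestReader, and integers stand in for manifest entries. It only illustrates that returning early when the queue is full leaves the permit acquired.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Semaphore;

// Hypothetical model of the pre-fix Task (not Iceberg code). The Semaphore permit
// stands in for the S3 connection held by the task's ManifestReader.
final class YieldingTask {
  private final List<Integer> input;
  private final Queue<Integer> queue;
  private final int approximateMaxQueueSize;
  private final Semaphore connectionPool;
  private Iterator<Integer> iterator; // non-null once the "reader" has been opened
  private boolean closed;

  YieldingTask(List<Integer> input, Queue<Integer> queue, int approximateMaxQueueSize, Semaphore connectionPool) {
    this.input = input;
    this.queue = queue;
    this.approximateMaxQueueSize = approximateMaxQueueSize;
    this.connectionPool = connectionPool;
  }

  /** Returns true when the input is exhausted, false when the task was tabled for later. */
  boolean run() {
    if (closed) {
      return true;
    }
    if (iterator == null) {
      connectionPool.acquireUninterruptibly(); // opening the reader takes a connection
      iterator = input.iterator();
    }
    while (iterator.hasNext()) {
      if (queue.size() >= approximateMaxQueueSize) {
        return false; // tabled: the connection is NOT released while the task waits
      }
      queue.add(iterator.next());
    }
    connectionPool.release(); // the reader is closed only after the last item
    closed = true;
    return true;
  }

  public static void main(String[] args) {
    Semaphore s3Pool = new Semaphore(1);
    Queue<Integer> queue = new ArrayDeque<>();
    YieldingTask task = new YieldingTask(List.of(1, 2), queue, 1, s3Pool);
    // The first run emits one result, finds the queue full, and parks with the permit held.
    System.out.println("finished=" + task.run() + ", free connections=" + s3Pool.availablePermits());
  }
}
```

With a limit of 1 and two inputs, the first `run()` returns `false` while `availablePermits()` is 0; the permit only comes back after a consumer drains the queue and the task is run again.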

Consider the following scenario:
S3 connection pool size=1
approximateMaxQueueSize=1
workerPoolSize=1

ParallelIterable1: starts TaskP1
ParallelIterable1: TaskP1 produces a result, the queue becomes full, and TaskP1 is put on hold (still holding the S3 connection)
ParallelIterable2: starts TaskP2; TaskP2 is scheduled on workerPool but blocks waiting on the S3 connection pool
ParallelIterable1: the result is consumed and TaskP1 is scheduled again
ParallelIterable1: TaskP1 waits for a free workerPool thread, but TaskP2 occupies that thread while waiting for TaskP1 to release the connection, so neither can make progress (deadlock)
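The interleaving above can be replayed deterministically with a small, hypothetical Java model (`DeadlockReplay` is an invented name, not Iceberg code). It assumes a one-permit `Semaphore` for the S3 connection pool and `Executors.newSingleThreadExecutor()` for workerPool. Because a real reader would wait forever, TaskP2 uses `tryAcquire` with a 200 ms timeout, so a timeout signals that the schedule would have deadlocked.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical replay of the scenario (not Iceberg code).
final class DeadlockReplay {
  /** Returns true if TaskP2 could not obtain a connection, i.e. the schedule is deadlocked. */
  static boolean reproduce() {
    Semaphore s3Pool = new Semaphore(1);                               // S3 connection pool size=1
    ExecutorService workerPool = Executors.newSingleThreadExecutor(); // workerPoolSize=1
    Queue<Integer> queue1 = new ConcurrentLinkedQueue<>();            // ParallelIterable1, approximateMaxQueueSize=1
    try {
      // TaskP1 opens its reader (takes the connection) and emits one result; the queue
      // is now full, so TaskP1 is put on hold while still holding the connection.
      workerPool.submit(() -> {
        s3Pool.acquireUninterruptibly();
        queue1.add(1);
      }).get();

      // TaskP2 (ParallelIterable2) takes the only worker and waits for a connection.
      // A real reader would wait forever; the timeout only lets this demo terminate.
      Future<Boolean> taskP2 = workerPool.submit(() -> s3Pool.tryAcquire(200, TimeUnit.MILLISECONDS));

      // The result is consumed and TaskP1 is rescheduled; it queues behind TaskP2.
      queue1.poll();
      Future<?> taskP1Resumed = workerPool.submit(() -> {
        queue1.add(2);
        s3Pool.release(); // TaskP1 finishes and closes its reader
      });

      boolean taskP2GotConnection = taskP2.get();
      taskP1Resumed.get(); // only reached because TaskP2 gave up its worker thread
      return !taskP2GotConnection;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      workerPool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("deadlocked=" + reproduce());
  }
}
```

`reproduce()` returns `true`: TaskP2 can only get the connection after TaskP1 runs again, and TaskP1 can only run after TaskP2 frees the single worker thread.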

The fix makes sure that a Task, once started, runs to completion. This way, limited resources such as pooled connections are never held by a Task that is on hold. The queue size might exceed the strict limit, but it should still be bounded.
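A minimal sketch of the run-to-completion policy, under the same toy assumptions (one-permit `Semaphore` as the S3 pool, single-thread executor as workerPool). The names `RunToCompletion`, `task`, and `maybeSubmit` are hypothetical; `maybeSubmit` only mirrors the idea of not starting a task when the queue has no space, and the actual ParallelIterable code may differ.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the post-fix policy (not Iceberg code).
final class RunToCompletion {
  /** A task that, once started, drains its whole input and then returns its connection. */
  static Runnable task(List<Integer> input, Queue<Integer> queue, Semaphore connectionPool) {
    return () -> {
      connectionPool.acquireUninterruptibly(); // open the reader
      try {
        for (Integer item : input) {
          queue.add(item); // no queue-size check: the task never parks half-way
        }
      } finally {
        connectionPool.release(); // close the reader; the connection goes back to the pool
      }
    };
  }

  /** Starts a task only while the queue has room; returns null if the caller should retry later. */
  static Future<?> maybeSubmit(ExecutorService workerPool, Queue<Integer> queue, int approximateMaxQueueSize, Runnable task) {
    if (queue.size() >= approximateMaxQueueSize) {
      return null;
    }
    return workerPool.submit(task);
  }

  /** Replays the scenario with the fix and summarizes the final state. */
  static String replay() {
    Semaphore s3Pool = new Semaphore(1);                               // S3 connection pool size=1
    ExecutorService workerPool = Executors.newSingleThreadExecutor(); // workerPoolSize=1
    Queue<Integer> queue1 = new ConcurrentLinkedQueue<>();            // ParallelIterable1
    Queue<Integer> queue2 = new ConcurrentLinkedQueue<>();            // ParallelIterable2
    try {
      Future<?> taskP1 = maybeSubmit(workerPool, queue1, 1, task(List.of(1, 2), queue1, s3Pool));
      Future<?> taskP2 = maybeSubmit(workerPool, queue2, 1, task(List.of(3), queue2, s3Pool));
      taskP1.get();
      taskP2.get(); // TaskP2 gets the connection because TaskP1 never parks while holding it
      // queue1 now holds 2 results (above the limit of 1), so no further task is started for it.
      boolean extraStarted = maybeSubmit(workerPool, queue1, 1, task(List.of(4), queue1, s3Pool)) != null;
      return "queue1=" + queue1.size() + ", queue2=" + queue2.size()
          + ", freeConnections=" + s3Pool.availablePermits() + ", extraStarted=" + extraStarted;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      workerPool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(replay());
  }
}
```

`replay()` returns `queue1=2, queue2=1, freeConnections=1, extraStarted=false`: ParallelIterable1's queue overshoots its limit of 1 by what one started task produced, TaskP2 still gets the connection, and no further task is started while the queue is full.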

Fixes #11768

sopel39 · 2024-12-13T15:01:55Z

cc @findepi @RussellSpitzer @osscm


RussellSpitzer · 2024-12-13T20:48:35Z

core/src/main/java/org/apache/iceberg/util/ParallelIterable.java

```diff
         if (iterator == null) {
           iterator = input.iterator();
         }
 
         while (iterator.hasNext()) {
-          if (queue.size() >= approximateMaxQueueSize) {
```

Isn't this a pretty significant change in the behavior of ParallelIterable? I assume we have some performance implications of not fully loading the queue?

The queue should still be bounded, albeit not strictly by approximateMaxQueueSize.

> I assume we have some performance implications of not fully loading the queue?

What do you mean by not fully loading the queue? This PR guarantees that a Task, once started, will run to completion and will not hold onto external resources.
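Why the queue stays bounded can be sketched with a back-of-the-envelope calculation. This is an illustration under assumptions, not a bound taken from the Iceberg code: a task is only submitted while `queue.size() < approximateMaxQueueSize`, at most `maxInFlightTasks` tasks are submitted but unfinished at any time, and one task adds at most `maxItemsPerTask` results (both are hypothetical parameters). Once the queue reaches the limit no new task starts, so the overshoot is limited to what already-started tasks still produce.

```java
// Hypothetical back-of-the-envelope bound (not Iceberg code). Assumes:
//  - a task is submitted only while queue.size() < approximateMaxQueueSize,
//  - at most maxInFlightTasks tasks are submitted-but-unfinished at any moment,
//  - a single task adds at most maxItemsPerTask results.
final class QueueBound {
  static long worstCaseQueueSize(int approximateMaxQueueSize, int maxInFlightTasks, long maxItemsPerTask) {
    if (approximateMaxQueueSize < 1 || maxInFlightTasks < 0 || maxItemsPerTask < 0) {
      throw new IllegalArgumentException("queue limit must be positive, other bounds non-negative");
    }
    // The last time the queue was below the limit it held at most approximateMaxQueueSize - 1
    // results; after that no new task starts, so only the tasks already in flight can add more.
    return (approximateMaxQueueSize - 1L) + (long) maxInFlightTasks * maxItemsPerTask;
  }

  public static void main(String[] args) {
    // Toy numbers: limit 1, one task in flight, at most two results per task.
    System.out.println(worstCaseQueueSize(1, 1, 2)); // prints 2
  }
}
```

The bound grows with the size of a single task (presumably the entries of one manifest here), which is the trade-off for never parking a task that holds a connection.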

github-actions bot added the core label Dec 13, 2024

osscm reviewed Dec 13, 2024


core/src/main/java/org/apache/iceberg/util/ParallelIterable.java

sopel39 force-pushed the ks/parallel_fix branch from ec139c4 to da723f0 on December 13, 2024 20:40

RussellSpitzer reviewed Dec 13, 2024


Do not submit a task when there is no space in queue (00476be)

Fokko requested a review from findepi December 16, 2024 17:16
