Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HandlerRegistration.receive does not account for closed event loops #5337

Open
Onkelborg opened this issue Oct 1, 2024 · 11 comments
Open

HandlerRegistration.receive does not account for closed event loops #5337

Onkelborg opened this issue Oct 1, 2024 · 11 comments
Labels
Milestone

Comments

@Onkelborg
Copy link

Onkelborg commented Oct 1, 2024

Version

4.5.10

Context

When sending/publishing messages on the eventbus, and (one of) the receiver verticle is being undeployed, and the receiver verticle runs on a custom thread pool, the publish fails due to the receiving context's thread pool rejecting the operation since it's shutting down.

Do you have a reproducer?

No

Steps to reproduce

  1. Deploy multiple verticles on different (separate) worker pools - each verticle will subscribe to the same eventbus address in order to receive broadcasted messages
  2. Some other part of the system will publish (broadcast) messages to the eventbus address independently of the rest of the system
  3. Undeploy one of the verticles
  4. Since the verticle being undeployed is the only "user" of the dedicated worker pool, that worker pool will be shutdown
  5. Now, there's a race condition: if a message is being published to the "shared" address on the eventbus, it's possible that one of the handler registrations points towards a context where the worker pool (executor) is shutdown, and thus will reject enqueing the operation.
  6. This will in turn possibly not deliver the message to all consumers for a publish, or no consumer at all in case of a regular "round robin" send
  7. And the exception will be bubbled to the caller of the .publish()-method

Extra

Stack trace/Exception

java.util.concurrent.RejectedExecutionException: Task io.vertx.core.impl.TaskQueue$$Lambda$254/0x00000001003ad840@7819b595 rejected from java.util.concurrent.ThreadPoolExecutor@2b8e2d1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 91]
        at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source) ~[?:?]
        at io.vertx.core.impl.TaskQueue.execute(TaskQueue.java:149) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.impl.WorkerExecutor.execute(WorkerExecutor.java:60) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.HandlerRegistration.receive(HandlerRegistration.java:46) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.deliverMessageLocally(EventBusImpl.java:375) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.sendLocally(EventBusImpl.java:341) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPub(EventBusImpl.java:329) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.OutboundDeliveryContext.execute(OutboundDeliveryContext.java:109) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.DeliveryContextBase.next(DeliveryContextBase.java:80) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.OutboundDeliveryContext.next(OutboundDeliveryContext.java:28) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPubInternal(EventBusImpl.java:422) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPubInternal(EventBusImpl.java:429) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.publish(EventBusImpl.java:164) ~[vertx-core-4.5.10.jar:4.5.10]
        at io.vertx.core.eventbus.impl.EventBusImpl.publish(EventBusImpl.java:159) ~[vertx-core-4.5.10.jar:4.5.10]

Idea for where to change things:

For our use case, we are only concerned about publish-messages, but I do see potential issues for regular messages too

@Onkelborg Onkelborg added the bug label Oct 1, 2024
@himanshumps
Copy link

I am also facing the same issue

@himanshumps
Copy link

For me the issue came out to be: The AbstractVerticle start method was throwing an exception which was not caught

@vietj
Copy link
Member

vietj commented Oct 10, 2024

I think deliver message locally should ensure the message can be enqued, is there an easy reproducer to implement ?

are you using virtual threads ?

@vietj vietj added this to the 4.5.11 milestone Oct 10, 2024
@Onkelborg
Copy link
Author

No, not using virtual threads. No reproducer, but the race condition is IMHO quite apparent:

  1. Registration is read when the message is about to be published on the eventbus
  2. Verticle is being undeployed
  3. Since the verticle runs as a worker on a uniquely named worker pool, the executor is being shutdown
  4. According to the registration, the message shall be processed on the executor, however, the executor is being shutdown and won't accept anything being enqueued on the executor and an exception will be thrown
  5. And unfortunately, that exception is not caught, instead it's being bubbled to the caller of the publish method
  6. The published message may be received by a random selection of the registered consumers, depending on in which order the registrations were processed

Would a PR be appreciated?

@vietj
Copy link
Member

vietj commented Oct 10, 2024

not saying it is not, asking if you have a reproducer to have a test and avoid writing one :-)

@vietj
Copy link
Member

vietj commented Oct 10, 2024

a PR would be appreciated, this subject actually becomes more important with the advent of virtual threads and shadow context (in vertx 5)

@Onkelborg
Copy link
Author

I'll see if I can find some time in near future :)

Anything in particular that I need to know/do when submitting to this project?

(I'm fairly familiar with the project in terms of using it, I moved a really, really large project from Vert.x 2 and Java 8 to Vert.x 4 and Java 11 last year - biggest issue was to "reinvent" a new ModuleClassLoader for isolated class loading and "modules" since that had been dropped from the Vert.x project - but, hey, lots and lots of legacy code.. :) )

@vietj
Copy link
Member

vietj commented Oct 11, 2024

@Onkelborg you should read the contribution guide and that is enough

@Onkelborg
Copy link
Author

(I still have this in the back of my head..)

@vietj vietj modified the milestones: 4.5.11, 4.5.12 Nov 12, 2024
@Onkelborg
Copy link
Author

I'm currently working on it - I think I've got the implementation right, however, I'm looking into trying to cobble together a unit test for it somehow

@Onkelborg
Copy link
Author

Status update: I've managed to write comprehensive unit tests for local message delivery for send and publish, however, I want to add tests for Clusters too since I've done some modifications to EventBus.nextHandle, which is overridden in ClusteredEventBus, and has some changes there too

Onkelborg pushed a commit to Onkelborg/vert.x that referenced this issue Nov 27, 2024
…ly handle if the receiving context's event loop has been shutdown
Onkelborg pushed a commit to Onkelborg/vert.x that referenced this issue Nov 29, 2024
…ly handle if the receiving context's event loop has been shutdown
Onkelborg added a commit to Onkelborg/vert.x that referenced this issue Nov 29, 2024
…ly handle if the receiving context's event loop has been shutdown
Onkelborg added a commit to Onkelborg/vert.x that referenced this issue Nov 29, 2024
…ly handle if the receiving context's event loop has been shutdown
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants