change last will to publish to pulsar, remove LWT events as they're no #943

Merged
merged 4 commits into from
Jan 10, 2024

Conversation

tsturzl
Contributor

@tsturzl tsturzl commented Apr 25, 2023

Fixes #937

Motivation

LWT did not function as expected. The LWT mechanism would use the system topic to propagate fireWillMessage events, and each MoP instance would then look at each of its subscriptions on the topic the LWT should be published on and send the LWT to all of these subscriptions directly. This meant LWT did NOT reach Pulsar topics, nor did QoSPublishHandlers come into play at all for LWT, which means that retained LWTs did not work.

Modifications

Instead of using the system topic, I changed WillMessageHandler to publish the messages to their destination Pulsar topic. This means that both Pulsar and MQTT clients can receive these messages, as MoP already subscribes to the Pulsar topic on behalf of the MQTT MoP connection. WillMessageHandler now acquires the QoSPublishHandlers from MQTTService to properly publish messages with the appropriate QoS and retain functionality.

Since sendLWT events are no longer used, I removed them from the interface and implementing classes; no uses or implementations should continue to exist. LWTs should NOT need to reach the system topic any longer, as the destination topic is already read by every MoP instance that's concerned with that topic. The one exception is the case where the LWT is retained, in which that LWT message is sent to the system topic as a retained message.

I tried to stay as close to the style of the rest of the codebase as possible. In many cases code is directly copied from other areas. The sendWillMessageToPulsarTopic method was named to be very explicit about the behavior, and this code is largely taken from the doPublish method with some modifications. In this case I'm relying on the Connection object, and therefore fireWillMessage was adapted to return a CompletableFuture, so we can block until the will message is sent and don't clean up the Connection prematurely. I'm not sure if it would make sense to add a timeout on that to prevent deadlocking, but I opted for the simpler solution. Additionally, to support returning a CompletableFuture, I changed the delay logic to wrap the ScheduledExecutorService in a way that could be used by a CompletableFuture.
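
For readers unfamiliar with the pattern described above, here is a minimal sketch of wrapping a ScheduledExecutorService so that it can be passed to CompletableFuture as a delayed Executor. This is an illustration only, not the code from this PR: the class name, the three-argument `delayedExecutor` signature, and the string payload are assumptions made for the example. (Java 9 and later also ship a built-in `CompletableFuture.delayedExecutor`.)

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedExecutorSketch {
    // Hypothetical helper: adapts a ScheduledExecutorService into an Executor
    // that runs every submitted task after the given delay.
    static Executor delayedExecutor(ScheduledExecutorService scheduler, long delay, TimeUnit unit) {
        return task -> scheduler.schedule(task, delay, unit);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            Executor delayed = delayedExecutor(scheduler, 200, TimeUnit.MILLISECONDS);
            // The supplier stands in for "send the will message"; it runs after the delay.
            CompletableFuture<String> future =
                    CompletableFuture.supplyAsync(() -> "will message sent", delayed);
            // Bound the wait so the caller cannot block indefinitely.
            System.out.println(future.get(2, TimeUnit.SECONDS));
        } finally {
            scheduler.shutdown();
        }
    }
}
```

The returned future lets the caller decide whether to wait for the delayed send or to continue immediately.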

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as (please describe tests).

Documentation

Check the box below.

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

This change only adapts existing components to behave as expected.

  • doc

    (If this PR contains doc changes)

@github-actions

@tsturzl:Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@github-actions

@tsturzl:Thanks for providing doc info!

@tsturzl
Contributor Author

tsturzl commented Apr 25, 2023

Is there a good way to set up my IDE for the styling expected for this project?

@Technoboy-
Contributor

Is there a good way to setup my IDE for the styling expected for this project?

image

@tsturzl
Contributor Author

tsturzl commented Jun 26, 2023

@Technoboy- I'm not sure why the failing test is failing in this case. It seems unrelated to the changes made.

@tsturzl
Contributor Author

tsturzl commented Sep 28, 2023

Unless anyone has any feedback or requests, this should be ready to merge. Without this, LWT is largely broken right now.

@tsturzl
Contributor Author

tsturzl commented Dec 1, 2023

@Technoboy- @mattisonchao Please review. No feedback has been provided. This is a major bug, as it completely breaks proper MQTT connection tracking.

- willMessageHandler.fireWillMessage(clientId, willMessage);
+ try {
+     // wait for the will message to fire before continuing cleanup
+     willMessageHandler.fireWillMessage(connection, willMessage).get();
Member

@mattisonchao mattisonchao Dec 29, 2023

We should add a timeout here so the thread cannot hang forever, from a fault-tolerance perspective. :)

Contributor Author

I added a 500ms timeout here. I didn't see many examples to pull from, but 500ms seems generous enough to make sure the task completes, and should hopefully prevent blocking for too long. I can change this to something else if you have a better duration in mind.
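
As an illustration of the bounded wait discussed in this thread, the sketch below waits on a CompletableFuture with a timeout and lets cleanup continue whether or not the future finishes. It is not the merged code: the class name and the `awaitWill` helper are hypothetical, and the 500 ms value is simply the duration mentioned above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WillTimeoutSketch {
    // Hypothetical helper: waits at most timeoutMs for the will message future.
    // Returns true only if the future completed normally within the timeout.
    static boolean awaitWill(CompletableFuture<Void> willFuture, long timeoutMs) {
        try {
            willFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The send did not finish in time; the caller proceeds with cleanup anyway.
            return false;
        } catch (ExecutionException e) {
            // The send itself failed.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // An already completed future returns immediately.
        System.out.println(awaitWill(CompletableFuture.completedFuture(null), 500));
        // A future that never completes gives up after the timeout instead of hanging.
        System.out.println(awaitWill(new CompletableFuture<>(), 500));
    }
}
```

Returning a boolean instead of rethrowing keeps the cleanup path simple; a real handler would probably also log the timeout or failure.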

  } else {
-     sendWillMessage(willMessage);
+     final Executor delayed = delayedExecutor(willMessage.getDelayInterval(), TimeUnit.SECONDS);
+     return CompletableFuture.supplyAsync(() -> sendWillMessageToPulsarTopic(connection, willMessage).join(),
Member

Please do not block the single executor here. It will become a bottleneck under high traffic load. :)

Contributor Author

I switched this to use runAsync and no longer await the CompletableFuture returned by sendWillMessageToPulsarTopic. From my understanding, this should free up the ScheduledExecutorService sooner. I don't believe there is any major caveat to this. However, if we really care about the result of that CompletableFuture (which would only really be an exception), I'd say it would make sense to move this executor to a cached thread pool with a low keep-alive time.
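
One way to keep the scheduler thread free while still surfacing the outcome of the asynchronous send is to bridge the inner future into a result future with a callback rather than calling join(). The sketch below is a possible alternative under stated assumptions, not what this PR does: `publishAsync` is a hypothetical stand-in for sendWillMessageToPulsarTopic, and `publishAfterDelay` is an invented helper name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NonBlockingDelaySketch {
    // Hypothetical stand-in for an asynchronous publish to the destination topic.
    static CompletableFuture<String> publishAsync(String payload) {
        return CompletableFuture.supplyAsync(() -> "published:" + payload);
    }

    // Starts the publish after the delay. The scheduler thread only kicks off the
    // asynchronous call and registers a callback; it never waits for the result.
    static CompletableFuture<String> publishAfterDelay(
            ScheduledExecutorService scheduler, String payload, long delay, TimeUnit unit) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduler.schedule(() -> {
            try {
                publishAsync(payload).whenComplete((value, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(value);
                    }
                });
            } catch (Throwable t) {
                // Covers a publish call that throws before returning a future.
                result.completeExceptionally(t);
            }
        }, delay, unit);
        return result;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<String> done =
                    publishAfterDelay(scheduler, "client-offline", 100, TimeUnit.MILLISECONDS);
            System.out.println(done.get(2, TimeUnit.SECONDS));
        } finally {
            scheduler.shutdown();
        }
    }
}
```

With this shape, an exception from the publish reaches whoever holds the returned future, and the single scheduler thread stays available for other delayed will messages.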

Member

@mattisonchao mattisonchao left a comment

Left some minor comments. :)

@tsturzl:Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)


codecov bot commented Dec 29, 2023

Codecov Report

Attention: 12 lines in your changes are missing coverage. Please review.

Comparison is base (1eccdae) 77.28% compared to head (7418633) 75.98%.
Report is 29 commits behind head on master.

❗ Current head 7418633 differs from pull request most recent head 88b0966. Consider uploading reports for the commit 88b0966 to get more accurate results

Files                                                    Patch %   Lines
...lsar/handlers/mqtt/support/WillMessageHandler.java    76.47%    3 Missing and 1 partial ⚠️
...qtt/support/MQTTBrokerProtocolMethodProcessor.java    50.00%    2 Missing ⚠️
...lsar/handlers/mqtt/support/Qos0PublishHandler.java    0.00%     2 Missing ⚠️
...lers/mqtt/support/event/PulsarEventCenterImpl.java    86.66%    2 Missing ⚠️
...andlers/mqtt/proxy/PulsarServiceLookupHandler.java    75.00%    0 Missing and 1 partial ⚠️
...dlers/mqtt/support/event/AutoSubscribeHandler.java    85.71%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #943      +/-   ##
============================================
- Coverage     77.28%   75.98%   -1.30%     
+ Complexity      988      978      -10     
============================================
  Files           111      111              
  Lines          4336     4414      +78     
  Branches        336      345       +9     
============================================
+ Hits           3351     3354       +3     
- Misses          802      868      +66     
- Partials        183      192       +9     


@tsturzl
Contributor Author

tsturzl commented Dec 29, 2023

@mattisonchao Hoping to address these today. I appreciate the feedback! If not today, I should be able to address this early next week.

@tsturzl
Contributor Author

tsturzl commented Jan 9, 2024

Should be ready for re-review

@Technoboy- Technoboy- merged commit 7ac60ae into streamnative:master Jan 10, 2024
43 of 45 checks passed
Development

Successfully merging this pull request may close these issues.

LWT does not reach Pulsar topic, short cuts QoSPublishHandlers so Qos nor Retain function properly