Adding TTL to proxied messages #276
base: master
A message proxied to another node can be proxied again. This is clearly evident when picking `Horde.UniformRandomDistribution` as a distribution strategy, especially in combination with a large number of nodes. This commit contains an implementation which limits the number of times a message can be proxied before expiring, by adding a TTL which works similarly to the TTL on IP packets.
As a colleague pointed out after a code review, the implementation will cause issues when upgrading from the current version to the new version. The tuple size for `:proxy_operation` changed from 3 to 4, which means nodes with old and new versions won't be able to proxy to each other. This commit mitigates that to a degree. The new implementation supports both tuple sizes, 3 and 4. When `:proxy_message_ttl` is set to `:infinity` (the default), `proxy_to_node` will use tuple size 3. This means that an upgrade is possible if, during the upgrade, the `:proxy_message_ttl` option is not set or is set to `:infinity`. If it is set to an integer value, proxy messages from upgraded nodes to old nodes will fail during the upgrade. The CHANGELOG would need to reflect this as an upgrade risk.
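To make the compatibility rule concrete, here is a minimal sketch of how the two tuple shapes could coexist. It is not the code from this PR: the module name `ProxyEnvelopeSketch`, the functions `encode/3` and `decode/1`, and the field order inside the tuple are all assumptions made for illustration.

```elixir
defmodule ProxyEnvelopeSketch do
  # Hypothetical helper, not Horde's actual code. The field order
  # {:proxy_operation, msg, reply_to[, ttl]} is an assumption.

  # With the default :infinity, emit the legacy 3-tuple so that nodes
  # still running the old version can match on it.
  def encode(msg, reply_to, :infinity), do: {:proxy_operation, msg, reply_to}

  # With an integer TTL, emit the new 4-tuple. Old nodes cannot match this.
  def encode(msg, reply_to, ttl) when is_integer(ttl) and ttl >= 0,
    do: {:proxy_operation, msg, reply_to, ttl}

  # Upgraded nodes accept both shapes; a 3-tuple is read as "no limit".
  def decode({:proxy_operation, msg, reply_to}), do: {msg, reply_to, :infinity}
  def decode({:proxy_operation, msg, reply_to, ttl}), do: {msg, reply_to, ttl}
end
```

Under these assumptions, a cluster that leaves `:proxy_message_ttl` at `:infinity` only ever puts 3-tuples on the wire, which is why a rolling upgrade would stay safe in that configuration.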
I've pushed a new commit. The pushed changes should help fix an issue upgrading to this new version.
Yes, please do add an entry to the changelog for this.
The approach you are taking in this PR is to send an error when TTL goes to 0. But could we also just start the process when TTL is 0? What do you think?
That sounds like a great idea. I've avoided it so far as it means that So if you're OK with proxy_to_node being message specific, I can build in an exception for |
Pushed an update which changes the
On another note, I am considering returning an error when starting a `Horde.DynamicSupervisor` with a `:proxy_message_ttl` set to zero, as obviously no one should ever set it to zero, and it would cause issues for
There is also still the issue that `:proxy_message_ttl` defaults to `:infinity`, which allows for upgrading seamlessly from a previous version to this version. But at what point would the `:proxy_message_ttl` default to something else? Or should the README indicate that setting the proxy TTL (after upgrade) is recommended?
With TTL of 2:
graph TD;
pa[process A] -- call start_child() --> A[node A];
A -- proxy start_child(ttl: 2) --> B[node B];
B -- proxy start_child(ttl: 1) --> C[node C];
C -- TTL expired, perform add_child and reply {:ok, pid} --> pa;
pb[process B] -- call terminate_child() --> Ab[node A];
Ab -- proxy terminate_child(ttl: 2) --> Bb[node B];
Bb -- reply :ok --> pb;
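The hop-by-hop behaviour in the diagram can be sketched as a pure decision function. This is only one reading of the discussion, not the PR's code: the module `TtlHopSketch`, the function `route/4`, its return shapes, and the `:proxy_ttl_expired` error term are hypothetical. It assumes the originating node proxies with the configured TTL unchanged, and that each receiving node decrements the TTL before deciding whether to proxy again.

```elixir
defmodule TtlHopSketch do
  # Hypothetical decision function; names and return values are illustrative.
  # `ttl` is the value carried by the incoming proxied message.
  # Arguments: operation, ttl, this node, node picked by the distribution strategy.

  # The strategy picked this node: handle the message here, TTL is irrelevant.
  def route(_op, _ttl, same, same), do: :handle_locally

  # No limit configured: always proxy onwards.
  def route(_op, :infinity, _this, target), do: {:proxy, target, :infinity}

  def route(op, ttl, _this, target) when is_integer(ttl) do
    case {ttl - 1, op} do
      # Hops remain: forward with the decremented TTL.
      {remaining, _} when remaining > 0 -> {:proxy, target, remaining}
      # TTL expired on start_child: start the child here instead of failing.
      {_, :start_child} -> :handle_locally
      # TTL expired on any other message: report an error to the caller.
      {_, _} -> {:error, :proxy_ttl_expired}
    end
  end
end
```

Tracing the diagram with a TTL of 2: node B receives `ttl: 2` and forwards with `ttl: 1`; node C receives `ttl: 1`, has no hops left, and performs the `start_child` locally.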
If we document it in the documentation and in the changelog, that should be enough. I'll emphasise it when doing the usual tweets so that hopefully more people get the message. But maybe it's also something most people will only discover if they also have the issue in question.
I'll write something about the option in the readme. In normal cases the TTL should not be necessary, but there are some cases which are improved by this. Most notably:
A message proxied to another node by the DynamicSupervisor can be proxied again. This is clearly evident when using a large number of nodes and picking `Horde.UniformRandomDistribution` as a distribution strategy. Pushing your luck with that strategy 🍀
This commit contains an implementation which limits the number of times a message can be proxied before expiring, by adding a TTL which works similarly to the TTL on IP packets.
The default TTL is `:infinity`, which means the implementation is backwards compatible. A message with a TTL of `:infinity` can bounce around between nodes forever. The max TTL can be set to any integer via the new `:proxy_message_ttl` option. Each "hop" decreases the TTL for a message by one. When a message with a TTL of zero needs to be proxied, an error will be returned to the `reply_to` process.
With a distribution strategy of `Horde.UniformDistribution` the issue of passing messages around forever is unlikely, as nodes tend to agree on the outcome of `choose_node`. However, a recent incident whilst upgrading to OTP 27 brought this issue to light, as the underlying algorithm for choosing nodes was broken on our already upgraded nodes. This caused messages to be proxied between our nodes infinitely.
Please feel free to pass any form of judgement on the implementation. This is just a jumping-off platform and it can only go uphill from here 👍
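For reference, setting the option might look like the fragment below. This is a sketch only: `:proxy_message_ttl` is the option proposed in this PR, the value 5 is arbitrary, and `MyApp.DistributedSupervisor` is a placeholder name. The other keys are existing `Horde.DynamicSupervisor` options.

```elixir
# Options for a Horde.DynamicSupervisor child spec.
horde_opts = [
  name: MyApp.DistributedSupervisor,          # placeholder name
  strategy: :one_for_one,
  distribution_strategy: Horde.UniformRandomDistribution,
  # Proposed in this PR; leave unset (or :infinity) while old nodes are
  # still part of the cluster.
  proxy_message_ttl: 5
]

# Entry for the application's supervision tree.
children = [{Horde.DynamicSupervisor, horde_opts}]
```

Leaving `:proxy_message_ttl` out keeps the default of `:infinity`, which is the backwards-compatible behaviour described above.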