Describe the bug
When upgrading a multi-node cluster from 3.11.10 to 3.12.0-beta.6, I see the crash below.
** Generic server <0.1683.0> terminating
** Last message in was {'$gen_cast',
{gm,{discard,<15210.8329.0>,flow,
<<244,215,79,6,212,217,23,44,46,232,176,
225,205,41,240,91>>}}}
...
** Reason for termination ==
** {function_clause,
[{rabbit_mirror_queue_slave,process_instruction,
[{discard,<15210.8329.0>,flow,
<<244,215,79,6,212,217,23,44,46,232,176,225,205,41,240,91>>},
{state,
...
I think it is related to PR #7802: during a rolling upgrade the queue master is on the old version while the slave is on the new version.
Unfortunately, I don't fully understand when the discard message is sent, or whether our load-test tool uses auto-ack.
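To make the failure mode concrete, here is a minimal, self-contained Erlang sketch of why a mixed-version pair ends in `function_clause` (this is not RabbitMQ's actual code; the clause head, guard and tuple shapes are made up): the follower's `process_instruction/2` only has clauses for the instruction shapes its own version builds, so the old-format `discard` tuple sent by a 3.11 leader matches nothing.

```erlang
%% Illustration only -- not rabbit_mirror_queue_slave. It shows the error
%% class seen in the crash report above: an instruction tuple built by an
%% older leader matches none of the follower's clause heads.
-module(mixed_version_demo).
-export([run/0]).

%% Hypothetical "new" clause: only matches whatever shape the new code
%% builds (represented here by a map; the real shape is different).
process_instruction({discard, ChPid, Flow, Msg}, State) when is_map(Msg) ->
    {ok, {ChPid, Flow, Msg, State}}.

run() ->
    %% An old leader sends a raw 16-byte message id, as in the log above.
    OldStyle = {discard, self(), flow, <<244,215,79,6,212,217,23,44>>},
    try
        process_instruction(OldStyle, state)
    catch
        error:function_clause ->
            %% Same error class that terminates the mirror process.
            function_clause
    end.
```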
Reproduction steps
Create a 3-node cluster with RabbitMQ 3.11.10
Create a mirrored classic queue with master on node-02 and slave on node-01
Generate continuous moderate traffic (I used ~0.25 msg/s publish and consume rates and reconnecting clients; a sketch of such a generator follows these steps)
Upgrade node-01 to 3.12.0-beta.6
...
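For the traffic step, this is a minimal sketch of the kind of generator I mean, assuming the Erlang `amqp_client` library. The queue and host names are placeholders, the real load-test tool is different, and a classic-mirroring policy (e.g. `ha-mode: all`) is assumed to already match the queue name. It uses an auto-ack consumer (`no_ack = true`), which may or may not match what our tool does.

```erlang
%% Slow publish/consume loop against one node; a sketch, not the real tool.
-module(slow_traffic).
-export([run/0]).
-include_lib("amqp_client/include/amqp_client.hrl").

run() ->
    Queue = <<"ha.upgrade-test">>,                 % placeholder queue name
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "node-01"}),
    {ok, Ch} = amqp_connection:open_channel(Conn),
    #'queue.declare_ok'{} =
        amqp_channel:call(Ch, #'queue.declare'{queue = Queue, durable = true}),
    %% Auto-ack consumer; deliveries arrive as Erlang messages to self().
    #'basic.consume_ok'{} =
        amqp_channel:subscribe(Ch, #'basic.consume'{queue = Queue, no_ack = true}, self()),
    %% Unlike the real clients, this sketch keeps one connection open
    %% instead of periodically reconnecting.
    loop(Ch, Queue).

loop(Ch, Queue) ->
    ok = amqp_channel:cast(Ch,
                           #'basic.publish'{routing_key = Queue},
                           #amqp_msg{payload = <<"ping">>}),
    receive
        {#'basic.deliver'{}, #amqp_msg{}} -> ok
    after 1000 -> ok
    end,
    timer:sleep(4000),                             % ~0.25 msg/s
    loop(Ch, Queue).
```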
Expected behavior
No crash during a rolling upgrade to 3.12.0 in the presence of mirrored classic queues.
Additional context
If I understand correctly, #7802 changed the format of a message passed between nodes. That could be addressed by a feature flag, but it may be simpler to just handle the old format as well for the time being.
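For illustration, a hypothetical sketch of what "handle the old format as well" could look like on the follower side: an extra clause that recognises the pre-3.12 instruction shape (a bare binary message id) and normalises it before processing. None of this is the real `rabbit_mirror_queue_slave` code; the helper functions are stubs.

```erlang
%% Hypothetical compatibility sketch, not RabbitMQ code.
-module(compat_sketch).
-export([process_instruction/2]).

%% Old-format instruction from a 3.11.x leader: MsgId is a bare binary.
process_instruction({discard, ChPid, Flow, MsgId}, State) when is_binary(MsgId) ->
    process_instruction({discard, ChPid, Flow, lookup_message(MsgId, State)}, State);
%% New-format instruction: the message term itself travels in the tuple.
process_instruction({discard, ChPid, Flow, Msg}, State) ->
    do_discard(ChPid, Flow, Msg, State).

%% Stubs standing in for whatever the follower would really do.
lookup_message(MsgId, _State) -> {placeholder_message, MsgId}.
do_discard(_ChPid, _Flow, Msg, State) -> {ok, Msg, State}.
```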
michaelklishin changed the title from "Mirrored CQ slave crash during rolling upgrade to 3.12.0-beta.6" to "Classic queue mirror runs into a function_clause during a rolling upgrade to 3.12.0-beta.6" on Apr 13, 2023.
This is not something that's easy to hide behind a feature flag: the message (or, previously, the message ID) is passed through multiple functions in several modules.
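For context, this is roughly what gating the change at a single call site would look like (the module, function and flag names and both tuple shapes are hypothetical; only `rabbit_feature_flags:is_enabled/1` is a real API). The difficulty is that the value flows through many functions in several modules, so this kind of branch would have to be threaded through every one of them.

```erlang
%% Hypothetical sketch of a feature-flag gate at one call site.
-module(flag_gate_sketch).
-export([build_discard_instruction/4]).

build_discard_instruction(ChPid, Flow, Msg, MsgId) ->
    %% Only meaningful inside a running RabbitMQ node, where the
    %% rabbit_feature_flags module is loaded.
    case rabbit_feature_flags:is_enabled(discard_carries_message) of
        true  -> {discard, ChPid, Flow, Msg};    % new format: whole message
        false -> {discard, ChPid, Flow, MsgId}   % old format: message id only
    end.
```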
Thanks, I will try to test it in a few days. I'm sure the revert eliminates the crash.
I'm sorry to see the change reverted. I was thinking that the follower could be made backwards compatible (and in my case I've seen the follower on the new version and the leader on the old one). But I guess that in an absurd edge case (e.g. if a queue is declared while the cluster is already in a mixed-version state) it is possible for the leader to be on the new version and the follower on the old, in which case the follower cannot be forward compatible.