"Memory leak" in classic mirrored queue slave with overflow: reject-publish #9905

Closed
gomoripeti opened this issue Nov 10, 2023 · 1 comment

@gomoripeti
Contributor

Describe the bug

Firstly I am aware that mirrored classic queues are about to be removed. Opening this issue mostly to document the observed behaviour.

Given a mirrored classic queue with max-length[-bytes] and overflow: reject-publish: when the queue is full and a client with a long-lived connection and publisher-confirms disabled publishes messages to this queue, the memory of the slave processes grows continuously, eventually leading to an OOM. (The memory is released when the connection is closed.)

What happens is that the channel process sends published messages to both the master and the slave processes of the queue, and the slaves temporarily store them in the sender_queues field of their state (maybe_enqueue_message). When the queue is full and publisher-confirms are enabled, the master also broadcasts a discard message to the slaves (in send_reject_publish), which removes the message from sender_queues (in publish_or_discard). However, if publisher-confirms are disabled, the master sends nothing to the slaves (in send_reject_publish), so the sender_queues structure grows indefinitely.
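For illustration, here is a heavily simplified sketch of the slave-side bookkeeping described above. The shapes and names (a plain map keyed by the sender pid) are approximations, not the actual rabbit_mirror_queue_slave code:

```erlang
%% Heavily simplified illustration, not the real rabbit_mirror_queue_slave code:
%% sender_queues is modelled as a plain map from the sender (channel) pid to a
%% queue of parked deliveries.

%% Every message that arrives directly from a channel is parked until the
%% master tells the slave, via the gm broadcast, what happened to it.
maybe_enqueue_message({SenderPid, _Msg} = Delivery, #{sender_queues := SQ} = State) ->
    Parked = maps:get(SenderPid, SQ, queue:new()),
    State#{sender_queues := SQ#{SenderPid => queue:in(Delivery, Parked)}}.

%% When the master broadcasts a publish or a discard for a message, the slave
%% drops its parked copy. With overflow: reject-publish and confirms disabled
%% the master never broadcasts a discard, so this is never reached for the
%% rejected message and the parked copy is retained forever.
publish_or_discard(SenderPid, #{sender_queues := SQ} = State) ->
    Parked = maps:get(SenderPid, SQ, queue:new()),
    {_Dropped, Rest} = queue:out(Parked),
    State#{sender_queues := SQ#{SenderPid => Rest}}.
```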

We speculate that the issue also exists if the messages are published to the mirrored queue via dead-lettering.

Reproduction steps

  1. Create a multi-node cluster, for example on 3.12.6 (I tested on main 09a95a5)
  2. Create a policy for all classic queues with ha-mode: all
  3. Create a classic queue with max-length: 10 and overflow: reject-publish, with the leader node being rabbit-1
  4. Open an AMQP connection and publish messages to the queue continuously, without enabling publisher-confirms (a minimal publisher sketch follows this list). The queue will hold 10 messages. The memory on rabbit-1 and the memory of the queue master process remain stable. However, the memory on rabbit-2 and rabbit-3 and the process memory of the queue slave processes will grow continually.
    ...
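For step 4, a minimal publisher sketch using the Erlang amqp_client. It assumes the ha-mode: all policy from step 2 is already in place, sets the length limit via queue arguments rather than a policy for brevity, and uses hypothetical host and queue names ("rabbit-1", leak-test):

```erlang
%% Minimal repro publisher sketch (assumes the "ha-mode: all" policy from
%% step 2 already applies to the queue). Publisher confirms are never enabled.
-module(repro_9905).
-export([run/0]).

-include_lib("amqp_client/include/amqp_client.hrl").

run() ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "rabbit-1"}),
    {ok, Ch} = amqp_connection:open_channel(Conn),
    %% Length limit and overflow behaviour set via arguments here for brevity;
    %% a policy works the same way.
    #'queue.declare_ok'{} =
        amqp_channel:call(Ch, #'queue.declare'{
            queue     = <<"leak-test">>,
            durable   = true,
            arguments = [{<<"x-max-length">>, long, 10},
                         {<<"x-overflow">>, longstr, <<"reject-publish">>}]}),
    %% Note: no #'confirm.select'{} here -- publisher-confirms stay disabled.
    publish_loop(Ch).

publish_loop(Ch) ->
    ok = amqp_channel:cast(Ch,
                           #'basic.publish'{routing_key = <<"leak-test">>},
                           #amqp_msg{payload = <<"payload">>}),
    publish_loop(Ch).
```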

Expected behavior

The memory of the queue slave processes remains stable.

Additional context

Because variable_queue:discard is a no-op, this missing discard call probably does not affect any of the queue types shipped with RabbitMQ apart from mirrored classic queues.
However, it might affect queue types provided by community plugins that are built on top of the (non-mirrored) classic queue. As the message deduplication plugin shows (noxdafox/rabbitmq-message-deduplication#96), some plugins do make use of the discard callback. Hence I think it is worth considering a fix (which I'm willing to submit).
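To make the idea concrete, here is a hedged sketch (not the actual patch) of what such a fix could look like in the classic queue process: the reject-publish path would call the backing queue's discard callback regardless of whether a confirm has to be sent, so mirrors and plugin backing queues can drop their copy. The helper name and the exact discard callback arity are assumptions:

```erlang
%% Sketch only -- not the real rabbit_amqqueue_process code. The idea is that
%% the reject-publish path invokes the backing queue's discard callback in both
%% branches, not only when a confirm has to be sent back to the publisher.
send_reject_publish(#delivery{confirm = Confirm,
                              sender  = SenderPid,
                              message = #basic_message{id = MsgId}},
                    State = #q{backing_queue       = BQ,
                               backing_queue_state = BQS}) ->
    case Confirm of
        true  -> reject_publish_to_sender(SenderPid, MsgId); %% hypothetical helper
        false -> ok
    end,
    %% Calling discard here lets mirrors and plugin backing queues remove the
    %% message from their state. Exact callback arguments vary between versions.
    BQS1 = BQ:discard(MsgId, SenderPid, BQS),
    State#q{backing_queue_state = BQS1}.
```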

@gomoripeti gomoripeti added the bug label Nov 10, 2023
gomoripeti added a commit to cloudamqp/rabbitmq-server that referenced this issue Nov 10, 2023
Otherwise the message is never removed from the state of mirrored queue
slaves.

Fixes rabbitmq#9905
@michaelklishin
Member

michaelklishin commented Nov 11, 2023

#9815 will address it once and for all.

reject-publish without publisher confirms does not make much sense. A workaround could be a way to forcefully enable publisher confirms for a virtual host, or even for all AMQP 0-9-1 channels on a node.
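Until such a server-side switch exists, the practical client-side mitigation is to enable publisher confirms on the publishing channel, which (as described above) makes the master broadcast the discard so the mirrors drop their parked copies. A sketch with the Erlang client, reusing the includes from the repro sketch above:

```erlang
%% Client-side mitigation sketch: with confirms enabled the master broadcasts a
%% discard for rejected messages, the mirrors drop their parked copies, and the
%% publisher receives a basic.nack instead of the message leaking on the slaves.
open_channel_with_confirms(Conn) ->
    {ok, Ch} = amqp_connection:open_channel(Conn),
    #'confirm.select_ok'{} = amqp_channel:call(Ch, #'confirm.select'{}),
    Ch.
```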

luos pushed a commit to esl/rabbitmq-server that referenced this issue Oct 30, 2024
Otherwise the message is never removed from the state of mirrored queue
slaves.

Fixes rabbitmq#9905