"Memory leak" in classic mirrored queue slave with overflow: reject-publish #9905

Closed
gomoripeti opened this issue Nov 10, 2023 · 1 comment

@gomoripeti
Contributor

Describe the bug

Firstly I am aware that mirrored classic queues are about to be removed. Opening this issue mostly to document the observed behaviour.

Given a mirrored classic queue with max-length[-bytes] and overflow: reject-publish: when the queue is full and a client with a long-lived connection and publisher-confirms disabled publishes messages to this queue, the memory of the slave processes grows continuously, eventually leading to an OOM. (The memory is released when the connection is closed.)

What happens is that the channel process sends published messages to both the master and the slave processes of the queue, and the slaves temporarily store them in the sender_queues field of their state (maybe_enqueue_message). When the queue is full and publisher-confirms are enabled, the master also broadcasts a discard message to the slaves (in send_reject_publish), which removes the message from sender_queues (in publish_or_discard). However, if publisher-confirms are disabled, the master sends nothing to the slaves (in send_reject_publish), so the sender_queues structure grows indefinitely.
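For illustration, here is a heavily simplified sketch of the slave-side bookkeeping described above. The shapes and names (a plain map keyed by the sender pid) are approximations, not the actual rabbit_mirror_queue_slave code:

```erlang
%% Heavily simplified illustration, not the real rabbit_mirror_queue_slave code:
%% sender_queues is modelled as a plain map from the sender (channel) pid to a
%% queue of parked deliveries.

%% Every message that arrives directly from a channel is parked until the
%% master tells the slave, via the gm broadcast, what happened to it.
maybe_enqueue_message({SenderPid, _Msg} = Delivery, #{sender_queues := SQ} = State) ->
    Parked = maps:get(SenderPid, SQ, queue:new()),
    State#{sender_queues := SQ#{SenderPid => queue:in(Delivery, Parked)}}.

%% When the master broadcasts a publish or a discard for a message, the slave
%% drops its parked copy. With overflow: reject-publish and confirms disabled
%% the master never broadcasts a discard, so this is never reached for the
%% rejected message and the parked copy is retained forever.
publish_or_discard(SenderPid, #{sender_queues := SQ} = State) ->
    Parked = maps:get(SenderPid, SQ, queue:new()),
    {_Dropped, Rest} = queue:out(Parked),
    State#{sender_queues := SQ#{SenderPid => Rest}}.
```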

We speculate that the issue also exists if the messages are published to the mirrored queue via dead-lettering.

Reproduction steps

  1. Create a multi-node cluster, for example on 3.12.6 (I tested on main 09a95a5)
  2. Create a policy for all classic queues with ha-mode: all
  3. Create a classic queue with max-length: 10 and overflow: reject-publish, with the leader node being rabbit-1
  4. Open an AMQP connection and publish messages to the queue continuously, without enabling publisher-confirms (a minimal publisher sketch follows this list). The queue will hold 10 messages. The memory on rabbit-1 and the memory of the queue master process remain stable. However, the memory on rabbit-2 and rabbit-3 and the process memory of the queue slave processes will grow continually.
    ...
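For step 4, a minimal publisher sketch using the Erlang amqp_client. It assumes the ha-mode: all policy from step 2 is already in place, sets the length limit via queue arguments rather than a policy for brevity, and uses hypothetical host and queue names ("rabbit-1", leak-test):

```erlang
%% Minimal repro publisher sketch (assumes the "ha-mode: all" policy from
%% step 2 already applies to the queue). Publisher confirms are never enabled.
-module(repro_9905).
-export([run/0]).

-include_lib("amqp_client/include/amqp_client.hrl").

run() ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "rabbit-1"}),
    {ok, Ch} = amqp_connection:open_channel(Conn),
    %% Length limit and overflow behaviour set via arguments here for brevity;
    %% a policy works the same way.
    #'queue.declare_ok'{} =
        amqp_channel:call(Ch, #'queue.declare'{
            queue     = <<"leak-test">>,
            durable   = true,
            arguments = [{<<"x-max-length">>, long, 10},
                         {<<"x-overflow">>, longstr, <<"reject-publish">>}]}),
    %% Note: no #'confirm.select'{} here -- publisher-confirms stay disabled.
    publish_loop(Ch).

publish_loop(Ch) ->
    ok = amqp_channel:cast(Ch,
                           #'basic.publish'{routing_key = <<"leak-test">>},
                           #amqp_msg{payload = <<"payload">>}),
    publish_loop(Ch).
```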

Expected behavior

The memory of the queue slave processes remains stable.

Additional context

Because variable_queue:discard is a no-op, this missing discard call probably does not affect any of the queue types shipped with RabbitMQ apart from mirrored classic queues.
However, it might affect queue types provided by community plugins that are built on top of the (non-mirrored) classic queue. As the message deduplication plugin shows (noxdafox/rabbitmq-message-deduplication#96), some plugins do make use of the discard callback. Hence I think it is worth considering a fix (which I'm willing to submit).
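To make the idea concrete, here is a hedged sketch (not the actual patch) of what such a fix could look like in the classic queue process: the reject-publish path would call the backing queue's discard callback regardless of whether a confirm has to be sent, so mirrors and plugin backing queues can drop their copy. The helper name and the exact discard callback arity are assumptions:

```erlang
%% Sketch only -- not the real rabbit_amqqueue_process code. The idea is that
%% the reject-publish path invokes the backing queue's discard callback in both
%% branches, not only when a confirm has to be sent back to the publisher.
send_reject_publish(#delivery{confirm = Confirm,
                              sender  = SenderPid,
                              message = #basic_message{id = MsgId}},
                    State = #q{backing_queue       = BQ,
                               backing_queue_state = BQS}) ->
    case Confirm of
        true  -> reject_publish_to_sender(SenderPid, MsgId); %% hypothetical helper
        false -> ok
    end,
    %% Calling discard here lets mirrors and plugin backing queues remove the
    %% message from their state. Exact callback arguments vary between versions.
    BQS1 = BQ:discard(MsgId, SenderPid, BQS),
    State#q{backing_queue_state = BQS1}.
```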

@gomoripeti gomoripeti added the bug label Nov 10, 2023
gomoripeti added a commit to cloudamqp/rabbitmq-server that referenced this issue Nov 10, 2023
Otherwise the message is never removed from the state of mirrored queue
slaves.

Fixes rabbitmq#9905
@michaelklishin
Member

michaelklishin commented Nov 11, 2023

#9815 will address it once and for all.

reject-publish without publisher confirms does not make much sense. A workaround could be a way to forcefully enable publisher confirms for a virtual host, or even for all AMQP 0-9-1 channels on a node.
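Until such a server-side switch exists, the practical client-side mitigation is to enable publisher confirms on the publishing channel, which (as described above) makes the master broadcast the discard so the mirrors drop their parked copies. A sketch with the Erlang client, reusing the includes from the repro sketch above:

```erlang
%% Client-side mitigation sketch: with confirms enabled the master broadcasts a
%% discard for rejected messages, the mirrors drop their parked copies, and the
%% publisher receives a basic.nack instead of the message leaking on the slaves.
open_channel_with_confirms(Conn) ->
    {ok, Ch} = amqp_connection:open_channel(Conn),
    #'confirm.select_ok'{} = amqp_channel:call(Ch, #'confirm.select'{}),
    Ch.
```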

luos pushed a commit to esl/rabbitmq-server that referenced this issue Oct 30, 2024
Otherwise the message is never removed from the state of mirrored queue
slaves.

Fixes rabbitmq#9905