[Question] Potential memory leak #734
Are you, by any chance, connecting using …?
Hmm, I checked, but it does not seem that is the case. I did some extra checking, and I see that a lot of these byte buffers point to Kafka threads, so I guess I'm looking in the wrong place. Thanks for your help.
I think I might have to reopen this issue. It looks like I was pointed in the wrong direction by MAT. When I analyzed the same heap dump with jxray, I got this: Could this be caused by the Moquette library? I don't know if these sessions get saved into a local map, or whether they get saved in a DirectByteBuffer. I'll try to reproduce the behaviour locally, but thanks for your suggestion. I wouldn't be surprised if it has to do with the …
Yes. The 4 MB size made me suspicious about UnsafeQueues; it is exactly the size of an UnsafeQueues Segment.
We are currently running version 0.15. We tried upgrading to 0.16 once, but we ran into this issue, so unfortunately I cannot tell you whether upgrading to 0.16 would fix it.
Ok, so in version …
If you extend AbstractInterceptHandler, you should release the message via super.onPublish(msg), not by calling ReferenceCountUtil.release(msg).
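For illustration, here is a minimal sketch of that pattern. The handler name and the forwarding step are made up; it assumes the usual Moquette interception API (InterceptPublishMessage exposing the Netty ByteBuf via getPayload()):

```java
import io.moquette.interception.AbstractInterceptHandler;
import io.moquette.interception.messages.InterceptPublishMessage;

import java.nio.charset.StandardCharsets;

public class ForwardingInterceptHandler extends AbstractInterceptHandler {

    @Override
    public String getID() {
        return "forwarding-intercept-handler";   // arbitrary handler id
    }

    @Override
    public void onPublish(InterceptPublishMessage msg) {
        // Read the payload while the buffer is still valid.
        String payload = msg.getPayload().toString(StandardCharsets.UTF_8);
        // ... forward the payload (e.g. to Kafka) here ...

        // Let the base class release the retained buffer instead of
        // calling ReferenceCountUtil.release(msg) directly.
        super.onPublish(msg);
    }
}
```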
Sure, I also hit this issue (there is no memory leak with EMQX):
@daigangHZ thanks for reporting. EMQX is a different tool developed with different technologies (Erlang and the BEAM VM), so it uses other memory-management patterns that are hard to correlate with Moquette and the JVM.
This issue can be reproduced. I'm using version 0.18-SNAPSHOT, with the Git commit ending in 23c7b39. I deployed the Moquette broker on a server (2 cores, 4 GB RAM) and then started 100 MQTT clients. These 100 clients send messages to the broker at a rate of 5 Hz. Another application listens, through the broker, to the messages sent by these 100 clients. Whenever I run this setup for about 12 hours, the broker process gets killed by the OOM killer (hence I switched to EMQX, to eliminate certain factors).
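For reference, a minimal sketch of what each of the 100 publisher clients does (using Eclipse Paho; the broker address, topic, and payload are placeholders, and QoS 1 matches what a later comment mentions):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

// One of the ~100 publisher clients: connects with cleanSession and
// automatic reconnect, then publishes a QoS 1 message every 200 ms (5 Hz).
public class LoadClient {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://broker-host:1883",     // placeholder broker address
                                           MqttClient.generateClientId(),
                                           new MemoryPersistence());
        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(true);
        options.setAutomaticReconnect(true);
        client.connect(options);

        byte[] payload = "sample-telemetry".getBytes();                  // placeholder payload
        while (true) {
            MqttMessage message = new MqttMessage(payload);
            message.setQos(1);
            client.publish("load/test/topic", message);                  // placeholder topic
            Thread.sleep(200);                                           // 5 messages per second
        }
    }
}
```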
Trying the Git HEAD version with my Moquette Load Test project results in clients getting empty messages, so something is not correct!

Hmm, I can't reproduce it any more... Maybe a case of not cleaning the project and old class files lying around?

Wait, I can reproduce it. It happens with clients using QoS 1 or 2, starting from commit 4973627.
I also used QoS 1 messages, so I'll roll back to a previous version and give it a try.
Turning on trace logging on the MQTTConnection class shows corruption:

The first message is correct, the second one is corrupt. I think that, due to the <U+0001> control character in the corrupted message, the client ends up with an empty message.
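For anyone wanting to repeat this, trace logging for that class can be enabled roughly like this (a sketch assuming Logback is the SLF4J backend and that the class is io.moquette.broker.MQTTConnection; adjust the logger name for your version):

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class EnableMqttConnectionTrace {
    public static void main(String[] args) {
        // Cast works when Logback backs SLF4J; the logger name is an assumption.
        Logger logger = (Logger) LoggerFactory.getLogger("io.moquette.broker.MQTTConnection");
        logger.setLevel(Level.TRACE);
    }
}
```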
- Updates the serialization/deserialization of PublishedMessages in H2 and in the segmented persistent queues to store and load MQTT properties, so that a PUBLISH message can be recreated from a stored message with properties in the queue.
- Updates the createPublishMessage factory method in MQTTConnection to accept an optional list of MQTT properties to attach to the PUBLISH message.
- Updates PostOffice to send retained and non-retained PUBLISH messages with the subscriptionIdentifier MQTT property stored in the Subscription (see the sketch after this list).
- Moves the sendPublishQos0 method from MQTTConnection to Session, where sendPublishQos1 and sendPublishQos2 already reside.
- Adds an integration test to prove publishing with a subscription identifier in both the retained and non-retained cases.
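As context for the subscriptionIdentifier bullet above, here is a rough sketch of how a PUBLISH carrying that MQTT property can be built with Netty's MQTT codec (the topic, payload, and identifier value are placeholders, and the .properties(...) builder step assumes a Netty version with MQTT 5 support):

```java
import io.netty.buffer.Unpooled;
import io.netty.handler.codec.mqtt.MqttMessageBuilders;
import io.netty.handler.codec.mqtt.MqttProperties;
import io.netty.handler.codec.mqtt.MqttPublishMessage;
import io.netty.handler.codec.mqtt.MqttQoS;

import java.nio.charset.StandardCharsets;

public class SubscriptionIdentifierExample {

    // Builds a PUBLISH carrying a subscriptionIdentifier property.
    static MqttPublishMessage publishWithSubscriptionIdentifier() {
        MqttProperties props = new MqttProperties();
        props.add(new MqttProperties.IntegerProperty(
                MqttProperties.MqttPropertyType.SUBSCRIPTION_IDENTIFIER.value(), 42));

        return MqttMessageBuilders.publish()
                .topicName("sensors/temperature")                         // placeholder topic
                .qos(MqttQoS.AT_LEAST_ONCE)
                .retained(false)
                .payload(Unpooled.copiedBuffer("21.5", StandardCharsets.UTF_8))
                .properties(props)
                .build();
    }
}
```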
The corruption is possibly caused by … But that doesn't explain the memory leak.
Hmm, running tests in the profiler, it seems the H2 MVStore is the largest memory user that keeps growing. There is also an inefficiency in the …
Yes, I have 100 clients sending messages, with one client receiving them. I've enabled the cleanSession option and configured automatic reconnection. The receiving client processes messages quickly, without any backlog.
So, do we need to optimize this logic? It seems that this issue may be causing the memory growth.
We definitely need to look at H2, since that seems to keep growing. The inflightTimeouts queue is bigger than it needs to be, but it doesn't grow.
I fixed some issues in the segmented queues that caused a memory leak, and improved the memory use of the Sessions.
Awesome! I'll give it another try.
After testing for a night, I found another small leak: #836.
Hello,
I don't know if this is the right place to ask; if not, please let me know and I'll move this to Stack Overflow.
We are using Moquette for our IoT data, and after applying some stateless transformations on the payload we put it in Kafka. This all works as expected and is very performant, thank you for that!
However, we've been noticing a memory leak in our service. I'm posting it here because we also have a similar service that does basically the same thing but listens to IoT data over HTTP.
Our memory usage can be seen here:
![image](https://user-images.githubusercontent.com/49156770/222756373-1b61ab62-2776-4a52-9f8e-d788d1e4a058.png)
The yellow line is the DirectBuffer part and the red line is the tenured generation.
The memory according to Kubernetes (`kubernetes.memory.working_set`) is also steadily increasing, at roughly the same rate (indicating that it is probably the byte buffers):

![image](https://user-images.githubusercontent.com/49156770/222756198-4b15931f-b562-4690-afcd-2421f4ec5195.png)

Initially, after exploring the heap dump, we didn't find anything strange. We also used jemalloc to track off-heap memory usage, but that didn't point to anything in particular either.
After doing some more reading, I ran this OQL query in MAT against the heap dump:
```
SELECT x AS ByteBuffer, x.capacity AS Capacity, x.limit AS Limit, x.mark AS Mark, x.position AS Position
FROM java.nio.DirectByteBuffer x
WHERE ((x.capacity > (1024 * 1024)) and (x.cleaner != null))
```
The output of this seems to point to a potential issue, as we get roughly 400 `java.nio.DirectByteBuffer`s with a size of 4194304 bytes each. Am I correct in assuming there are 400 different ByteBufs of roughly 4 MB, or are there just a bunch of ByteBufs pointing to the same byte[] underneath and no issue at all?

See the output here:
![image](https://user-images.githubusercontent.com/49156770/222769002-3ac018f1-6ff5-4650-97cb-d2663b9890d6.png)
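As a complementary check (not part of the original report), the JVM's direct-buffer pool usage can also be read at runtime via JMX, which gives count and capacity figures without taking a heap dump:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Prints the NIO buffer pools ("direct" and "mapped") with count, used bytes, and capacity.
public class DirectBufferStats {
    public static void main(String[] args) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```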
Does anybody know if we might be using the Moquette library in a wrong way that could cause such a thing? We are not using any direct byte buffers ourselves; we only use a ByteBuf once, when we create a `MqttPublishMessage`, which then gets released by the `Server#internalPublish` method (a rough sketch of this pattern follows below). If not, what would our next best step be to figure out what is causing this?

We'd greatly appreciate some insight/tips/help, as we've been looking at this for a while but can't seem to figure out what the cause is.
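For context, a rough sketch of the publish pattern described above (the topic, payload, and client id are placeholders; it assumes Moquette's Server#internalPublish(MqttPublishMessage, String) signature):

```java
import io.moquette.broker.Server;
import io.netty.buffer.Unpooled;
import io.netty.handler.codec.mqtt.MqttMessageBuilders;
import io.netty.handler.codec.mqtt.MqttPublishMessage;
import io.netty.handler.codec.mqtt.MqttQoS;

import java.nio.charset.StandardCharsets;

public class InternalPublishExample {

    // The ByteBuf is created once, wrapped in an MqttPublishMessage, and handed
    // to Server#internalPublish, which is expected to release it.
    public static void publishTransformed(Server broker, String payloadJson) {
        MqttPublishMessage message = MqttMessageBuilders.publish()
                .topicName("iot/transformed")                                   // placeholder topic
                .qos(MqttQoS.AT_LEAST_ONCE)
                .retained(false)
                .payload(Unpooled.copiedBuffer(payloadJson, StandardCharsets.UTF_8))
                .build();

        broker.internalPublish(message, "internal-publisher");                  // placeholder client id
    }
}
```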
Thanks in advance!