[BUG] Memory usage steady growing over time #597
Comments
and today :
If you need more information, please ask ;) |
Hi @Thorsieger |
And it looks like a large "max-metrics-globbed" value is a main driver of the growth too. |
Another thing I suspect is carbonapi_v3_pb. Support for carbonapi_v3_pb in go-carbon is quite raw and I'm not sure it is bug-free. We're still using carbonapi_v2_pb in prod. |
I'm currently using carbonapi v0.16.1 with … I'm not aware of the differences between these two protocols. Are they drop-in replacements? |
Auto means v3 if supported, otherwise v2. V3 supports some metadata and different requests, like Multiglob, which is very likely causing that leak: as you can see, the major usage comes from the glob function and the v3 unmarshal. |
I tried to replicate the issue in a staging environment (only ~40k metrics/minute) without success. Replaying similar requests on this environment did not reproduce the problem. As API V2 and V3 do not support the same requests, I will not be able to test directly in production. If my customers are using Multiglob functions, I cannot break this way of operating for debugging purposes. Is there anything else I can do to help pinpoint the source of the problem? |
"As API V2 and V3 do not support the same requests, I will not be able to
test directly in production. If my customers are using Multiglob functions,
I cannot break this way of operating for debugging purposes."
What do you mean? There should be no difference in functionality between
go-carbon and carbonapi whether v2 or v3 is in use.
|
I can run some tests, but we have currently reverted to go-carbon 0.15.6 with carbonapi 0.16.0~1. We have the same memory leak issue with go-carbon 0.17.3, but when I reverted to 0.15.6 it no longer leaks. Shouldn't both of them go through carbonapi_v3_pb as well? I'm wondering why go-carbon 0.15.6 is not leaking with carbonapi_v3_pb. Or do you mean when it is compiled with |
@cxfcxf: well, if it's not leaking with 0.15.6 but is leaking on 0.17.3 with the same carbonapi and v3 in use, then my hypothesis is wrong and it's a code change in go-carbon instead. |
Hello, I tried to downgrade to 0.17.1 (I cannot go further back because it contains a bugfix I need) and the memory leak is still present. I hope that helps. That's still 278 commits and 1014 changed files 😬. |
Hi Thorsieger,
Thanks for the attempt, but I doubt it helps. I suspect that the issue lies in
the carbonapi v3 implementation. Did you try graphiteapi with an explicit v2
protocol instead?
|
Hi deniszh, yesterday I set up graphiteapi using backend protocol … Here are the first results for memory usage. pprof just after restart:
pprof now (~16 hours after startup):
Memory usage still seems to grow over time :/ |
If that helps, here is the latest pprof from today:
|
@Thorsieger: thanks for your data. Unfortunately, I still have no idea why it behaves like this in your case but not in ours:
|
Maybe the main difference is that we have more API requests than you? We peak at ~9k requests/minute each day. I found that some of our clients are making globbed render requests. Example:
Maybe it has a link to |
The server (the one whose stats I posted yesterday) handles 1.5K requests per second, so around 90K/min. But we're using our own fork of carbonapi; I'm not sure how that could be related to the getExpandedGlobs call.
That's absolutely normal behaviour for carbonapi; getExpandedGlobs does exactly that. The question is why it holds the memory and does not release it. :( |
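One way to separate "the process still references this memory" from "the Go runtime has freed it but not yet returned it to the OS" is to compare a few `runtime.MemStats` counters over time. The sketch below is a generic Go illustration, not code from go-carbon; `HeapSnapshot`, `TakeHeapSnapshot` and `demo` are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// HeapSnapshot keeps the runtime counters that help tell live data apart
// from memory the runtime holds but no longer uses.
type HeapSnapshot struct {
	InUse    uint64 // bytes in in-use heap spans (live objects plus fragmentation)
	Idle     uint64 // bytes in idle heap spans
	Released uint64 // idle bytes already returned to the OS
}

// TakeHeapSnapshot reads the current counters from the Go runtime.
func TakeHeapSnapshot() HeapSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return HeapSnapshot{InUse: m.HeapInuse, Idle: m.HeapIdle, Released: m.HeapReleased}
}

// Retained estimates idle memory the runtime could still return to the OS.
func (s HeapSnapshot) Retained() uint64 {
	if s.Released > s.Idle {
		return 0
	}
	return s.Idle - s.Released
}

// demo allocates 64 MiB, snapshots the heap while the buffer is live,
// then drops the buffer and snapshots again after a forced collection.
func demo() (held, freed HeapSnapshot) {
	buf := make([]byte, 64<<20)
	for i := range buf {
		buf[i] = 1 // touch every page so the memory is really in use
	}
	held = TakeHeapSnapshot()
	runtime.KeepAlive(buf) // buf is unreachable after this point

	debug.FreeOSMemory() // forces a GC and returns as much memory as possible
	freed = TakeHeapSnapshot()
	return held, freed
}

func main() {
	held, freed := demo()
	fmt.Printf("while held: in-use=%d MiB retained=%d MiB\n", held.InUse>>20, held.Retained()>>20)
	fmt.Printf("after free: in-use=%d MiB retained=%d MiB\n", freed.InUse>>20, freed.Retained()>>20)
}
```

If the in-use figure keeps climbing between samples, something such as a cache is still referencing the data; if in-use is flat while the process RSS stays high, the memory is more likely idle heap that has not been handed back yet.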
Is there anything else I can provide to help you find out? |
@Thorsieger : I'm afraid not. :( |
Hello, I may have found the bad commit: 676cb0e#diff-9f4dfc723a0b5109df1e60ae2ae68dd14bc00ee1fb515362f9bcc63e81d81bbfR261
It was merged in v0.17.0 and relates to expandedGlobsCache. It states that there is no memory limit for the cache and that it can't be disabled. Could you look for a way to disable the globs cache, or check whether you see any problem with this code?
Additionally, we found that there is no cleanup call for expandedGlobsCache or for find requests; it only exists for the query cache: go-carbon/carbonserver/carbonserver.go Line 1927 in a5c9c55
Maybe adding this cleanup for the other caches could help? |
Yes, disabling it probably would not work, but adding cleanup and a configurable size limit should do the job. |
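To illustrate what "cleanup plus a configurable size limit" means for a cache in general, here is a minimal Go sketch. It is not the go-carbon implementation and not the fix that shipped; `BoundedCache`, its methods, and `demo` are hypothetical names, and evicting the oldest entry is just one simple policy chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is one cached glob expansion plus the time it was stored.
type entry struct {
	values   []string
	storedAt time.Time
}

// BoundedCache is a hypothetical cache with a TTL and a cap on entry count.
type BoundedCache struct {
	mu      sync.Mutex
	items   map[string]entry
	maxSize int
	ttl     time.Duration
}

func NewBoundedCache(maxSize int, ttl time.Duration) *BoundedCache {
	return &BoundedCache{items: make(map[string]entry), maxSize: maxSize, ttl: ttl}
}

// Set stores a value; when the cache is full it evicts the oldest entry first.
func (c *BoundedCache) Set(key string, values []string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.maxSize <= 0 {
		return // a non-positive limit means nothing is cached
	}
	if _, exists := c.items[key]; !exists && len(c.items) >= c.maxSize {
		c.evictOldestLocked()
	}
	c.items[key] = entry{values: values, storedAt: now}
}

// Get returns a value only if it is present and not older than the TTL.
func (c *BoundedCache) Get(key string, now time.Time) ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || now.Sub(e.storedAt) > c.ttl {
		return nil, false
	}
	return e.values, true
}

// Cleanup deletes expired entries and reports how many were removed.
// Without a periodic call like this, expired entries stay referenced forever.
func (c *BoundedCache) Cleanup(now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for k, e := range c.items {
		if now.Sub(e.storedAt) > c.ttl {
			delete(c.items, k)
			removed++
		}
	}
	return removed
}

// Len reports the number of entries currently held.
func (c *BoundedCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.items)
}

// evictOldestLocked removes the entry with the earliest storedAt.
// The caller must hold c.mu.
func (c *BoundedCache) evictOldestLocked() {
	var oldestKey string
	var oldest time.Time
	found := false
	for k, e := range c.items {
		if !found || e.storedAt.Before(oldest) {
			oldestKey, oldest, found = k, e.storedAt, true
		}
	}
	if found {
		delete(c.items, oldestKey)
	}
}

// demo fills a 2-entry cache with three keys, then runs a cleanup after the TTL.
func demo() (sizeAfterSets, removedByCleanup int) {
	start := time.Unix(0, 0)
	c := NewBoundedCache(2, time.Minute)
	c.Set("a.*", []string{"a.x"}, start)
	c.Set("b.*", []string{"b.x"}, start.Add(time.Second))
	c.Set("c.*", []string{"c.x"}, start.Add(2*time.Second)) // evicts "a.*"
	return c.Len(), c.Cleanup(start.Add(2 * time.Minute))
}

func main() {
	size, removed := demo()
	fmt.Println("entries after 3 sets:", size)
	fmt.Println("removed by cleanup:", removed)
}
```

In a long-running server, `Cleanup` would typically be called from a ticker goroutine. The size cap bounds memory even when the key space (for example, distinct glob queries) keeps growing between cleanups.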
Should be fixed in v0.18.0 |
After 3 days running v0.18.0, we have this memory usage. pprof at start:
pprof now:
It's quite convincing that the memory is correctly released. Thanks a lot for the support! |
Describe the bug
I am experiencing a slow but steady memory leak which forces a service restart every week or so.
Logs
Memory usage over time on the physical server :
pprof (on one instance) :
Go-carbon Configuration:
Metric retention and aggregation schemas
N/A
Simplified query (if applicable)
N/A
Additional context
I have a Graphite infrastructure that handles 2.4M metrics/minute. The storage part is composed of 4 go-carbon instances behind a carbon-c-relay. These 4 storage nodes are on a single physical server: 32 CPUs / 512 GB RAM / NVMe storage.
go-carbon version :
ghcr.io/go-graphite/go-carbon:0.17.3
After checking existing issues, I tried both trie and/or trigram for indexes with no effect. I enabled pprof, the output is above.
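For readers unfamiliar with heap profiles: in any Go program a heap profile can also be captured programmatically with the standard `runtime/pprof` package, and two captures taken hours apart can be compared to see which call sites grew. The sketch below is a generic illustration and not how go-carbon exposes pprof; `CaptureHeapProfile` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// CaptureHeapProfile returns the current heap profile in pprof's
// gzip-compressed protobuf format.
func CaptureHeapProfile() ([]byte, error) {
	runtime.GC() // collect first so the profile reflects up-to-date live data
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := CaptureHeapProfile()
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	// In practice the bytes would be written to a file such as heap_start.pprof.
	fmt.Printf("captured %d bytes of heap profile\n", len(data))
}
```

Saving one profile shortly after startup and another after the growth has appeared, then opening the later one with `go tool pprof -base` pointing at the earlier one, shows only the difference between the two.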
May be related to #579.