[proposal] Koordlet restarts frequently when CPI and PSI monitoring is enabled due to high memory usage #1046
/area koordlet
Enable CPI and PSI, and test in a one-node cluster.
The container used about 260 MB of memory, but when set … Does the modification of …
Thank you for your reply. For now, the Go memory statistics after enabling the feature are not large, and the OOM issue may be related to the memory used by SQLite3 through cgo. Although other Koordlet components rarely hit OOM, replacing the database with a TSDB is indeed under consideration, and adopting a TSDB in the future could resolve the memory problem described in this issue. The issue for the previous metric_cache refactoring plan is #586.
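Memory that C code allocates through cgo (such as SQLite3's own heap) does not appear in Go's runtime statistics, so one way to test the hypothesis above is to compare the process RSS with what the Go runtime reports. The following is a minimal, Linux-only sketch, not koordlet code; the helper `parseVmRSSKB` is hypothetical. Because `runtime.MemStats.Sys` counts all memory the Go runtime obtained from the OS, RSS minus `Sys` is roughly a lower bound on resident memory outside the Go runtime (C heap, executable code, shared libraries).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseVmRSSKB extracts the VmRSS value (in kB) from the contents of
// /proc/<pid>/status. Hypothetical helper; Linux-specific format.
func parseVmRSSKB(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "123456", "kB"]
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseUint(fields[1], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read status:", err)
		os.Exit(1)
	}
	rssKB, err := parseVmRSSKB(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	rss := rssKB * 1024
	fmt.Printf("process RSS:       %d MiB\n", rss>>20)
	fmt.Printf("Go runtime Sys:    %d MiB\n", ms.Sys>>20)
	fmt.Printf("Go heap in use:    %d MiB\n", ms.HeapInuse>>20)
	// Resident Go memory <= Sys, so any RSS above Sys must belong to
	// memory the Go runtime does not manage (e.g. cgo/SQLite allocations).
	if rss > ms.Sys {
		fmt.Printf("RSS beyond Go Sys: %d MiB\n", (rss-ms.Sys)>>20)
	}
}
```

If the "beyond Go Sys" figure grows while `HeapInuse` stays flat, the growth is most likely outside the Go heap.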
@maaoBit Thanks for your contribution in testing for a possible memory leak in the CPI and PSI collectors! The point about SQLite3 is quite useful, and we will pay more attention to it, for example by replacing it with a TSDB. However, when this problem first occurred, I wondered whether it was an acceptable, reasonable increase in memory usage simply caused by the new collectors. So I made some observations to see whether setting a higher memory limit solves the OOM problem. In fact, over a long period (3-4 days), the memory usage stays stable for a few hours, then starts to increase, and the pattern repeats. This really surprised me. Could you observe it over a longer period to see whether the same phenomenon happens? I hope my evaluation method is wrong and that there is no other problem in the source code that leads to a memory leak.
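The long-running observation described above (memory stable for hours, then rising) can be tracked with a small in-process sampler. This is a minimal sketch, assuming Go's `HeapInuse` is a useful signal; the helper `sustainedGrowth`, the window, the threshold, and the interval are all hypothetical choices. A multi-day run would sample every few minutes and persist the series instead of printing five samples.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sustainedGrowth reports whether the last `window` samples never decrease
// and their total increase exceeds minDelta bytes. Hypothetical helper.
func sustainedGrowth(samples []uint64, window int, minDelta uint64) bool {
	if window < 2 || len(samples) < window {
		return false
	}
	tail := samples[len(samples)-window:]
	for i := 1; i < len(tail); i++ {
		if tail[i] < tail[i-1] {
			return false
		}
	}
	// Safe subtraction: the tail is non-decreasing.
	return tail[len(tail)-1]-tail[0] > minDelta
}

func main() {
	const (
		interval = 200 * time.Millisecond // illustrative; use minutes for a multi-day run
		count    = 5
	)
	var samples []uint64
	var ms runtime.MemStats
	for i := 0; i < count; i++ {
		runtime.ReadMemStats(&ms)
		samples = append(samples, ms.HeapInuse)
		fmt.Printf("%s heap_inuse=%d MiB\n", time.Now().Format(time.RFC3339), ms.HeapInuse>>20)
		time.Sleep(interval)
	}
	// Flag growth of more than 8 MiB across the last 4 samples.
	fmt.Println("sustained growth:", sustainedGrowth(samples, 4, 8<<20))
}
```

The sampler only sees Go-heap memory and not cgo allocations, so a rising container memory curve alongside a flat `HeapInuse` series would point away from the Go heap.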
OK, I'll keep watching for a few days.
This issue has been automatically marked as stale because it has not had recent activity.
This issue has been automatically closed because it has not had recent activity.
What is your proposal:
When the PSI and CPI features are enabled, the koordlet container's memory usage stays around 240 MB, which easily hits the 256 MB limit and causes OOM kills and pod restarts, as shown in the figure below.
Why is this needed:
To improve the stability of koordlet and reduce the overhead of metric collection.
Is there a suggested solution, if so, please add it:
Maybe we can use Go's memory profiling tools and increase the frequency of garbage collection.