Memory spikes when (re)loading nginx configuration #6428
Replies: 8 comments 13 replies
-
@dbaumgarten could you please provide more information about the platform you use?
-
Have you seen any improvement in 3.7? A bug was fixed in 3.7 where batch reloads weren't being turned off. That bug would only occur under a high rate of changes, which is what triggers batch reloads.
-
While I was testing out v3.7 on our dev environment, we had a major outage of our prod environment. For now we have set limits and requests for the nginx-ingress pods to 25GB, which seems to be enough for them to come up and occasionally reload their config. We are now (somewhat) frantically trying to figure out what is causing these spikes, as having them request 12x the usual amount of memory is not really a long-term solution.
-
@dbaumgarten Can you please give us a rough idea of total apps, VS/VSR or Ingress resources in your cluster and if there are any other resources not relevant to NIC deployment?
-
I have gathered a few screenshots from our monitoring to illustrate the situation that happened during the outage on Friday.
-
@dbaumgarten Can you please let us know if you were able to run a dev upgrade to 3.7.0 and saw any difference?
-
Hi again, we have decided to simply update prod to v3.7 in the next possible change window and check whether that changes anything. I will keep you updated on this. Meanwhile, I continued investigating. I suspected that the long reload times (measured by the metric nginx_ingress_controller_nginx_last_reload_milliseconds) might be part of the problem (longer reloads -> more can happen during a reload). Once I remove the ssl_crl directive from the server-snippets, reload times drop from ~6 seconds to ~1 second. Any idea why the use of ssl_crl causes such a drastic increase in reload times?
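For anyone who wants to compare reload times before and after a config change without a full monitoring stack, here is a minimal sketch of pulling that gauge out of a metrics scrape. It assumes the controller exposes its metrics in the usual Prometheus text format; the function name, the sample HELP text, and the `class="nginx"` label are made up for illustration, only the metric name comes from this thread.

```python
import re

# Metric name as mentioned above; everything else in this sketch is hypothetical.
METRIC = "nginx_ingress_controller_nginx_last_reload_milliseconds"

# A sample line in the Prometheus text format: name{labels} value [timestamp]
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def last_reload_ms(metrics_text, metric=METRIC):
    """Return every sample value for `metric` found in Prometheus text-format output."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == metric:
            values.append(float(match.group(3)))
    return values


# Hypothetical scrape output, not copied from a real controller.
SAMPLE = """\
# HELP nginx_ingress_controller_nginx_last_reload_milliseconds Duration of the last reload
# TYPE nginx_ingress_controller_nginx_last_reload_milliseconds gauge
nginx_ingress_controller_nginx_last_reload_milliseconds{class="nginx"} 6012
"""

if __name__ == "__main__":
    print(last_reload_ms(SAMPLE))
```

Running the same function against a scrape taken with and without the ssl_crl directive would give the two numbers to compare.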
-
Hi, the update to v3.7.0 improved the situation and reduced the frequency and size of memory spikes during reload. Updating to v3.7.1, however, was (for whatever reason) a complete game changer! I have no idea what exactly changed with v3.7.1 compared to v3.7.0, but it drastically improved the whole situation for us. The nginx changelog at https://nginx.org/en/CHANGES mentions for 1.27.2: "Feature: SSL certificates, secret keys, and CRLs are now cached on start or during reconfiguration."
-
Hi,
whenever nginx reloads its configuration we see quite a large spike in memory consumption for the pod.
The more reloads happen in a short timeframe, the larger that spike is.
I understand that this is because of the way such a reload works (new worker processes, draining of existing processes, etc.).
However, I am a little surprised by the magnitude of that increase.
Here is a screenshot of the memory consumption of nginx pods when they are being replaced by other pods via a rolling update.
(The behavior is very similar when just a config reload is performed.)
As you can see, the memory usage of a pod spikes from <2GB to around 10GB, a 5-fold increase.
Is such a large increase really normal? Is there something going wrong?
Given that reloads might happen when nginx is under high load (and therefore autoscaling increases the number of nginx and backend pods) that can become an issue.
High Load -> High Resource Usage -> Autoscaling triggers -> New Pods are created -> Config reload because of new pods -> Even higher Resource Usage because of the reloads.
Currently we have solved the issue by simply setting very high memory requests and limits (12GB).
But setting a 12 GB request for a pod that usually uses ~3GB of memory just feels wrong.
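To put rough numbers on the reload mechanism described above, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption, not measured behavior) that every reload which has not finished draining keeps one extra generation of worker processes alive, each about as large as the steady-state footprint. The function names are made up for illustration, and real usage will differ because of shared memory pages and workers that shrink while draining.

```python
def estimated_peak_gb(steady_state_gb, overlapping_reloads):
    """Estimate peak memory, assuming each still-draining reload adds one
    full extra generation of workers of steady-state size (a simplification)."""
    if overlapping_reloads < 0:
        raise ValueError("overlapping_reloads must be >= 0")
    return steady_state_gb * (1 + overlapping_reloads)


def implied_draining_generations(steady_state_gb, peak_gb):
    """Invert the model: how many extra generations would explain a given peak?"""
    if steady_state_gb <= 0:
        raise ValueError("steady_state_gb must be > 0")
    return max(0, round(peak_gb / steady_state_gb) - 1)


if __name__ == "__main__":
    # Figures from this post: roughly 2 GB steady state, roughly 10 GB peak.
    print(implied_draining_generations(2.0, 10.0))
    print(estimated_peak_gb(2.0, 4))
```

Under this simplified model, a jump from about 2 GB to about 10 GB would correspond to roughly four old worker generations still draining at the same time.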