Memory leak issues #79

jaymzh · 2023-07-12T21:13:36Z

The webserver crashes a few times a week due to running out of memory. @irabinovitch did some sleuthing and believes it's holding open old database connections and they are stacking up.

We should definitely try to sort this out as part of the 2024 launch.

DrupalPhil · 2023-07-13T21:41:12Z

@irabinovitch Can you share what you found?

irabinovitch · 2023-07-17T17:56:41Z

I haven't found anything or done any investigation yet.

irabinovitch · 2023-07-18T23:14:42Z

Looking briefly at the monitoring data, CPU and memory both spike when this happens. Based on Datadog process monitoring, the apache2 processes look to be where that's coming from.

From a quick look at the configs, it seems we're using mpm_prefork here. One idea we could consider is setting MaxRequestsPerChild to something other than the default of 0. That would limit the number of requests an individual Apache process handles before it is retired and restarted. I don't know that I'd call that a fix, but it would probably at least mitigate the runaway memory usage.
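As a rough illustration of the "mitigation, not fix" point above, here is a toy model of a single Apache worker slot. It assumes, purely hypothetically, that every connection leaks a fixed amount of memory (real PHP leaks are rarely this regular). With a limit of 0 the process in that slot keeps growing for its whole life. With a non-zero limit it is replaced by a fresh process every N connections, so its peak size is capped. The leak itself is still there.

```python
"""Toy model: how MaxConnectionsPerChild (formerly MaxRequestsPerChild)
bounds per-process growth. Assumes a hypothetical fixed leak per connection;
all figures are illustrative, not measurements from this server."""


def peak_rss_mb(base_mb: float, leak_per_conn_mb: float,
                conns_seen: int, max_conns_per_child: int) -> float:
    """Peak resident size (MB) reached by the process occupying one worker slot.

    conns_seen is how many connections that slot handles over the period.
    A limit of 0 (Apache's default) means the process is never recycled,
    so it accumulates the leak from every connection. Otherwise the process
    exits after max_conns_per_child connections and a fresh one starts at
    base_mb, so it never leaks more than max_conns_per_child connections' worth.
    """
    if max_conns_per_child == 0:
        conns_before_recycle = conns_seen
    else:
        conns_before_recycle = min(conns_seen, max_conns_per_child)
    return base_mb + leak_per_conn_mb * conns_before_recycle


if __name__ == "__main__":
    # Hypothetical: 30 MB base size, 0.5 MB leaked per connection, 10,000 connections.
    for limit in (0, 50, 500):
        print(limit, peak_rss_mb(30.0, 0.5, 10_000, limit))
    # 0 5030.0
    # 50 55.0
    # 500 280.0
```

A lower limit caps memory more tightly but means more process churn (forks and PHP start-up cost), which is the trade-off in picking a "reasonable number".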

jaymzh · 2023-07-18T23:49:39Z

I think you mean MaxConnectionsPerChild?
Definitely agree on changing that. Even if we could track down some specific bug, PHP can be pretty leaky in this way, and my understanding is that best practice is not to leave MaxConnectionsPerChild at 0. But... my experience in such areas is pretty outdated.

Oh, I see the old name was MaxRequestsPerChild. Same same. I can whip up a PR for that.

irabinovitch · 2023-07-18T23:52:43Z

Sure, MaxConnectionsPerChild is the new name for MaxRequestsPerChild as of 2.3.9. I think they have the exact same effect, but yes, we should use the new name. It definitely shouldn't be 0; that's just the default. I'm not sure what a reasonable number is, and yes, I'd like to find whatever we're leaking or hanging on to, but this seems like a reasonable defense if someone wants to try it.

jaymzh · 2023-07-18T23:58:53Z

socallinuxexpo/scale-chef#268

irabinovitch · 2023-08-05T06:20:45Z

This doesn't seem to have helped stability.

DrupalPhil · 2023-10-06T21:20:14Z

Can you provide access to datadog or whatever logging system you have?

irabinovitch · 2023-10-07T13:03:20Z

Sent an invite. I don't think we have php tracing or profiling enabled at the moment. Can do that if we'd like.


DrupalPhil · 2023-10-08T16:09:14Z

It's not a glaring issue, but I've optimized the query anyway. While I look for other potential culprits, let's get this merged into prod.

* Log the state of apache processes when we restart so we can continue to understand the problem
* Restart slightly less often
* Update MaxConnectionsPerChild to 50, which is probably a bit more reasonable than 5
* Update server limit (see below)
* Turn off KeepAlive (see below)

From scale-infra, explanation on the last two above.

```
This is a confirmed bug in Apache with no configuration workaround:
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

There's a pretty good description of the details, which match our behavior here:
https://serverfault.com/questions/516373/what-is-the-meaning-of-ah00485-scoreboard-is-full-not-at-maxrequestworkers

One thing that some people seem to have had success with is turning keepalive off.

ServerLimit, for the event MPM (what we use), the recommendation is:

  With event, increase this directive if the process number defined by your
  MaxRequestWorkers and ThreadsPerChild settings, plus the number of gracefully
  shutting down processes, is more than 16 server processes (default).

Our MaxRequestWorkers is 50. We don't define ThreadsPerChild which defaults to 25. So if I read that correctly (and I haven't tuned Apache for a living in a looonnnggg time), we want something like 50+25=75 plus some more for shutting down processes, so like... 80?

I can prep a diff for that plus keepalive and see how that goes.
```

All continuing socallinuxexpo/scale-drupal#79

Signed-off-by: Phil Dibowitz <[email protected]>
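The quoted estimate adds MaxRequestWorkers and ThreadsPerChild (50 + 25). Going by the Apache ServerLimit documentation, though, ServerLimit is a count of child processes, and under the event MPM each child runs ThreadsPerChild threads, so the active process count is MaxRequestWorkers divided by ThreadsPerChild. A minimal sketch of that arithmetic follows; the number of gracefully stopping children is a hypothetical input you would have to estimate, not something measured in this thread.

```python
import math


def event_mpm_processes(max_request_workers: int, threads_per_child: int = 25) -> int:
    """Active child processes the event MPM needs to provide MaxRequestWorkers threads.

    Each child runs threads_per_child worker threads (Apache's default is 25),
    so the process count is MaxRequestWorkers / ThreadsPerChild, rounded up.
    """
    return math.ceil(max_request_workers / threads_per_child)


def server_limit_needed(max_request_workers: int, threads_per_child: int = 25,
                        gracefully_stopping: int = 0) -> int:
    """Process slots ServerLimit should cover: active children plus children
    still finishing requests after a graceful restart or after reaching
    MaxConnectionsPerChild. gracefully_stopping is a hypothetical estimate."""
    return event_mpm_processes(max_request_workers, threads_per_child) + gracefully_stopping


if __name__ == "__main__":
    # Figures from the comment: MaxRequestWorkers 50, ThreadsPerChild left at 25.
    print(event_mpm_processes(50))  # 2
    # Hypothetical allowance of 10 lingering children from graceful shutdowns.
    print(server_limit_needed(50, gracefully_stopping=10))  # 12
```

On this reading, 50 workers need only 2 active children, well under the default ServerLimit of 16. Raising ServerLimit therefore mostly buys headroom for children lingering in a graceful shutdown, which is when the "scoreboard is full" (AH00485) message tends to appear.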

jaymzh · 2023-10-12T02:30:39Z

OK, I've done a few more things here:

* atop is now deployed on all our servers, including the webserver, which should give us better visibility.
* I reduced the restarts to once an hour instead of every 30 minutes, and before each restart we log some data about the state of the Apache processes. If this shows the processes aren't in a wait state, we can probably try 2 hours, and so on.
* Did a variety of other small tunings to Apache that may or may not help. See "Try some more tuning" (scale-chef#293) for details.
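The exact restart-time logging lives in scale-chef and isn't shown here. One hypothetical way to summarize the Apache process states, assuming mod_status is enabled and its machine-readable `?auto` output (commonly served at `/server-status?auto`, though the path depends on the config) is captured just before the restart, is to count the characters on its `Scoreboard:` line. The function names below are illustrative only.

```python
from collections import Counter

# mod_status scoreboard key: one character per worker slot.
SCOREBOARD_STATES = {
    "_": "waiting",       # waiting for connection
    "S": "starting",      # starting up
    "R": "reading",       # reading request
    "W": "sending",       # sending reply
    "K": "keepalive",     # keepalive (read)
    "D": "dns",           # DNS lookup
    "C": "closing",       # closing connection
    "L": "logging",       # logging
    "G": "graceful",      # gracefully finishing
    "I": "idle_cleanup",  # idle cleanup of worker
    ".": "open",          # open slot with no current process
}


def parse_scoreboard(status_text: str) -> Counter:
    """Count slot states from the 'Scoreboard:' line of mod_status ?auto output."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_STATES.get(ch, "unknown") for ch in board)
    raise ValueError("no Scoreboard line found in status output")


def summarize(counts: Counter) -> str:
    """One-line summary, sorted by state name, suitable for a log entry."""
    return " ".join(f"{state}={n}" for state, n in sorted(counts.items()))


if __name__ == "__main__":
    sample = "Total Accesses: 10\nScoreboard: __WKG...\n"
    print(summarize(parse_scoreboard(sample)))
    # graceful=1 keepalive=1 open=3 sending=1 waiting=2
```

A growing `graceful` count across restarts would point at the gracefully-finishing slots discussed in the AH00485 reports, while a board that is mostly `waiting` suggests the processes are idle rather than stuck.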

jaymzh added the 2024 site launch label Jul 12, 2023

jaymzh assigned DrupalPhil Jul 12, 2023

DrupalPhil added a commit that referenced this issue Oct 8, 2023

Optimizes scale_utility database query #79

10a98f3

jaymzh mentioned this issue Oct 8, 2023

Try some more tuning socallinuxexpo/scale-chef#293

Merged

