after_dispatch memory leak (sort of) #2064
-
Hi guys! If I try to download a large file via my Mojolicious API, it fills up RAM and the process gets killed by the OOM killer. I have traced the problem to this code:

$self->hook(
    'after_dispatch' => sub {
        my $self = shift;
        my $json = $self->res->json;
        if ($json) {
            if (ref $json eq ref {}) { # only if json is really a hash
                if ($json->{alerts}) {
                    if (scalar(@{$json->{alerts}}) > 0) {
                        $self->app->log->debug("Alerts:\n" . $self->dumper($json->{alerts}));
                    }
                }
            }
        }
    }
);

What I'm trying to do in this hook is to centrally log what I'm returning as error messages from various methods. Obviously there would be other ways to handle it (suggestions welcome!), but can anyone see where the memory usage comes from? I must be doing something stupid here, but I don't see what...
Replies: 6 comments 2 replies
-
Nothing looks out of sorts to me, and I don't think using the after_dispatch hook is a poor approach to handling your logging.

Can you put that hook into a minimal lite app (or even a Test::Mojo test) to demonstrate the problem? Is the problem with a single large file or many concurrent ones?

Hmm... Now that I ask that question, I'm wondering if this particular hook is the wrong time to do that logging? Have you tried a different hook?

FWIW, large file downloads are handled gracefully and automatically switch from memory to disk after a certain -- configurable -- size, like 250K or something like that.
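For what it's worth, a minimal Test::Mojo script along the lines suggested above might look roughly like this (a sketch only; the /alerts route and its payload are invented for illustration, not taken from the original app):

#!/usr/bin/env perl
# A sketch, not from the thread: minimal lite app plus Test::Mojo exercising
# the after_dispatch hook. The /alerts route and its payload are made up.
use Mojolicious::Lite -signatures;
use Test::More;
use Test::Mojo;

get '/alerts' => sub ($c) {
    $c->render(json => {alerts => ['something went wrong']});
};

hook after_dispatch => sub ($c) {
    my $json = $c->res->json;
    return unless ref $json eq 'HASH' && $json->{alerts} && @{$json->{alerts}};
    $c->app->log->debug("Alerts:\n" . $c->dumper($json->{alerts}));
};

my $t = Test::Mojo->new;
$t->get_ok('/alerts')->status_is(200)->json_has('/alerts/0');

done_testing();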
-
You're only accessing the JSON object, so you might access it directly, as stated in the docs. E.g.:

From the docs (dynamic nature -> the hooks will be called in reverse order!):

The best method for this kind of debugging is to set the env var MOJO_CLIENT_DEBUG and pipe STDERR to a file. Docs:
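The inline example didn't make it into this copy of the thread; assuming "access it directly" refers to the JSON pointer form of Mojo::Message::json, the hook body could be trimmed down to something like this sketch (it still decodes the whole body, so it doesn't change the memory picture):

# Sketch only; assumes a Mojolicious::Lite app with -signatures enabled.
app->hook(after_dispatch => sub ($c) {
    # The JSON pointer form returns just the slice we care about,
    # or undef if the pointer doesn't match (the full body is still parsed).
    my $alerts = $c->res->json('/alerts');
    $c->app->log->debug("Alerts:\n" . $c->dumper($alerts))
      if ref $alerts eq 'ARRAY' && @$alerts;
});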
-
Accessing the response content via $c->res->json results in an expensive read + JSON parsing operation, and stores the resulting data structure in $c->res. That's not a real leak, but it could be perceived as one if the data structures are large enough.
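One way to act on that, sketched here rather than taken from the thread, is to guard the hook so it only parses responses that actually declare JSON and stay below some size limit:

# Sketch only; assumes a Mojolicious::Lite app with -signatures enabled.
# The 1 MB cutoff is an arbitrary illustrative limit, not a Mojolicious default.
app->hook(after_dispatch => sub ($c) {
    my $res = $c->res;
    return unless ($res->headers->content_type // '') =~ m!^application/json!;
    return if $res->body_size > 1_048_576;    # skip large bodies entirely

    my $json = $res->json;
    return unless ref $json eq 'HASH' && $json->{alerts} && @{$json->{alerts}};
    $c->app->log->debug("Alerts:\n" . $c->dumper($json->{alerts}));
});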
-
Thank you all for the answers!

When I provide a 10 GB file, 4 GB of memory gets eaten within about 5 seconds. It makes no sense to try to parse some chunk of data as JSON, of course, so the error is on my side in principle: I should first check what kind of request this was (or avoid handling error logging in this way, as suggested). On the other hand, I know too little about how the transmission is implemented, and perhaps the memory is being filled because the parser does not clean up after itself between subsequent requests or so?

So here's the minimal app if anyone wants to investigate:

#!/usr/bin/env perl
use Mojolicious::Lite -signatures;

get '/' => sub ($c) {
    my $asset = Mojo::Asset::File->new(path => "/path/to/huge/file");
    $c->reply->asset($asset);
};

app->hook(
    'after_dispatch' => sub {
        my $c = shift;
        my $json = $c->res->json;
    }
);

app->start;
-
Built a brand new Digital Ocean Droplet, 4 vCPU and 8 GB RAM.
Note: I'm generating a random huge binary file, so it's neither text nor JSON.

# apt install cpanminus
# cpanm Mojolicious
# dd if=/dev/urandom of=bigfile bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 6.32744 s, 162 MB/s
# perl bigfile.pl get / > /dev/null
[2023-05-05 13:14:17.24976] [3021] [trace] [og7BJQBgkgXn] GET "/"
[2023-05-05 13:14:17.25031] [3021] [trace] [og7BJQBgkgXn] Routing to a callback
[2023-05-05 13:14:17.25085] [3021] [trace] [og7BJQBgkgXn] 200 OK (0.001083s, 923.361/s)
// seems good here...

# dd if=/dev/urandom of=bigfile bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB, 9.5 GiB) copied, 55.5755 s, 184 MB/s
# perl bigfile.pl get / > /dev/null
[2023-05-05 13:16:10.69400] [3024] [trace] [s1LxWlEF57kH] GET "/"
[2023-05-05 13:16:10.69441] [3024] [trace] [s1LxWlEF57kH] Routing to a callback
[2023-05-05 13:16:10.69484] [3024] [trace] [s1LxWlEF57kH] 200 OK (0.00083s, 1204.819/s)
Killed
// Whoops, no good! So comment out my $json = $c->res->json

# perl bigfile.pl get / > /dev/null
[2023-05-05 13:16:54.47274] [3045] [trace] [NWupPR4G7t6n] GET "/"
[2023-05-05 13:16:54.47317] [3045] [trace] [NWupPR4G7t6n] Routing to a callback
[2023-05-05 13:16:54.47359] [3045] [trace] [NWupPR4G7t6n] 200 OK (0.000855s, 1169.591/s)
Maximum message size exceeded at /usr/local/share/perl/5.34.0/Mojolicious/Command/get.pm line 77.
// Whoops, too big for Mojo to handle, so increase the max response size to 12 GB

# perl bigfile.pl get -S 12147483648 / > /dev/null
[2023-05-05 13:18:36.76690] [3061] [trace] [YPap3XHM5C1x] GET "/"
[2023-05-05 13:18:36.76728] [3061] [trace] [YPap3XHM5C1x] Routing to a callback
[2023-05-05 13:18:36.76772] [3061] [trace] [YPap3XHM5C1x] 200 OK (0.000822s, 1216.545/s)
// seems good here... So accessing the 10 GB file via the Mojo::Message::json method seems to be what kills it.
// Switch the $c->res->json call to $c->res->body

# perl bigfile.pl get -S 12147483648 / > /dev/null
[2023-05-05 13:28:50.71380] [3177] [trace] [gdIX0ixp5jae] GET "/"
[2023-05-05 13:28:50.71431] [3177] [trace] [gdIX0ixp5jae] Routing to a callback
[2023-05-05 13:28:50.71501] [3177] [trace] [gdIX0ixp5jae] 200 OK (0.000892s, 1121.076/s)
Killed
// Well, it's not Mojo::Message::json doing it then!
// Removed the hook and put $c->res->body and $c->res->json (separately) in the '/' route, and neither killed.
// So is the hook the issue?
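For orientation: bigfile.pl above is presumably the minimal app from the previous reply saved under that name. A rough reconstruction of what was toggled between runs (an assumption, not the exact script used) would be:

#!/usr/bin/env perl
# Rough reconstruction of bigfile.pl, assumed to be the earlier minimal app;
# the hook line was swapped between runs as noted in the comments.
use Mojolicious::Lite -signatures;
use Mojo::Asset::File;

get '/' => sub ($c) {
    my $asset = Mojo::Asset::File->new(path => 'bigfile');
    $c->reply->asset($asset);
};

app->hook(after_dispatch => sub ($c) {
    my $json = $c->res->json;      # killed with the 10 GB file
    # my $body = $c->res->body;    # the swapped-in variant; also killed
});

app->start;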
-
Ah good catch, makes perfect sense!

On Fri, May 5, 2023, 12:00 PM Joel Berger wrote:

I could be wrong, but wouldn't it be expected to OOM trying to parse a 10 GB JSON file with 4 GB of RAM? Surely you'd have to hold the file in memory to parse it (it isn't a stream-parse format), and then the parsed representation would probably take at least that much memory again in the program, so that's something like 20 GB of memory needed just to start.