after_dispatch memory leak (sort of) #2064
-
Hi guys! If I try to download a large file via my Mojolicious API, it fills up RAM and the process gets killed by the OOM killer. I have traced the problem to this code:

$self->hook(
    'after_dispatch' => sub {
        my $self = shift;
        my $json = $self->res->json;
        if ($json) {
            if (ref $json eq ref {}) { # only if json is really a hash
                if ($json->{alerts}) {
                    if (scalar(@{$json->{alerts}}) > 0) {
                        $self->app->log->debug("Alerts:\n" . $self->dumper($json->{alerts}));
                    }
                }
            }
        }
    }
);

What I'm trying to do in this hook is to centrally log what I'm returning as error messages from various methods. Obviously there would be other ways to handle it (suggestions welcome!), but can anyone see where the memory usage comes from? I must be doing something stupid here, but I don't see what...
Replies: 6 comments 2 replies
-
Nothing looks out of sorts to me, and I don't think using the after_dispatch hook is a poor approach to handling your logging.

Can you put that hook into a minimal lite app (or even a Test::Mojo test) to demonstrate the problem? Is the problem with a single large file or many concurrent ones?

Hmm... Now that I ask that question, I'm wondering if this particular hook is the wrong time to do that logging? Have you tried a different hook?

FWIW, large file downloads are handled gracefully and automatically switch from memory to disk after a certain -- configurable -- size, like 250K or something like that.
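For what it's worth, a minimal Test::Mojo script along the lines suggested above might look roughly like this (a sketch only; the /alerts route and its payload are invented for illustration, not taken from the original app):

#!/usr/bin/env perl
# A sketch, not from the thread: minimal lite app plus Test::Mojo exercising
# the after_dispatch hook. The /alerts route and its payload are made up.
use Mojolicious::Lite -signatures;
use Test::More;
use Test::Mojo;

get '/alerts' => sub ($c) {
    $c->render(json => {alerts => ['something went wrong']});
};

hook after_dispatch => sub ($c) {
    my $json = $c->res->json;
    return unless ref $json eq 'HASH' && $json->{alerts} && @{$json->{alerts}};
    $c->app->log->debug("Alerts:\n" . $c->dumper($json->{alerts}));
};

my $t = Test::Mojo->new;
$t->get_ok('/alerts')->status_is(200)->json_has('/alerts/0');

done_testing();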
-
You're only accessing the JSON object, so you might access it directly, as stated in the docs. E.g.:

From the docs (dynamic nature -> the hooks will be called in reverse order!):

The best method for this kind of debugging is to set the env var MOJO_CLIENT_DEBUG and pipe STDERR to a file. Docs:
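The inline example didn't make it into this copy of the thread; assuming "access it directly" refers to the JSON pointer form of Mojo::Message::json, the hook body could be trimmed down to something like this sketch (it still decodes the whole body, so it doesn't change the memory picture):

# Sketch only; assumes a Mojolicious::Lite app with -signatures enabled.
app->hook(after_dispatch => sub ($c) {
    # The JSON pointer form returns just the slice we care about,
    # or undef if the pointer doesn't match (the full body is still parsed).
    my $alerts = $c->res->json('/alerts');
    $c->app->log->debug("Alerts:\n" . $c->dumper($alerts))
      if ref $alerts eq 'ARRAY' && @$alerts;
});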
-
Accessing the response content via $c->res->json results in an expensive read + JSON parsing operation, and stores the resulting data structure in $c->res. That's not a real leak, but it could be perceived as one if the data structures are large enough.
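One way to act on that, sketched here rather than taken from the thread, is to guard the hook so it only parses responses that actually declare JSON and stay below some size limit:

# Sketch only; assumes a Mojolicious::Lite app with -signatures enabled.
# The 1 MB cutoff is an arbitrary illustrative limit, not a Mojolicious default.
app->hook(after_dispatch => sub ($c) {
    my $res = $c->res;
    return unless ($res->headers->content_type // '') =~ m!^application/json!;
    return if $res->body_size > 1_048_576;    # skip large bodies entirely

    my $json = $res->json;
    return unless ref $json eq 'HASH' && $json->{alerts} && @{$json->{alerts}};
    $c->app->log->debug("Alerts:\n" . $c->dumper($json->{alerts}));
});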
-
Thank you all for the answers!

When I provide a 10 GB file, 4 GB of memory gets eaten within about 5 seconds. It makes no sense to try to parse some chunk of data as JSON, of course, so the error is on my side in principle: I should first check what kind of request this was (or avoid handling error logging in this way, as suggested). On the other hand, I know too little about how the transmission is implemented, and perhaps the memory is being filled because the parser does not clean up after itself between subsequent requests or so?

So here's the minimal app if anyone wants to investigate:

#!/usr/bin/env perl
use Mojolicious::Lite -signatures;

get '/' => sub ($c) {
    my $asset = Mojo::Asset::File->new(path => "/path/to/huge/file");
    $c->reply->asset($asset);
};

app->hook(
    'after_dispatch' => sub {
        my $c = shift;
        my $json = $c->res->json;
    }
);

app->start;
-
Built a brand new Digital Ocean Droplet, 4 vCPU and 8 GB RAM.
Note: I'm generating a random huge binary file, so it's neither text nor JSON.

# apt install cpanminus
# cpanm Mojolicious
# dd if=/dev/urandom of=bigfile bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 6.32744 s, 162 MB/s
# perl bigfile.pl get / > /dev/null
[2023-05-05 13:14:17.24976] [3021] [trace] [og7BJQBgkgXn] GET "/"
[2023-05-05 13:14:17.25031] [3021] [trace] [og7BJQBgkgXn] Routing to a callback
[2023-05-05 13:14:17.25085] [3021] [trace] [og7BJQBgkgXn] 200 OK (0.001083s, 923.361/s)
// seems good here...

# dd if=/dev/urandom of=bigfile bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB, 9.5 GiB) copied, 55.5755 s, 184 MB/s
# perl bigfile.pl get / > /dev/null
[2023-05-05 13:16:10.69400] [3024] [trace] [s1LxWlEF57kH] GET "/"
[2023-05-05 13:16:10.69441] [3024] [trace] [s1LxWlEF57kH] Routing to a callback
[2023-05-05 13:16:10.69484] [3024] [trace] [s1LxWlEF57kH] 200 OK (0.00083s, 1204.819/s)
Killed
// Whoops, no good! So comment out my $json = $c->res->json

# perl bigfile.pl get / > /dev/null
[2023-05-05 13:16:54.47274] [3045] [trace] [NWupPR4G7t6n] GET "/"
[2023-05-05 13:16:54.47317] [3045] [trace] [NWupPR4G7t6n] Routing to a callback
[2023-05-05 13:16:54.47359] [3045] [trace] [NWupPR4G7t6n] 200 OK (0.000855s, 1169.591/s)
Maximum message size exceeded at /usr/local/share/perl/5.34.0/Mojolicious/Command/get.pm line 77.
// Whoops, too big for Mojo to handle, so increase the max response size to 12 GB

# perl bigfile.pl get -S 12147483648 / > /dev/null
[2023-05-05 13:18:36.76690] [3061] [trace] [YPap3XHM5C1x] GET "/"
[2023-05-05 13:18:36.76728] [3061] [trace] [YPap3XHM5C1x] Routing to a callback
[2023-05-05 13:18:36.76772] [3061] [trace] [YPap3XHM5C1x] 200 OK (0.000822s, 1216.545/s)
// seems good here... So accessing the 10 GB file via the Mojo::Message::json method seems to be what kills it.
// Switch the $c->res->json call to $c->res->body

# perl bigfile.pl get -S 12147483648 / > /dev/null
[2023-05-05 13:28:50.71380] [3177] [trace] [gdIX0ixp5jae] GET "/"
[2023-05-05 13:28:50.71431] [3177] [trace] [gdIX0ixp5jae] Routing to a callback
[2023-05-05 13:28:50.71501] [3177] [trace] [gdIX0ixp5jae] 200 OK (0.000892s, 1121.076/s)
Killed
// Well, it's not Mojo::Message::json doing it then!
// Removed the hook and put $c->res->body and $c->res->json (separately) in the '/' route, and neither killed.
// So is the hook the issue?
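For orientation: bigfile.pl above is presumably the minimal app from the previous reply saved under that name. A rough reconstruction of what was toggled between runs (an assumption, not the exact script used) would be:

#!/usr/bin/env perl
# Rough reconstruction of bigfile.pl, assumed to be the earlier minimal app;
# the hook line was swapped between runs as noted in the comments.
use Mojolicious::Lite -signatures;
use Mojo::Asset::File;

get '/' => sub ($c) {
    my $asset = Mojo::Asset::File->new(path => 'bigfile');
    $c->reply->asset($asset);
};

app->hook(after_dispatch => sub ($c) {
    my $json = $c->res->json;      # killed with the 10 GB file
    # my $body = $c->res->body;    # the swapped-in variant; also killed
});

app->start;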
-
Ah good catch, makes perfect sense!

On Fri, May 5, 2023, 12:00 PM Joel Berger wrote:

I could be wrong, but wouldn't it be expected to OOM trying to parse a 10 GB JSON file with 4 GB of RAM? Surely you'd have to hold the file in memory to parse it (it isn't a stream-parse format), and then the parsed representation would probably take at least that much memory again in the program, so that's something like 20 GB of memory needed just to start.