RPC/Serialization Overhead/Delay #1850
Replies: 2 comments 8 replies
-
Hey @L3tum 👋
-
Hey @rustatian! Here's a small (2MB zipped) repro. I stripped it down from our existing project, so the configuration is (mostly) identical. There also aren't any differences in the versions employed, so it's as close to 1:1 as I can give you. There's a Dockerfile with two targets included, as well as a docker-compose.yaml that also starts webgrind. The whole thing should be runnable locally as well, though. If you run it either with the Dockerfile's Dev target or locally, you'll need to install the composer dependencies manually. A simple

I've played around a bit with non-blocking IO. It's obviously a sore topic for PHP, and I didn't rip everything out and use a framework like AMPHP for it, but I did implement a

I tested the ideal number of sockets each would create and noticed that 5-10 sockets is apparently the sweet spot (for that test, anyway). NiceMultiRPC pre-connects the sockets and can scale to more sockets than that (I usually used 50). I guess socket reuse trumps the delay of having to connect the socket after about 10 sockets. Anyway, with these two RPC implementations I've managed to cut the test time down from 1ms to 0.07ms :)

FYI I've also added a
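For what it's worth, the pre-connected-pool idea could be sketched roughly like below. This is purely illustrative, not the actual NiceMultiRPC code: the `SocketPool` name, the pooling strategy, and the plain TCP endpoint are my assumptions, and RPC framing/encoding is left out entirely.

```php
<?php
// Illustrative sketch only: a pre-connected TCP socket pool in the spirit
// of the pooled-RPC idea above. All names here are hypothetical, and the
// framing/encoding of actual RPC payloads is deliberately omitted.
final class SocketPool
{
    /** @var resource[] sockets ready for reuse */
    private array $free = [];

    public function __construct(
        private readonly string $host,
        private readonly int $port,
        int $size = 10, // ~5-10 sockets was the sweet spot in the test above
    ) {
        // Pre-connect all sockets up front so no call pays the connect cost.
        for ($i = 0; $i < $size; $i++) {
            $socket = stream_socket_client("tcp://{$host}:{$port}", $errno, $errstr);
            if ($socket === false) {
                throw new RuntimeException("connect failed: {$errstr}");
            }
            $this->free[] = $socket;
        }
    }

    /** Borrow a socket, run $fn with it, then return it to the pool. */
    public function with(callable $fn): mixed
    {
        // Reuse a pre-connected socket; fall back to a fresh connection
        // only if the pool is momentarily empty.
        $socket = array_pop($this->free)
            ?? stream_socket_client("tcp://{$this->host}:{$this->port}");
        try {
            return $fn($socket);
        } finally {
            $this->free[] = $socket;
        }
    }
}
```

The point of `with()` is that the connect happens at most once per pool slot; every later call only pays for the write/read itself.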
-
Heya! I'm mainly looking for prior work to see whether anybody has actually measured this properly, because I'm doubting my own numbers.
I've been on the hunt for some performance issues and noticed that our Symfony `kernel.terminate` EventListener takes ~2ms in prod (so with OPcache, JIT, cache warming and what not). However, we only collect metrics there and don't do anything else. I even checked, and there aren't any other listeners or hidden things being executed.

Curious to see why it takes so long, I thought I'd "profile" (I use that term very loosely here) the `Metrics` class, since it sends off some RPC calls and does some serialization. I've made a basic `MetricsProfiler` that I inject with a `CompilerPass`. The `MetricsProfiler` is very simple, just the following for each method:

The resulting log entries for a single request are here
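Since the actual profiler snippet didn't survive the trip into this thread, here is a minimal sketch of what such a per-method timing wrapper could look like. The `__call`-based forwarding, the class name, and the logger callable are all my assumptions, not the original `MetricsProfiler`:

```php
<?php
// Hypothetical sketch of a per-method timing decorator in the spirit of
// the MetricsProfiler described above. The forwarding via __call and the
// logger callable are assumptions, not the original implementation.
final class TimingProfiler
{
    public function __construct(
        private readonly object $inner,  // the real service being profiled
        private readonly \Closure $log,  // e.g. fn(string $line) => $logger->debug($line)
    ) {}

    public function __call(string $method, array $args): mixed
    {
        $start = hrtime(true); // monotonic clock, nanosecond resolution
        try {
            return $this->inner->{$method}(...$args);
        } finally {
            // Log even when the wrapped call throws.
            $elapsedMs = (hrtime(true) - $start) / 1e6;
            ($this->log)(sprintf('%s::%s took %.3fms', $this->inner::class, $method, $elapsedMs));
        }
    }
}
```

Using `hrtime()` instead of `microtime(true)` avoids wall-clock jumps skewing the measurements.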
Obviously, added together this is a bit more than 2ms, but it's also collected locally (on a frankly anemic laptop) without JIT (but with OPcache, `APP_ENV=prod`, `pool.debug=false`, and the works). Either way, this is entirely too long IMO, which is why I think something must be wrong on my end. But it's also the only way I can explain our issues with the `kernel.terminate` listener, because it does little else but this.

I've also run `xdebug.mode=profile` through this and, while I can't share the cachegrind file, here are the relevant screenshots from QCachegrind. If I understand its interface correctly, each "time unit" is 10ns here, so if the call took 44,000 "units" it'd be around 440,000ns, or 440 microseconds, or ~0.44ms, which supports my measurement above.
RPC->call has these Callees
RPC->decodeResponse has these Callees
Stepping into the Protobuf call stack confirms it's using the extension, not the pure-PHP fallback.
The worst seems to be the KV cache, though:
I'm not sure why that one is so slow in particular.
I've tried to look through the code but haven't found anything obviously amiss. There's some protobuf stuff I don't really know, but I do have the protobuf extension installed and loaded.
The gRPC extension is currently misbehaving, so it isn't loaded, but I also haven't seen any reference to RoadRunner needing it. The sockets extension is installed as well, though.
I really want to use the Metrics plugin, but this basically ruins our performance. One idea, if the RPC overhead is the issue, would be to batch-send the metrics, but I'm not sure how easily or quickly that could be done. It could also be that prometheus-go is just particularly slow, but that would still make the plugin a non-starter.
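The batch-send idea could be sketched like this. Everything here is hypothetical, not a RoadRunner API: the `$send` callable simply stands in for whatever RPC call would actually ship the accumulated batch in one round-trip.

```php
<?php
// Hypothetical sketch of batching metrics before sending, so that one
// RPC round-trip carries many samples. The $send callable stands in for
// whatever actually ships the batch; this is not a RoadRunner API.
final class MetricsBatch
{
    /** @var array<int, array{name: string, value: float}> */
    private array $pending = [];

    public function __construct(
        private readonly \Closure $send,  // fn(array $batch): void
        private readonly int $maxSize = 100,
    ) {}

    public function add(string $name, float $value): void
    {
        $this->pending[] = ['name' => $name, 'value' => $value];
        // Auto-flush once the batch is full, bounding memory use.
        if (count($this->pending) >= $this->maxSize) {
            $this->flush();
        }
    }

    /** Ship everything collected so far in a single call. */
    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }
        ($this->send)($this->pending);
        $this->pending = [];
    }
}
```

In a `kernel.terminate` listener you would call `add()` freely during collection and `flush()` once at the end, paying the RPC overhead once per request instead of once per metric.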