-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize function caching code #998
Conversation
Introduce a generalized version of query_graph_cache() that can be used for any data-collecting function.
pinc/misc.inc
Outdated
function memoize_function($function, $args = [], $expire_from_now = 3600, $key_salt = "") | ||
{ | ||
$memcache = new Memcached(); | ||
$memcache->addServer('localhost', 11211); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we unconditionally add? Per https://www.php.net/manual/en/memcached.addserver.php it seems not advised to add the same server multiple times to the same server pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function creates the memcached object at every call so we need to add it for every call. The object itself doesn't persist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yes, my bad. I guess the next question is should we not destroy it every call and instead keep the pool of servers alive? Does it have a strong impact to performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connection pooling is difficult with PHP as each page load is a distinct "execution" in most cases (this is not entirely accurate depending on how PHP is wired into the web server (eg Apache) and what multi-processing model is used (ie MPM)). It's this complexity that has us not pooling or using persistent DB connections either.
The per-instances loopback connections (for both memcache and the database) should be pretty lighweight and at least in this case we're just trading the MySQL connection + query complexity for a memcached connection + constant-time hash lookup. I will see if we can cache the connection within a page load via a static variable though, for the case where we have multiple calls to this function per page load -- let me test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've cached the connection in a static function variable so we will at least re-use it in a single page load if there are multiple calls to the function.
$memcache = new Memcached(); | ||
$memcache->addServer('localhost', 11211); | ||
|
||
$key = hash("sha256", $function . serialize($args) . $key_salt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sha256 is resistant to collision attacks, this is a property we probably don't need for caching. Should we instead use something faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it from md5 to sha256 to reduce the collision space since this has no control over the function name, args, or salt. I don't know what the odds are of a collision with md5 though, so it might not matter and md5 might be good enough.
pinc/misc.inc
Outdated
function memoize_function($function, $args = [], $expire_from_now = 3600, $key_salt = "") | ||
{ | ||
$memcache = new Memcached(); | ||
$memcache->addServer('localhost', 11211); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if there is no memcache server? Is this setup in configuration somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no local memcache server running, or it's not connected, all of the memcache calls return false
and the code falls through to just calling the function, I tried to convey this in the function header with:
If memcache is not running, this will call the requested
function and return its data without generating a notice/warning/error.
So having memcache is effectively optional and if it isn't there or isn't working there won't be any caching.
The API uses memcache for rate limiting and this is documented in the API.md
. I should update INSTALL.md
and mention the use of memcache for other caching too, I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated both API.md
and INSTALL.md
to include much more details about memcached usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thank you!
static $memcache = null; | ||
if (!$memcache) { | ||
$memcache = new Memcached(); | ||
$memcache->addServer('localhost', 11211); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor point, but should we make the port and server configurable similar to other dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should but I'm going to punt on that right now. The install docs talk about this being accessible on localhost so they at least reflect reality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on TEST, that when round_backlog_days.php
is refreshed in the sandbox, the cache hit number increases, but the deployed version of that file doesn't affect the cache.
Code looks good too.
Introduce a generalized version of
query_graph_cache()
that can be used for any data-collecting function. As for the name, see the Memoization wikipedia article if you're not familiar with it. The updated graph pages are best viewed ignoring whitespace as the code was just indented when added to a function.Testable in https://www.pgdp.org/~cpeel/c.branch/memoize-function/
I've added a new
memcache-stats
feature to the Squirrel Workbench on TEST that shows the memcache statistics. This can be used to confirm the new code is hitting memcache to get the cached data.