Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize function caching code #998

Merged
merged 3 commits into from
Dec 28, 2023

Conversation

cpeel
Copy link
Member

@cpeel cpeel commented Dec 27, 2023

Introduce a generalized version of query_graph_cache() that can be used for any data-collecting function. As for the name, see the Memoization wikipedia article if you're not familiar with it. The updated graph pages are best viewed ignoring whitespace as the code was just indented when added to a function.

Testable in https://www.pgdp.org/~cpeel/c.branch/memoize-function/

I've added a new memcache-stats feature to the Squirrel Workbench on TEST that shows the memcache statistics. This can be used to confirm the new code is hitting memcache to get the cached data.

Introduce a generalized version of query_graph_cache() that can be
used for any data-collecting function.
@cpeel cpeel self-assigned this Dec 27, 2023
pinc/misc.inc Outdated
function memoize_function($function, $args = [], $expire_from_now = 3600, $key_salt = "")
{
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we unconditionally add? Per https://www.php.net/manual/en/memcached.addserver.php it seems not advised to add the same server multiple times to the same server pool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function creates the memcached object at every call so we need to add it for every call. The object itself doesn't persist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, my bad. I guess the next question is should we not destroy it every call and instead keep the pool of servers alive? Does it have a strong impact to performance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection pooling is difficult with PHP as each page load is a distinct "execution" in most cases (this is not entirely accurate depending on how PHP is wired into the web server (eg Apache) and what multi-processing model is used (ie MPM)). It's this complexity that has us not pooling or using persistent DB connections either.

The per-instances loopback connections (for both memcache and the database) should be pretty lighweight and at least in this case we're just trading the MySQL connection + query complexity for a memcached connection + constant-time hash lookup. I will see if we can cache the connection within a page load via a static variable though, for the case where we have multiple calls to this function per page load -- let me test that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cached the connection in a static function variable so we will at least re-use it in a single page load if there are multiple calls to the function.

$memcache = new Memcached();
$memcache->addServer('localhost', 11211);

$key = hash("sha256", $function . serialize($args) . $key_salt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha256 is resistant to collision attacks, this is a property we probably don't need for caching. Should we instead use something faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it from md5 to sha256 to reduce the collision space since this has no control over the function name, args, or salt. I don't know what the odds are of a collision with md5 though, so it might not matter and md5 might be good enough.

pinc/misc.inc Outdated
function memoize_function($function, $args = [], $expire_from_now = 3600, $key_salt = "")
{
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there is no memcache server? Is this setup in configuration somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no local memcache server running, or it's not connected, all of the memcache calls return false and the code falls through to just calling the function, I tried to convey this in the function header with:

If memcache is not running, this will call the requested
function and return its data without generating a notice/warning/error.

So having memcache is effectively optional and if it isn't there or isn't working there won't be any caching.

The API uses memcache for rate limiting and this is documented in the API.md. I should update INSTALL.md and mention the use of memcache for other caching too, I'll do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated both API.md and INSTALL.md to include much more details about memcached usage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you!

static $memcache = null;
if (!$memcache) {
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor point, but should we make the port and server configurable similar to other dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should but I'm going to punt on that right now. The install docs talk about this being accessible on localhost so they at least reflect reality.

Copy link
Collaborator

@windymilla windymilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on TEST, that when round_backlog_days.php is refreshed in the sandbox, the cache hit number increases, but the deployed version of that file doesn't affect the cache.
Code looks good too.

@cpeel cpeel merged commit 6948eb3 into DistributedProofreaders:master Dec 28, 2023
4 checks passed
@cpeel cpeel deleted the memoize-function branch December 28, 2023 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants