Generalize function caching code #998

Merged (3 commits) on Dec 28, 2023
Changes from 1 commit
22 changes: 3 additions & 19 deletions pinc/graph_data.inc
@@ -3,6 +3,7 @@
 include_once($relPath.'dpsql.inc');
 include_once($relPath.'project_states.inc');
 include_once($relPath.'page_tally.inc');
+include_once($relPath.'misc.inc'); // memoize_function()

 function get_graph_js_files()
 {
@@ -871,29 +872,12 @@ function get_round_backlog_stats($interested_phases)

 /**
  * Given a function and arguments that will generate graph data, attempt
- * to load the data from memcached first and fall back to the function and
- * cache the response.
+ * to cache the value.
  *
  * Note that the graph function might include _() so it's important that we
  * include the user's language in the key.
  */
 function query_graph_cache($function, $args = [], $expire_from_now = 3600)
 {
-    $memcache = new Memcached();
-    $memcache->addServer('localhost', 11211);
-
-    $key = md5($function . join($args) . get_desired_language());
-
-    // if the key exists, just return the data uncompressed
-    $data = $memcache->get($key);
-    if ($data !== false) {
-        return json_decode(gzuncompress($data));
-    }
-
-    // if not, call the function to return the data
-    $data = call_user_func_array($function, $args);
-
-    $memcache->set($key, gzcompress(json_encode($data)), time() + $expire_from_now);
-
-    return $data;
+    return memoize_function($function, $args, $expire_from_now, get_desired_language());
 }
32 changes: 32 additions & 0 deletions pinc/misc.inc
@@ -1272,6 +1272,38 @@ function factor_strings($strings)
     return [$left_common, $middles, $right_common];
 }

+//--------------------------------------------------------------------------
+
+/**
+ * Given a function and arguments that will generate data, attempt
+ * to load the data from memcached first and fall back to the function and
+ * cache the response. If memcache is not running, this will call the requested
+ * function and return its data without generating a notice/warning/error.
+ *
+ * If the requested function returns data that has localized strings, it's
+ * important that $key_salt = get_desired_language();
+ */
+function memoize_function($function, $args = [], $expire_from_now = 3600, $key_salt = "")
+{
+    $memcache = new Memcached();
+    $memcache->addServer('localhost', 11211);
Collaborator

Should we add the server unconditionally? Per https://www.php.net/manual/en/memcached.addserver.php, adding the same server to the same server pool multiple times does not seem to be advised.

Member Author

This function creates a new Memcached object on every call, so we need to add the server on every call. The object itself doesn't persist.

Collaborator

Oh yes, my bad. I guess the next question is: should we avoid destroying it on every call and instead keep the pool of servers alive? Does it have a strong impact on performance?

Member Author

Connection pooling is difficult with PHP because each page load is a distinct "execution" in most cases (this is not entirely accurate; it depends on how PHP is wired into the web server (e.g. Apache) and which multi-processing model (MPM) is used). It's this complexity that has us not pooling or using persistent DB connections either.

The per-instance loopback connections (for both memcache and the database) should be pretty lightweight, and at least in this case we're just trading the MySQL connection + query complexity for a memcached connection + constant-time hash lookup. I will see if we can cache the connection within a page load via a static variable though, for the case where we have multiple calls to this function per page load -- let me test that.

Member Author

I've cached the connection in a static function variable, so we will at least reuse it within a single page load if there are multiple calls to the function.
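
The static-variable approach mentioned above can be sketched roughly as follows. This is only an illustration: `get_cache_connection()` and its `$factory` parameter are hypothetical names, not code from this PR, and a counting stand-in replaces the real `Memcached` object so the sketch runs without the extension.

```php
<?php
// Hypothetical helper: the name and the $factory parameter are illustrative.
function get_cache_connection(callable $factory)
{
    // A static local keeps its value across calls within one PHP execution
    // (one page load) and is discarded when that execution ends.
    static $connection = null;

    if ($connection === null) {
        $connection = $factory();
    }

    return $connection;
}

// Stand-in factory that counts how often a "connection" is created. With the
// real extension this would build a Memcached object and call addServer().
$created = 0;
$factory = function () use (&$created) {
    $created++;
    return new stdClass();
};

$first = get_cache_connection($factory);
$second = get_cache_connection($factory);

var_dump($first === $second); // bool(true)
var_dump($created);           // int(1)
```

Because the object is created once per execution, the server is also added to the pool only once per page load.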

Collaborator

What if there is no memcache server? Is this set up in configuration somewhere?

Member Author

If there is no local memcache server running, or it's not connected, all of the memcache calls return false and the code falls through to just calling the function. I tried to convey this in the function header with:

If memcache is not running, this will call the requested
function and return its data without generating a notice/warning/error.

So having memcache is effectively optional, and if it isn't there or isn't working there won't be any caching.

The API uses memcache for rate limiting, and this is documented in API.md. I should update INSTALL.md to mention the use of memcache for other caching too; I'll do that.
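
A minimal sketch of that fall-through behaviour, under some assumptions: `UnavailableCache`, `memoize_with()` and `add_numbers()` are hypothetical names made up for this illustration, the cache object is passed in as a parameter so the code runs without the Memcached extension, and the gzip step is left out for brevity.

```php
<?php
// Hypothetical stand-in for an unreachable memcached server: every call
// reports failure by returning false.
class UnavailableCache
{
    public function get($key)
    {
        return false;
    }

    public function set($key, $value, $expiration = 0)
    {
        return false;
    }
}

// Simplified variant of the memoize logic that receives the cache object.
function memoize_with($cache, string $function, array $args = [], int $expire_from_now = 3600, string $key_salt = "")
{
    $key = hash("sha256", $function . serialize($args) . $key_salt);

    $data = $cache->get($key);
    if ($data !== false) {
        return unserialize($data);
    }

    // Cache miss or cache failure: compute the value directly.
    $data = call_user_func_array($function, $args);

    // The return value is ignored, so a failed write is silently skipped.
    $cache->set($key, serialize($data), time() + $expire_from_now);

    return $data;
}

function add_numbers($a, $b)
{
    return $a + $b;
}

$result = memoize_with(new UnavailableCache(), "add_numbers", [2, 3]);
var_dump($result); // int(5)
```

With a failing cache, every call simply recomputes the value; nothing is cached and nothing is raised.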

Member Author

I've updated both API.md and INSTALL.md to include much more detail about memcached usage.

Collaborator

Perfect, thank you!


+
+    $key = hash("sha256", $function . serialize($args) . $key_salt);
Collaborator

SHA-256 is resistant to collision attacks, which is a property we probably don't need for caching. Should we use something faster instead?

Member Author

I changed it from md5 to sha256 to reduce the chance of collisions, since this function has no control over the function name, args, or salt. I don't know what the odds of a collision with md5 are though, so it might not matter and md5 might be good enough.
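
Separately from the choice of hash, the diff also changes the key input from `join($args)` to `serialize($args)`. The sketch below shows why that part matters and how the two digests compare in length. The argument values, `some_function` and the `en` salt are made up for the illustration.

```php
<?php
// join() concatenates the arguments with no separator, so different
// argument lists can produce the same string (and therefore the same key).
$args_a = ["ab", "c"];
$args_b = ["a", "bc"];

var_dump(join($args_a) === join($args_b));           // bool(true)
var_dump(serialize($args_a) === serialize($args_b)); // bool(false)

// Digest sizes in hexadecimal characters for an example key input.
$input = "some_function" . serialize($args_a) . "en";
var_dump(strlen(hash("md5", $input)));    // int(32)
var_dump(strlen(hash("sha256", $input))); // int(64)
```

A collision in the input string is passed through unchanged by any hash function, so making the input unambiguous helps regardless of whether md5 or sha256 is used.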


+    // if the key exists, just return the data uncompressed
+    $data = $memcache->get($key);
+    if ($data !== false) {
+        return unserialize(gzuncompress($data));
+    }
+
+    // if not, call the function to return the data
+    $data = call_user_func_array($function, $args);
+
+    $memcache->set($key, gzcompress(serialize($data)), time() + $expire_from_now);
+
+    return $data;
+}
+
 // -----------------------------------------------------------------------------

 /**
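
The removed `query_graph_cache()` body stored values with `json_encode()` and read them back with `json_decode()`, while `memoize_function()` uses `serialize()` and `unserialize()`. The following sketch shows how the two round trips differ for an associative array such as per-phase stats. The phase names and counts are invented, and the compression step is omitted.

```php
<?php
$stats = ["P1" => 120, "P2" => 80, "PP" => 40];

// json_decode() without the associative flag returns a stdClass object,
// so an associative array does not come back as an array.
$via_json = json_decode(json_encode($stats));
var_dump(is_array($via_json)); // bool(false)

// unserialize() restores the original array, keys and types included.
$via_serialize = unserialize(serialize($stats));
var_dump($via_serialize === $stats); // bool(true)
var_dump(array_sum($via_serialize)); // int(240)
```

This matters for callers that pass the cached result to array functions such as `array_sum()`, as the stats pages in this PR do.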
18 changes: 13 additions & 5 deletions stats/round_backlog.php
@@ -18,12 +18,20 @@
 $width = 300;
 $height = 200;

-// Pull all interested phases, primarily all the rounds and PP
-$interested_phases = array_keys($Round_for_round_id_);
-$interested_phases[] = "PP";
+function _get_round_backlog_data()
+{
+    global $Round_for_round_id_;

-// Pull the stats data out of the database
-$stats = get_round_backlog_stats($interested_phases);
+    // Pull all interested phases, primarily all the rounds and PP
+    $interested_phases = array_keys($Round_for_round_id_);
+    $interested_phases[] = "PP";
+
+    // Pull the stats data out of the database
+    return get_round_backlog_stats($interested_phases);
+}
+
+// cache backlog data for 1 day
+$stats = query_graph_cache("_get_round_backlog_data", [], 60 * 60 * 24);

 // get the total of all phases
 $stats_total = array_sum($stats);
62 changes: 36 additions & 26 deletions stats/round_backlog_days.php
@@ -16,37 +16,47 @@
 $width = 300;
 $height = 200;

-// Pull all interested phases, primarily all the rounds and PP
-$interested_phases = array_keys($Round_for_round_id_);
+function _get_round_backlog_days_data()
+{
+    global $Round_for_round_id_;
+
+    // Pull all interested phases, primarily all the rounds and PP
+    $interested_phases = array_keys($Round_for_round_id_);
+
+    // Pull the stats data out of the database
+    $stats = get_round_backlog_stats($interested_phases);
+
+    // Get page saveAsDone trend information
+    $holder_id = 1;
+    $today = getdate();
+    foreach ($stats as $phase => $pages) {
+        $tallyboard = new TallyBoard($phase, 'S');
+
+        $pages_last_week = $tallyboard->get_delta_sum(
+            $holder_id,
+            mktime(0, 0, 0, $today['mon'], $today['mday'] - 7, $today['year']),
+            mktime(0, 0, 0, $today['mon'], $today['mday'], $today['year'])
+        );
+
+        $avg_pages_per_day[$phase] = $pages_last_week / 7;
+
+        // calculate the number of days to complete at the current rate
+        if ($avg_pages_per_day[$phase]) {
+            $stats[$phase] = $pages / $avg_pages_per_day[$phase];
+        } else {
+            $stats[$phase] = 0;
+        }
+    }
+
+    return $stats;
+}

-// Pull the stats data out of the database
-$stats = get_round_backlog_stats($interested_phases);
+// cache backlog data for 1 day
+$stats = query_graph_cache("_get_round_backlog_days_data", [], 60 * 60 * 24);

 // get the total of all phases
 $stats_total = array_sum($stats);

-// Get page saveAsDone trend information
-$holder_id = 1;
-$today = getdate();
-foreach ($stats as $phase => $pages) {
-    $tallyboard = new TallyBoard($phase, 'S');
-
-    $pages_last_week = $tallyboard->get_delta_sum(
-        $holder_id,
-        mktime(0, 0, 0, $today['mon'], $today['mday'] - 7, $today['year']),
-        mktime(0, 0, 0, $today['mon'], $today['mday'], $today['year'])
-    );
-
-    $avg_pages_per_day[$phase] = $pages_last_week / 7;
-
-    // calculate the number of days to complete at the current rate
-    if ($avg_pages_per_day[$phase]) {
-        $stats[$phase] = $pages / $avg_pages_per_day[$phase];
-    } else {
-        $stats[$phase] = 0;
-    }
-}
-
 // calculate the goal percent as 100 / number_of_phases
 $goal_percent = ceil(100 / count($stats));
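
The per-phase calculation in `stats/round_backlog_days.php` can be worked through with made-up numbers. In the sketch below, `days_to_complete()` is a hypothetical helper that mirrors the loop body; it is not a function in the codebase, and the page counts are invented.

```php
<?php
// Hypothetical helper mirroring the loop body in the diff.
function days_to_complete($pages, $pages_last_week)
{
    // Average pages saved per day over the last 7 days.
    $avg_pages_per_day = $pages_last_week / 7;

    // A phase with no pages saved last week reports 0 instead of
    // dividing by zero.
    if ($avg_pages_per_day) {
        return $pages / $avg_pages_per_day;
    }

    return 0;
}

// 700 pages waiting, 350 pages saved last week: 350 / 7 = 50 pages per day.
echo days_to_complete(700, 350), "\n"; // 14

// No activity last week.
echo days_to_complete(100, 0), "\n"; // 0
```

Note that a result of 0 here means "no recent activity", not "no backlog".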