During OAI harvesting, resumptionTokens are created in the TokenRepository if the number of returned items exceeds settings.limit (e.g. 50 records). Whenever the main action of the OAI controller is executed, it deletes expired tokens, i.e. tokens that have reached the lifetime set by settings.expired (e.g. 1800 seconds). We ran into situations where the OAI interface froze and returned a blank page without any warning.
A closer look showed that the problem was indirectly related to settings.expired, and that we could only get our OAI interface running again by deleting the tokens in tx_dlf_tokens and setting the lifetime unreasonably low (e.g. 60 seconds). The deletion of expired tokens caused an internal server error (500) because the script hit the PHP memory_limit. This happens because:
all expired tokens are first loaded into a single variable, $tokensToBeRemoved (https://github.com/kitodo/kitodo-presentation/blob/2f30469c99ebb57a23e8c39d39dbadfffad8791a/Classes/Domain/Repository/TokenRepository.php#L48C1-L48C48), which we then loop over in order to delete the individual tokens
depending on the number of tokens that are considered expired at that moment, this can (as in our case, with bad luck) be several hundred tokens or more
$tokensToBeRemoved can therefore grow to several gigabytes (each token takes about 2-4 MB, multiplied by the number of tokens expired at that moment)
which ultimately causes the script to hit the PHP memory_limit
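A back-of-the-envelope check of these numbers, as a small PHP sketch: it multiplies the number of expired tokens by 3 MB (the middle of the 2-4 MB per token reported above) and compares the result with a 512M memory_limit. The helper names and token counts are hypothetical, and the estimate ignores all other memory the request uses, so in practice the limit is reached even earlier.

```php
<?php
// Rough estimate of the memory needed when all expired tokens are held in
// one variable at the same time. Helper names and token counts are
// hypothetical; 3 MB per token is taken from the 2-4 MB range in the issue.

/** Convert a php.ini shorthand value such as "512M" or "1G" into bytes. */
function iniBytes(string $value): int
{
    $value = trim($value);
    $unit = strtoupper(substr($value, -1));
    $number = (int) $value; // (int) "512M" yields 512
    switch ($unit) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

/** Bytes needed to hold $expiredTokens tokens of $mbPerToken MB each at once. */
function peakTokenBytes(int $expiredTokens, float $mbPerToken): int
{
    return (int) ($expiredTokens * $mbPerToken * 1024 ** 2);
}

$limit = iniBytes('512M');
foreach ([10, 100, 300] as $expired) {
    $needed = peakTokenBytes($expired, 3.0);
    printf(
        "%3d expired tokens x 3 MB = %4d MB -> %s\n",
        $expired,
        intdiv($needed, 1024 ** 2),
        $needed > $limit ? 'exceeds 512M' : 'fits'
    );
}
```

With these assumptions, 100 expired tokens (300 MB) still fit, while 300 expired tokens (900 MB) already exceed a 512M limit.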
We can end up in this situation, for example, when:
a client successfully harvests all of our records at short intervals, so that only a handful of tokens expire at any given moment and are deleted properly
but this leaves hundreds of tokens that remain valid until they reach their actual expiration time
then, perhaps a few hours or days later, another client starts harvesting, and the main action suddenly finds hundreds of expired tokens, which results in the internal server error described above because the memory_limit is reached
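The scenario can be illustrated with a small simulation. It assumes (hypothetically) that a complete harvest creates one token per response except the last one, applies the same "tstamp older than now minus lifetime" condition as the deletion query, and invents the record count and timing. It only shows why the tokens created during one fast harvest all expire together later on.

```php
<?php
// Illustrative assumptions (not taken from the Kitodo.Presentation code):
// a complete harvest creates one token per response except the last, and a
// token counts as expired once tstamp < now - expired, mirroring the
// lessThan('tstamp', time() - $expireTime) condition used for deletion.

/** Number of resumption tokens a complete harvest of $records creates. */
function tokensPerHarvest(int $records, int $limit): int
{
    return max(0, (int) ceil($records / $limit) - 1);
}

/** Count tokens whose timestamp is older than $now - $expired. */
function countExpired(array $tstamps, int $now, int $expired): int
{
    $cutoff = $now - $expired;
    return count(array_filter($tstamps, fn (int $t): bool => $t < $cutoff));
}

$limit = 50;            // settings.limit
$expired = 1800;        // settings.expired (seconds)
$start = 1_700_000_000; // arbitrary start time of the first harvest

// A fast client harvests 20 000 records with one request every 2 seconds.
$tokens = tokensPerHarvest(20000, $limit);
$tstamps = [];
for ($i = 0; $i < $tokens; $i++) {
    $tstamps[] = $start + 2 * $i;
}

// Right after that harvest, none of its tokens has expired yet...
echo countExpired($tstamps, $start + 2 * $tokens, $expired), "\n"; // 0
// ...but a client arriving one hour later finds all of them expired at once.
echo countExpired($tstamps, $start + 3600, $expired), "\n";        // 399
```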
Reproduction
Steps to reproduce the behaviour:
Set a PHP memory_limit of e.g. 512MB or 1GB
Go to your OAI plugin and set a limit of e.g. 50 and a lifetime of e.g. 1800 seconds
Navigate to your OAI controller and run listRecords as mets, dc or epicur over all of your records at a reasonably fast interval (1-5 seconds)
Harvest multiple times if necessary, so that several hundred resumption tokens accumulate in the database
After harvesting, wait e.g. an hour (at least long enough for all tokens to be considered expired) and run listRecords again
You should see a blank page
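For the harvesting steps, a minimal OAI-PMH client loop may help: it sends ListRecords with a metadataPrefix, then follows each resumptionToken until an empty one is returned, pausing between requests. The base URL and the injected fetch function are placeholders and the function names are hypothetical; only the request parameters follow the OAI-PMH 2.0 protocol.

```php
<?php
// Minimal ListRecords harvesting loop (sketch). The fetcher is injected so
// the loop can be tried without a network; in a real run it could be
// file_get_contents() pointed at your own OAI endpoint.

const OAI_NS = 'http://www.openarchives.org/OAI/2.0/';

/** Extract the resumptionToken from a ListRecords response, or null if missing/empty. */
function resumptionToken(string $xml): ?string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $nodes = $doc->getElementsByTagNameNS(OAI_NS, 'resumptionToken');
    if ($nodes->length === 0) {
        return null;
    }
    $token = trim($nodes->item(0)->textContent);
    return $token === '' ? null : $token;
}

/**
 * Harvest all pages and return the number of requests made.
 * $fetch receives a URL and returns the response body.
 */
function harvest(string $baseUrl, string $metadataPrefix, callable $fetch, int $pauseSeconds = 1): int
{
    $url = $baseUrl . '?' . http_build_query(['verb' => 'ListRecords', 'metadataPrefix' => $metadataPrefix]);
    $requests = 0;
    while (true) {
        $requests++;
        $token = resumptionToken($fetch($url));
        if ($token === null) {
            return $requests; // last page reached
        }
        // A resumption request carries only the verb and the token.
        $url = $baseUrl . '?' . http_build_query(['verb' => 'ListRecords', 'resumptionToken' => $token]);
        sleep($pauseSeconds); // 1-5 seconds in the reproduction steps
    }
}
```

For example, harvest('https://example.org/oai', 'mets', 'file_get_contents', 2) would walk through all pages with a 2-second pause, assuming your OAI endpoint were reachable at that placeholder URL.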
Expected Behavior
When deleting expired tokens, we should not put all expired tokens in a single big variable $tokensToBeRemoved and iterate over it, because it can reach extraordinary sizes. Instead, we should query and delete each expired token individually.
I was thinking of something like the following, but maybe there are better ways, or maybe TYPO3 offers a less memory-intensive way to do what the current code does? Do we need to fetch the whole token including its options, which take up so much memory, or can it be done by uid alone?
Pseudo-code to delete expired tokens directly and individually (many consecutive queries):
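public function deleteExpiredTokens($expireTime)
{
    $query = $this->createQuery();
    // Match only tokens whose timestamp is older than the configured lifetime.
    $query->matching($query->lessThan('tstamp', (int) (time() - $expireTime)));
    // Fetch one expired token per query, so that each query loads at most
    // one token (including its options).
    $query->setLimit(1);
    while (($tokenToBeRemoved = $query->execute()->getFirst()) !== null) {
        $this->remove($tokenToBeRemoved);
        // Write the deletion to the database before the next query runs.
        $this->persistenceManager->persistAll();
    }
}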
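Regarding whether the whole token is needed at all: a set-based DELETE on tstamp never loads token rows (or their options) into PHP. The sketch below shows the idea with PDO and an in-memory SQLite database; the table layout is a simplified assumption, not the real tx_dlf_tokens schema. Inside TYPO3 the same statement could presumably be built with the Doctrine DBAL QueryBuilder (ConnectionPool::getQueryBuilderForTable() together with delete() and an lt('tstamp', ...) expression), bypassing Extbase objects entirely; this has not been tested against Kitodo.Presentation.

```php
<?php
// Sketch: remove expired tokens with one DELETE statement instead of
// hydrating token objects. Table layout (uid, tstamp, options) is assumed.

function deleteExpiredTokens(PDO $db, int $expireTime, int $now): int
{
    $stmt = $db->prepare('DELETE FROM tx_dlf_tokens WHERE tstamp < :cutoff');
    $stmt->bindValue(':cutoff', $now - $expireTime, PDO::PARAM_INT);
    $stmt->execute();
    // Only the number of deleted rows comes back, never the token data.
    return $stmt->rowCount();
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tx_dlf_tokens (uid INTEGER PRIMARY KEY, tstamp INTEGER NOT NULL, options BLOB)');

$now = 1_700_000_000;
$insert = $db->prepare('INSERT INTO tx_dlf_tokens (tstamp, options) VALUES (?, ?)');
// 300 tokens created about two hours ago plus 5 fresh ones (small dummy options).
for ($i = 0; $i < 300; $i++) {
    $insert->execute([$now - 7200 + $i, str_repeat('x', 1024)]);
}
for ($i = 0; $i < 5; $i++) {
    $insert->execute([$now - 60, str_repeat('x', 1024)]);
}

echo deleteExpiredTokens($db, 1800, $now), "\n"; // 300 expired tokens removed
echo $db->query('SELECT COUNT(*) FROM tx_dlf_tokens')->fetchColumn(), "\n"; // 5 remain
```

Since only the affected row count is returned, PHP memory use does not depend on how many tokens expire at once.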
Additional Context
We actually want to configure our OAI interface to match the settings used by the DNB, namely 50 records per response and a token lifetime of up to 30 minutes (https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/OAI/_content/faqOai12_akk.html). While 50 records per response seems reasonable and uncritical, a lifetime of 1800 seconds can produce a large block of tokens that expire together and run into the memory issue described above.
I think the right approach here would be to not handle maintenance tasks like purging invalid resumption tokens in an action controller at all. Instead, this should be implemented as a task for the TYPO3 Scheduler. That way it wouldn't encumber the controller any more and it could be executed regularly without depending on someone executing the controller action...