Currently, we rely on various objects in scraperlib to:

- create the ZIM
- re-encode videos and images
- cache these assets in the optimization cache
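To make the discussion concrete, here is a minimal sketch of how a scraper typically combines these three responsibilities. All names (`ZimCreator`, `OptimizationCache`, `reencode`, `process_asset`) are illustrative stand-ins, not the actual scraperlib API:

```python
class ZimCreator:
    """Stand-in for the object that writes entries into the ZIM."""

    def __init__(self):
        self.entries = []

    def add_item(self, path, content):
        self.entries.append(path)


class OptimizationCache:
    """Stand-in for the optimization cache holding re-encoded assets."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


def reencode(raw):
    # placeholder for a real video/image re-encode
    return raw[::-1]


def process_asset(key, raw, creator, cache):
    """Re-encode an asset (or reuse the cached copy) and add it to the ZIM."""
    content = cache.get(key)
    if content is None:
        content = reencode(raw)
        # forgetting this put() is exactly the kind of scraper bug
        # the sanity checks discussed in this issue would catch
        cache.put(key, content)
    creator.add_item(key, content)
```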
We might consider having a mechanism to perform sanity checks on scraper behavior:

- did we cache all re-encoded images / videos when a cache is present?
- did we remove temporary files from the filesystem as they are added to the ZIM? (we know that while we prefer in-memory/streaming approaches, many scrapers still use the temporary-file approach, and some situations even require it)
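One possible shape for such a mechanism is a small tracker that records what the scraper did, then reports inconsistencies at the end. This is only a sketch; `SanityTracker` and its fields are hypothetical, not something scraperlib provides today:

```python
from pathlib import Path


class SanityTracker:
    """Hypothetical helper: records re-encoded assets, cache writes and
    temporary files, so the run can be checked for the two problems above."""

    def __init__(self):
        self.reencoded = set()   # asset keys that were re-encoded
        self.cached = set()      # asset keys written to the optimization cache
        self.temp_files = set()  # Path objects for temporary files created

    def report(self):
        """Return a list of human-readable problems (empty means all good)."""
        problems = []
        missing = self.reencoded - self.cached
        if missing:
            problems.append(f"{len(missing)} re-encoded asset(s) never cached")
        leftover = [p for p in self.temp_files if p.exists()]
        if leftover:
            problems.append(f"{len(leftover)} temporary file(s) left on disk")
        return problems
```

A run where every re-encoded asset was cached and every temporary file was deleted would yield an empty report; anything else names the category of problem.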
What I do not yet know:

- should we make the scraper fail if these checks fail?
- is there any chance we can automate these checks? (i.e. with no modification to the scrapers, or as little as possible; at the very least a mandatory "check_i_m_ok" call should be avoided, since scraper developers might forget it too. I have doubts, because there are many kinds of situations to cover)
- can we run these checks early, so that we fail the scraper as soon as possible instead of wasting time and resources?
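On the automation question, one option is a context manager owned by scraperlib that runs registered checks automatically when the scraper's main block exits, so no explicit call is needed; calling `run_checks()` incrementally during the run would also address the "fail early" point. Again a sketch under assumed names (`ScraperRunGuard` is hypothetical):

```python
class ScraperRunGuard:
    """Hypothetical context manager: collects check callables and runs them
    automatically on exit, so scraper authors need no explicit check call."""

    def __init__(self, fail_on_error=True):
        self.fail_on_error = fail_on_error
        self.checks = []  # callables returning an error string, or None if OK

    def add_check(self, check):
        self.checks.append(check)

    def run_checks(self):
        """Run all checks now; can be called mid-run to fail early."""
        return [msg for check in self.checks if (msg := check()) is not None]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # do not mask the scraper's own error
        problems = self.run_checks()
        if problems and self.fail_on_error:
            raise RuntimeError("sanity checks failed: " + "; ".join(problems))
        return False
```

Usage would look like `with ScraperRunGuard() as guard: ...`, with scraperlib objects registering their own checks on the guard, so scrapers get the checks for free.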