Add a scraper check utility #124

Open
benoit74 opened this issue Feb 5, 2024 · 0 comments
Labels
enhancement New feature or request question Further information is requested
benoit74 commented Feb 5, 2024

Currently, we rely on various objects in scraperlib to:

  • create the ZIM
  • re-encode videos and images
  • cache these assets on the optimization cache

We might consider adding a mechanism to perform sanity checks on scraper behavior:

  • did we cache all re-encoded images / videos when a cache is present?
  • did we remove temporary files from the filesystem once they were added to the ZIM? (we prefer in-memory/streaming approaches, but many scrapers still use the temporary file approach, and some situations have to rely on it)
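
To make the two checks above more concrete, here is a minimal sketch of what such a utility could track. Everything in it is an assumption for illustration: `ScraperChecker` and its `note_*` methods are hypothetical names and not part of scraperlib, and the sketch assumes the scraper (or scraperlib itself) reports each re-encoded asset, each cache upload and each temporary file as it happens.

```python
from __future__ import annotations

from pathlib import Path


class ScraperChecker:
    """Hypothetical helper (not a scraperlib API) collecting facts to verify later."""

    def __init__(self, cache_enabled: bool) -> None:
        self.cache_enabled = cache_enabled
        self.reencoded: set[str] = set()     # keys of assets that were re-encoded
        self.cached: set[str] = set()        # keys of assets stored in the optimization cache
        self.temp_files: set[Path] = set()   # temporary files that must be gone at the end

    def note_reencoded(self, key: str) -> None:
        self.reencoded.add(key)

    def note_cached(self, key: str) -> None:
        self.cached.add(key)

    def note_temp_file(self, path: Path) -> None:
        self.temp_files.add(Path(path))

    def problems(self) -> list[str]:
        """Return human-readable descriptions of every failed sanity check."""
        issues: list[str] = []
        # Check 1: with a cache present, every re-encoded asset should have been cached.
        if self.cache_enabled:
            for key in sorted(self.reencoded - self.cached):
                issues.append(f"re-encoded but not cached: {key}")
        # Check 2: temporary files should no longer exist on the filesystem.
        for path in sorted(self.temp_files):
            if path.exists():
                issues.append(f"temporary file not removed: {path}")
        return issues
```

Returning a list of problems rather than raising keeps the question of whether a failed check should abort the scraper open to the caller.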

What I do not yet know:

  • should we make the scraper fail if these checks fail?
  • is there any chance we can automate these checks? (i.e. no need to modify the scrapers, or as little as possible; at the very least, a call to "check_i_m_ok" should not be mandatory, because scraper developers might forget about it as well. I doubt this is achievable because there are many kinds of situations)
  • can we do these checks early? (so that we fail the scraper as soon as possible instead of wasting time and resources)
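
One possible shape for the open questions above, as a sketch only: a context manager wrapping the scraper run, with a `strict` flag to choose between failing and merely warning, and a callable the scraper can invoke at any point to run the checks early. The names `checked_run` and `ScraperCheckError` are hypothetical and not part of scraperlib, and this still requires a small change in each scraper, so it does not fully answer the automation question.

```python
from __future__ import annotations

import logging
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger(__name__)


class ScraperCheckError(Exception):
    """Raised in strict mode when at least one sanity check fails."""


@contextmanager
def checked_run(
    checks: list[Callable[[], list[str]]], strict: bool = True
) -> Iterator[Callable[[], None]]:
    """Run every check when the block exits normally; yield a function to run them earlier."""

    def run_checks() -> None:
        # Each check returns a list of problem descriptions (empty when all is fine).
        issues = [issue for check in checks for issue in check()]
        if not issues:
            return
        if strict:
            raise ScraperCheckError("; ".join(issues))
        for issue in issues:
            logger.warning("scraper check failed: %s", issue)

    # The scraper may call the yielded function at any time to fail early.
    yield run_checks
    # Only reached when the block did not raise, so a scraper error is never masked.
    run_checks()
```

A scraper would then wrap its main routine in `with checked_run([...]) as run_checks:` and could call `run_checks()` after each batch of items, so that a problem is reported before more time and resources are spent.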
benoit74 added the enhancement and question labels Feb 5, 2024
benoit74 added this to the backlog milestone Nov 14, 2024