This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Pause Heritrix jobs when validation queue is too large #84

Open
yuliya-ivaniukovich opened this issue Oct 25, 2017 · 0 comments

Comments

@yuliya-ivaniukovich
Contributor

For some domains the crawl downloads a large number of PDF files, all of which are placed in the validation queue. Should we add a mechanism for pausing Heritrix? If the queue already holds more than X files for a domain, pause Heritrix crawling for that domain, and resume it once the number of queued files for that domain drops below Y. (This is a storage optimization for the server machine.)
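
The X/Y idea above is a two-threshold (hysteresis) gate, which could be sketched roughly as follows. This is only an illustration under assumptions: `QueueThrottle`, `update`, and the `pause`/`resume` callbacks are hypothetical names, the thresholds 1000 and 200 are made-up values, and the callbacks stand in for whatever the crawler would actually use to control a Heritrix job. No Heritrix API is called here.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class QueueThrottle:
    """Hypothetical hysteresis gate: pause above `high` (X), resume below `low` (Y)."""

    high: int  # X: pause a domain once its queue holds more than this many files
    low: int   # Y: resume a paused domain once its queue drops below this
    pause: Callable[[str], None]   # assumed hook that pauses crawling for a domain
    resume: Callable[[str], None]  # assumed hook that resumes crawling for a domain
    paused: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.low > self.high:
            raise ValueError("low (Y) must not exceed high (X)")

    def update(self, domain: str, queued: int) -> None:
        """Report the current validation-queue size for one domain."""
        if domain not in self.paused and queued > self.high:
            self.paused.add(domain)
            self.pause(domain)
        elif domain in self.paused and queued < self.low:
            self.paused.discard(domain)
            self.resume(domain)


# Usage: record the pause/resume decisions instead of touching a real crawler.
events = []
throttle = QueueThrottle(
    high=1000,
    low=200,
    pause=lambda d: events.append(("pause", d)),
    resume=lambda d: events.append(("resume", d)),
)
for size in (500, 1001, 600, 199):
    throttle.update("example.org", size)
print(events)  # [('pause', 'example.org'), ('resume', 'example.org')]
```

Keeping Y well below X avoids rapid pause/resume flapping when the queue size hovers around a single threshold: at 600 queued files the domain above stays paused, and it resumes only once the queue drops under 200.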


2 participants