You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Oct 28, 2022. It is now read-only.
For some domains we obtain a large amount of downloaded pdf files which are placed in validation queue. Maybe we have to make some heritrix pausing mechanism? If we have already more than X files for this domain in queue, then pause heritrix crawling for this domain and resume it after the amount of files in queue will become less than Y (this is server machine storage optimization)
The text was updated successfully, but these errors were encountered:
For some domains we obtain a large amount of downloaded pdf files which are placed in validation queue. Maybe we have to make some heritrix pausing mechanism? If we have already more than X files for this domain in queue, then pause heritrix crawling for this domain and resume it after the amount of files in queue will become less than Y (this is server machine storage optimization)
The text was updated successfully, but these errors were encountered: