New command: logreport #531
Note: scrapy-log-analyzer's logparser dependency is GPL. Make the command optional, as documented at https://ocp-software-handbook.readthedocs.io/en/latest/python/preferences.html#license-compliance
Here's the stub I had started (it was in a git stash). It would mostly call the scrapy-log-analyzer package.

```python
from scrapy.commands import ScrapyCommand
from scrapy.exceptions import UsageError

from scrapyloganalyzer import ScrapyLogFile


class LogReport(ScrapyCommand):
    def short_desc(self):
        return "Analyze a crawl's log file to assess the quality of the crawl"

    def syntax(self):
        return '[options] <logfile>'

    def run(self, args, opts):
        if len(args) != 1:
            raise UsageError("Exactly one log file must be provided.")
```
Another idea from #1048. The advantage is that it can interrupt a crawl, instead of waiting for it to end. Can maybe use the same approach as #1055
This command would implement the steps in the logs documentation, to report the most relevant lines from the log file. https://kingfisher-collect.readthedocs.io/en/latest/logs.html
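As a rough illustration of what "report the most relevant lines" could mean, here is a stdlib-only sketch that filters a log by severity. It is an assumption about the eventual implementation, not part of the stub; the severity tokens match Scrapy's default log format:

```python
import re

# Severity levels that the logs documentation asks a reviewer to inspect.
SEVERITIES = ("CRITICAL", "ERROR", "WARNING")

# Pattern for a severity token appearing as a whole word in a log line.
PATTERN = re.compile(r"\b(?:%s)\b" % "|".join(SEVERITIES))


def relevant_lines(log_text):
    """Return the log lines whose severity suggests a problem with the crawl."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]
```

The real command would likely delegate this to scrapy-log-analyzer instead of reimplementing it, but the shape of the output (a short list of noteworthy lines) would be similar.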
We might want to make this a separate package, and extract the ScrapyLogFile class from Kingfisher Archive: https://github.com/open-contracting/kingfisher-archive/blob/master/ocdskingfisherarchive/scrapy_log_file.py