
New command: logreport #531

Open
jpmckinney opened this issue Oct 23, 2020 · 4 comments

jpmckinney (Member) commented Oct 23, 2020

This command would implement the steps in the logs documentation, reporting the most relevant lines from the log file: https://kingfisher-collect.readthedocs.io/en/latest/logs.html

We might want to make this a separate package, and extract the ScrapyLogFile class from Kingfisher Archive: https://github.com/open-contracting/kingfisher-archive/blob/master/ocdskingfisherarchive/scrapy_log_file.py
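For illustration, here is a minimal sketch of the kind of report such a command could produce, assuming the "most relevant lines" are ERROR-level messages and the final stats dump (the report_log function and its selection criteria are hypothetical; the linked documentation defines the actual steps):

import sys


def report_log(path):
    # Print the most relevant lines from a Scrapy log file: in this sketch,
    # ERROR-level messages and everything from the final "Dumping Scrapy stats" block onward.
    in_stats = False
    with open(path) as f:
        for line in f:
            if "Dumping Scrapy stats:" in line:
                in_stats = True
            if in_stats or "] ERROR:" in line:
                print(line, end="")


if __name__ == "__main__":
    report_log(sys.argv[1])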

jpmckinney (Member Author) commented
Note: scrapy-log-analyzer's logparser dependency is GPL-licensed. Make the command optional, as documented at https://ocp-software-handbook.readthedocs.io/en/latest/python/preferences.html#license-compliance
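A minimal sketch of how the import could be guarded so the GPL-licensed dependency stays optional (the exact packaging approach, e.g. an extras group, is an assumption; the handbook page describes the policy):

# Guard the optional, GPL-licensed dependency so the command degrades gracefully
# when scrapy-log-analyzer is not installed (e.g. if it is provided via a hypothetical
# "logreport" extra rather than as a core requirement).
try:
    from scrapyloganalyzer import ScrapyLogFile  # pulls in the GPL-licensed logparser
except ImportError:
    ScrapyLogFile = None

The command's run() method could then raise UsageError with an installation hint if ScrapyLogFile is None.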

jpmckinney (Member Author) commented
Here's the stub I had started (it was in a git stash). It would mostly call the scrapy-log-analyzer package.

from scrapy.commands import ScrapyCommand
from scrapy.exceptions import UsageError
from scrapyloganalyzer import ScrapyLogFile


class LogReport(ScrapyCommand):
    def short_desc(self):
        return "Analyze a crawl's log file to assess the quality of the crawl"

    def syntax(self):
        return '[options] <logfile>'

    def run(self, args, opts):
        if len(args) != 1:
            raise UsageError("Exactly one log file must be provided.")
        # TODO: The analysis itself would mostly call the scrapy-log-analyzer package,
        # e.g. by constructing a ScrapyLogFile for args[0] and reporting its findings.
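Assuming the project registers its commands module via Scrapy's COMMANDS_MODULE setting (as custom Scrapy commands require), invocation would presumably look like scrapy logreport path/to/logfile.log, matching the syntax() method in the stub.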

jpmckinney (Member Author) commented
Another idea from #1048. The advantage is that it can interrupt a crawl, instead of waiting for it to end. We can maybe use the same approach as #1055.

A more intensive option is to add a new feature that checks the rate of 500 errors and cancels the crawl if it is too high. This should also send a new type of message to Kingfisher Process, to cancel processing.
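For reference, a sketch of what that feature could look like as a Scrapy extension, using a simple ratio check (the setting names, defaults and close reason are hypothetical, and the message to Kingfisher Process is omitted):

from scrapy import signals
from scrapy.exceptions import NotConfigured


class ErrorRateMonitor:
    # Close the spider when the share of HTTP 500 responses gets too high.

    def __init__(self, crawler, threshold, min_responses):
        self.crawler = crawler
        self.threshold = threshold
        self.min_responses = min_responses
        self.responses = 0
        self.errors = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical settings to enable and tune the check.
        if not crawler.settings.getbool("ERROR_RATE_MONITOR_ENABLED"):
            raise NotConfigured
        extension = cls(
            crawler,
            threshold=crawler.settings.getfloat("ERROR_RATE_THRESHOLD", 0.1),
            min_responses=crawler.settings.getint("ERROR_RATE_MIN_RESPONSES", 100),
        )
        crawler.signals.connect(extension.response_received, signal=signals.response_received)
        return extension

    def response_received(self, response, request, spider):
        self.responses += 1
        if response.status == 500:
            self.errors += 1
        # Cancel the crawl once enough responses have been seen and the error rate is too high.
        if self.responses >= self.min_responses and self.errors / self.responses > self.threshold:
            self.crawler.engine.close_spider(spider, "error_rate_exceeded")

The extension would also need to be enabled in the EXTENSIONS setting; notifying Kingfisher Process would be a separate step.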
