
corrupted json document for scraping #11

Open
pipermerriam opened this issue May 7, 2019 · 0 comments

What is wrong

While working on concurrency for scraping I hit a program crash. Now when I try to run the scraper this happens:

$ python __main__.py scrape
EthPM CLI v0.1.0a0

Traceback (most recent call last):
  File "__main__.py", line 126, in <module>
    main(parser, logger)
  File "__main__.py", line 115, in main
    scraper(args)
  File "__main__.py", line 43, in scraper
    last_scraped_block = trio.run(scrape, w3, ethpmcli_dir, start_block)
  File "/home/piper/python-environments/ethpm-cli/lib/python3.6/site-packages/trio/_core/_run.py", line 1444, in run
    raise runner.main_task_outcome.error
  File "/home/piper/projects/ethpm-cli/ethpm_cli/scraper.py", line 32, in scrape
    initialize_ethpm_dir(ethpm_dir, w3)
  File "/home/piper/projects/ethpm-cli/ethpm_cli/scraper.py", line 85, in initialize_ethpm_dir
    validate_chain_data_store(chain_data_path, w3)
  File "/home/piper/projects/ethpm-cli/ethpm_cli/validation.py", line 68, in validate_chain_data_store
    chain_data = json.loads(chain_data_path.read_text())
  File "/home/piper/.pyenv/versions/3.6.5/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/piper/.pyenv/versions/3.6.5/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 26 column 2 (char 415)

The underlying JSON document is:

{
    "chain_id": "0x1",
    "scraped_blocks": [
        {
            "min": "0",
            "max": "295"
        },
        {
            "min": "297",
            "max": "297"
        },
        {
            "min": "302",
            "max": "303"
        },
        {
            "min": "309",
            "max": "309"
        },
        {
            "min": "313",
            "max": "313"
        }
    ]
}
 {
            "min": "313",
            "max": "313"
        }
    ]
}
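
For reference, json.loads only accepts a single top-level JSON value, so anything after the first complete object triggers exactly this error. A minimal reproduction:

import json

# A second top-level value after the first object is "extra data":
json.loads('{"a": 1} {"a": 1}')
# json.decoder.JSONDecodeError: Extra data: line 1 column 10 (char 9)

In the document above, char 415 is the end of the valid document, and the duplicated fragment after the closing brace is the extra data the decoder is complaining about.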

Now, this is almost definitely from the file being written to by multiple concurrent threads; however, the program should gracefully handle corrupt files.
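
One way to handle this gracefully is to catch the JSONDecodeError and either salvage the first complete JSON value with raw_decode or fail with an actionable message. This is a sketch only; load_chain_data is a hypothetical helper rather than the existing validate_chain_data_store API, and the recovery policy is an open design question:

import json
from pathlib import Path

def load_chain_data(chain_data_path: Path) -> dict:
    # Hypothetical helper: load the chain data store without
    # crashing the whole scraper on a corrupt file.
    raw = chain_data_path.read_text()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # Salvage the first complete JSON value; trailing garbage
        # from an interrupted or concurrent write is discarded.
        chain_data, _end = json.JSONDecoder().raw_decode(raw)
    except json.JSONDecodeError:
        raise ValueError(
            f"Chain data store at {chain_data_path} is corrupt and "
            "could not be recovered; delete it and re-run the scraper."
        )
    # Rewrite the store with only the recovered prefix.
    chain_data_path.write_text(json.dumps(chain_data, indent=4))
    return chain_data

Separately, writing the store through a temporary file followed by os.replace (which is atomic on POSIX) would prevent a crashed or concurrent writer from leaving a half-written document behind in the first place.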
