Decoding error #8

Open
pgte opened this issue May 31, 2022 · 0 comments

Hi, I'm getting an error when running an indexing job:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)
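For readers unfamiliar with this message, it can be reproduced in isolation. The following is only a minimal sketch: the helper name `first_undecodable_byte` and the sample payload are hypothetical and are not part of docsearch-scraper. It shows how the position and byte value in the message map onto the raw body, which can help when inspecting what the server actually returned.

```python
def first_undecodable_byte(body: bytes, encoding: str):
    """Return (position, byte value) of the first byte that `encoding`
    cannot decode, or None if the whole body decodes cleanly."""
    try:
        body.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index reported as "position N" in the message.
        return exc.start, body[exc.start]
    return None


# Hypothetical payload: eight ASCII bytes followed by 0xb6,
# mirroring the position and byte in the reported message.
sample = b"12345678" + bytes([0xB6]) + b"rest"

print(first_undecodable_byte(sample, "ascii"))  # (8, 182), i.e. byte 0xb6

# Decoding with errors="replace" never raises; it substitutes U+FFFD instead.
print(sample.decode("ascii", errors="replace"))
```

Any byte at or above 0x80 fails under the `ascii` codec, so the message alone does not say which encoding (if any) the body really uses.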

Do you have any hints?

Here is the full log:

(...)
Successfully installed certifi-2022.5.18.1 distlib-0.3.4 filelock-3.4.1 importlib-metadata-4.8.3 importlib-resources-5.4.0 pipenv-2022.4.8 platformdirs-2.4.0 six-1.16.0 typing-extensions-4.1.1 virtualenv-20.14.1 virtualenv-clone-0.5.7 zipp-3.6.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Installing dependencies from Pipfile.lock (aabb41)...
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/numbers/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position 4: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")