Hi, I'm getting an error when running an indexing job:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)
Do you have any hints?
Here is the full log:
(...)
Successfully installed certifi-2022.5.18.1 distlib-0.3.4 filelock-3.4.1 importlib-metadata-4.8.3 importlib-resources-5.4.0 pipenv-2022.4.8 platformdirs-2.4.0 six-1.16.0 typing-extensions-4.1.1 virtualenv-20.14.1 virtualenv-clone-0.5.7 zipp-3.6.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Installing dependencies from Pipfile.lock (aabb41)...
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/numbers/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position 4: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
2022-05-31 16:57:57 [scrapy.core.scraper] ERROR: Spider error processing <GET https://dev.decipad.com/docs/language/> (referer: https://dev.decipad.com/docs/sitemap.xml)
Traceback (most recent call last):
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 40, in get_dom
    body = response.body.decode(response.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb6 in position 8: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 169, in parse_from_sitemap
    self.add_records(response, from_sitemap=True)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/documentation_spider.py", line 148, in add_records
    records = self.strategy.get_records_from_response(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/default_strategy.py", line 39, in get_records_from_response
    self.dom = self.get_dom(response)
  File "/github/workspace/docsearch-scraper/cli/../scraper/src/strategies/abstract_strategy.py", line 43, in get_dom
    result = lxml.html.fromstring(response.body)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/usr/local/lib/python3.6/site-packages/lxml/html/__init__.py", line 764, in document_fromstring
    "Document is empty")
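For anyone trying to narrow this down, the first exception in each traceback can be reproduced outside the scraper. The following is only a minimal sketch: `decode_failure` and the sample payload are hypothetical names and data chosen for illustration, not the scraper's code or the real response body. It assumes nothing beyond what the log shows, namely that the body is decoded with the `ascii` codec and contains a byte above 0x7f.

```python
from typing import Optional


def decode_failure(body: bytes, encoding: str) -> Optional[str]:
    """Return the decode error message for `body`, or None if it decodes cleanly.

    Hypothetical helper: it mirrors the `response.body.decode(response.encoding)`
    call from the traceback, but is not part of docsearch-scraper.
    """
    try:
        body.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec rejected.
        return str(exc)
    return None


# Made-up payload: eight ASCII bytes followed by 0xb6, so the offending
# byte sits at position 8 as in the second traceback.
sample_body = b"abcdefgh" + b"\xb6" + b"rest"

message = decode_failure(sample_body, "ascii")
print(message)

# A body that really is ASCII decodes without error.
print(decode_failure(b"plain text", "ascii"))
```

Running the same check against the raw bytes of the failing pages would show whether the body is text at all, or whether the `ascii` encoding reported for the response is simply wrong for the content.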