Let beautifulsoup guess the input codec #8

Open: wants to merge 2 commits into master

@JeffCarpenter commented Nov 24, 2021

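Unlike pathlib, whose read_text() decodes with a single codec (the locale's preferred encoding unless one is passed explicitly), BeautifulSoup can guess and handle several text codecs, so we let it work its magic.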

Addresses issue #5
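A minimal sketch of the idea, assuming the HTML is parsed with BeautifulSoup right after it is read. `load_html` is a hypothetical helper for illustration, not the code in this PR. The key point is handing BeautifulSoup raw bytes rather than an already-decoded string, because it only runs its encoding detection on bytes.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def load_html(path: Path) -> BeautifulSoup:
    """Hypothetical helper: parse an HTML file without forcing a codec.

    path.read_text() would decode with one fixed encoding and raise
    UnicodeDecodeError on mismatching bytes. Passing bytes instead lets
    BeautifulSoup check for a BOM or a declared charset, then a detector
    library's guess if one is installed, then fall back to UTF-8 and
    Windows-1252.
    """
    return BeautifulSoup(path.read_bytes(), "html.parser")
```

After parsing, soup.original_encoding reports which codec BeautifulSoup settled on, which is handy when debugging inputs like the one in the traceback below.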

@fernandomora commented Apr 21, 2023

Any chance to get this merged?
This PR solved my problem reading a non-UTF-8 input:

Traceback (most recent call last):
  File "/opt/homebrew/bin/html2csv", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/html2csv/__main__.py", line 41, in main
    html_doc = path.read_text()
               ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/pathlib.py", line 1059, in read_text
    return f.read()
           ^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 532: invalid continuation byte
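Why this byte fails: 0xf3 is an ordinary character in single-byte encodings such as Latin-1 and Windows-1252 (where it is "ó"), but in UTF-8 it can only begin a four-byte sequence, so decoding breaks on the byte that follows. The thread does not say what encoding the input file actually used; the sample bytes below are hypothetical and only reproduce the same kind of error.

```python
# Hypothetical input: "Información" encoded as Windows-1252 / Latin-1.
data = b"Informaci\xf3n"

try:
    # Same failure as pathlib's read_text() with a UTF-8 default.
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xf3 in position 9: invalid continuation byte

# Decoding with the matching single-byte codec succeeds.
print(data.decode("cp1252"))  # Información
```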
