Hi, I have an encoding problem with a web page encoded in ISO-8859-2.
If I run:

    from requests_html import HTMLSession

    session = HTMLSession()
    page = session.get('https://bonito.pl/bestsellery')

page.html comes back without Polish letters. I tried to work around it:
    from urllib.request import urlopen
    from requests_html import HTML

    textPage = urlopen("https://bonito.pl/bestsellery")
    textPage = textPage.read().decode("ISO-8859-2")  # Polish letters are decoded properly here
    page = HTML(html=textPage, default_encoding="ISO-8859-2")  # page.html still lacks Polish letters
and also:

    textPage = textPage.read().decode("ISO-8859-2").encode("UTF-8")
    page = HTML(html=textPage, default_encoding="UTF-8")

but neither attempt solved the problem.
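One plausible explanation for both failed workarounds, offered as an assumption rather than a confirmed diagnosis: as far as I can tell from its source, requests-html's `HTML` class turns a `str` argument into UTF-8 bytes and then guesses the encoding, partly from `<meta>` charset declarations. If the page still declares `iso-8859-2`, those UTF-8 bytes would be read as Latin-2 and the Polish letters turn into mojibake. The stdlib-only sketch below reproduces that effect; `PAGE`, `declared_charset` and `decode_like_a_parser` are hypothetical stand-ins, not requests-html code, and the real page's markup was not checked.

```python
import re
from typing import Optional

# Hypothetical stand-in for the real page: Latin-2 bytes plus a <meta> charset tag.
PAGE = ('<html><head><meta charset="iso-8859-2"></head>'
        '<body><p>Zażółć gęślą jaźń</p></body></html>')

# Matches e.g. <meta charset="iso-8859-2"> or content="text/html; charset=iso-8859-2".
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: return the charset named in the first matching <meta> tag."""
    match = META_CHARSET.search(raw)
    return match.group(1).decode("ascii") if match else default


def decode_like_a_parser(raw: bytes, override: Optional[str] = None) -> str:
    """Decode with an explicit override if given, else trust the <meta> declaration."""
    return raw.decode(override or declared_charset(raw), errors="replace")


original = PAGE.encode("iso-8859-2")                       # what the server sends
reencoded = original.decode("iso-8859-2").encode("utf-8")  # the .decode(...).encode("UTF-8") workaround

assert "Zażółć" in decode_like_a_parser(original)                     # declaration matches the bytes
assert "Zażółć" not in decode_like_a_parser(reencoded)                # declaration is now wrong: mojibake
assert "Zażółć" in decode_like_a_parser(reencoded, override="utf-8")  # explicit override fixes it
```

In requests-html terms, the `override` argument plays the role of assigning `page.html.encoding` explicitly, assuming the property setter available in current releases.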
If I use BeautifulSoup:
    from bs4 import BeautifulSoup

    page = BeautifulSoup(textPage, "lxml", from_encoding="utf-8")
    page = HTML(html=page.html)
Polish letters are properly decoded and visible.
I much prefer requests-html over bs4, so I would be very grateful for any help. How can I solve this? Thanks!
Have you tried setting page.html.encoding to utf-8?
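Which value to assign likely depends on which bytes the parser holds: the server's original Latin-2 bytes (the plain `session.get` case) or a UTF-8 re-encoding (e.g. after the `.decode("ISO-8859-2").encode("UTF-8")` workaround). A minimal stdlib sketch for telling the two apart, using a common heuristic rather than anything requests-html provides: try strict UTF-8 first and fall back to Latin-2. `guess_encoding` is a hypothetical name.

```python
from typing import Sequence, Tuple


def guess_encoding(raw: bytes, fallbacks: Sequence[str] = ("iso-8859-2",)) -> Tuple[str, str]:
    """Hypothetical helper: return (encoding, text) for the first codec that decodes raw strictly.

    UTF-8 is tried first because its multi-byte rules make accidental matches
    unlikely; the fallbacks are then tried in order.
    """
    for enc in ("utf-8", *fallbacks):
        try:
            return enc, raw.decode(enc)  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the bytes")


text = "Zażółć gęślą jaźń"
print(guess_encoding(text.encode("iso-8859-2")))  # ('iso-8859-2', 'Zażółć gęślą jaźń')
print(guess_encoding(text.encode("utf-8")))       # ('utf-8', 'Zażółć gęślą jaźń')
```

The returned name is the value one would then try in `page.html.encoding = ...`. The heuristic is not foolproof: some Latin-2 byte pairs (for instance `0xC3 0xB3`, which Latin-2 reads as "Ăł") also form valid UTF-8, so short or unusual inputs can be misclassified.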