behave_html_formatter:HTMLFormatter: Unicode/UTF-8 issues #22

Open
y4rk4s opened this issue Jul 7, 2021 · 10 comments
Comments

@y4rk4s

y4rk4s commented Jul 7, 2021

Hi!

There are some issues with writing UTF-8 characters.
We are trying to generate reports in Hungarian. Please see the special characters below.

OS: Win10
Python: 3.8( / 3.6 )
Key: HU
Behave version: 0.98 ( with older version also reproducible )
variable is set on OS:
pythonioencoding=utf8

Command:
behave --tags=@smoke test_app/ --color --no-capture --no-skipped -f behave_html_formatter:HTMLFormatter

Special characters:
éáűőúöüóí

Case 1:
The formatter cannot handle the characters above properly; only � appears in the output.

Case 2:
In one special case the stream contains the character "→", and writing it to the file raises an exception (details below).
The HTML output is not created.

self.stream.write(ET_tostring(self.html, pretty_print=True))
File "c:\program files (x86)\python38-32\lib\encodings\cp1250.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]

Exception UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 47003: character maps to <undefined>
Traceback (most recent call last):
@bittner
Member

bittner commented Jul 13, 2021

I've taken a quick peek into the code and saw that we're setting the character encoding via meta Content-Type.

Could you verify what the resulting HTML looks like on the top? It should:

  • Start with a <!DOCTYPE html> to declare being HTML5,
  • Have a valid Content-Type declaration (according to what I know from my experience with the W3 validator there should, e.g., be a space between text/html; and charset=utf-8).

Personally, I prefer <meta charset="utf-8"/>, which is less verbose than the <meta http-equiv="Content-Type" ...> form.

This issue might be related to #2. You might want to run HTML validation over your generated document. If you want to create a PR for appropriate fixes then, that would be very much appreciated!

The output of xml.etree.ElementTree is generally not very HTML5-friendly, we need to make the best out of it or find a more appropriate solution for generating HTML5-compliant documents.
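A minimal sketch of the two points above, using only the standard library: it builds a head element with the shorter meta charset form and serializes it with ElementTree's HTML output method, prepending the doctype by hand. The function name build_head is hypothetical and is not taken from behave-html-formatter.

```python
import xml.etree.ElementTree as ET


def build_head(title):
    """Build a <head> element that declares UTF-8 via <meta charset>."""
    head = ET.Element("head")
    # Short HTML5 form instead of http-equiv="Content-Type".
    ET.SubElement(head, "meta", {"charset": "utf-8"})
    ET.SubElement(head, "title").text = title
    return head


# method="html" leaves void elements such as <meta> without a closing tag.
# ElementTree does not emit a doctype, so it is prepended manually here.
markup = "<!DOCTYPE html>\n" + ET.tostring(
    build_head("Behave Test Report"), encoding="unicode", method="html"
)
```

The declaration only tells the browser how to decode the file; the bytes on disk still have to be written as UTF-8 for it to be correct.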

@y4rk4s
Author

y4rk4s commented Jul 13, 2021

Current HTML source ( generated 07/13/2021 ):


<!DOCTYPE HTML>
<html>
<head>
<title>Behave Test Report</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<style type="text/css">
<!--

The issue looks like this:

Body:
"Forgat�k�nyv"
instead of
"Forgatókönyv"

Forgatókönyv is the Hungarian translation of "Scenario".

Note:
When I removed the meta tag from the HTML, the output was correct.

@bittner
Member

bittner commented Jul 13, 2021

When I removed the meta tag from the HTML, the output was correct.

Interesting!

I assume when you replace the meta tag by <meta charset="utf-8"/> that won't improve the situation either, will it?

Hence, the generated content is probably not written in UTF-8.
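One way to see why a mismatched byte encoding would produce exactly the reported output: if the text is stored as cp1250 bytes but decoded as UTF-8 (which is what the meta tag asks the browser to do), the accented letters become invalid byte sequences and are rendered as the replacement character. This is a sketch of that assumption, not a confirmed diagnosis of the formatter.

```python
original = "Forgatókönyv"

# Assumption: the file ends up on disk in the Windows ANSI code page (cp1250).
written = original.encode("cp1250")

# A reader that trusts charset=utf-8 decodes those bytes as UTF-8;
# invalid sequences are replaced with U+FFFD, displayed as "�".
shown = written.decode("utf-8", errors="replace")
```

It would also be consistent with the note that removing the meta tag "fixes" the page: without a declaration the browser is free to guess a legacy encoding.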

Related Code

@jenisys Do you know of any issues with character encoding in the Formatter class?

@jenisys
Member

jenisys commented Dec 12, 2021

Sorry for the late reply.

Unicode encoding causes different problems with different Python versions
(Python 2 and Python 3 behave differently).
This applies to conversions between ASCII (and extended character sets) and Unicode,
as well as to transcoding between Unicode encodings.

NOTES:

  • GUIDELINE: You should have the least problems if you use UTF-8
  • Your traceback suggests that you are using: cp1250 and not UTF-8
    File "c:\program files (x86)\python38-32\lib\encodings\cp1250.py", line 19, in encode
  • Use Microsoft Windows Terminal (available from the Microsoft Store, or via winget, I think)
    AFAIK, it supports Unicode / UTF-8
  • behave installs win_unicode_console for Python < 3.6 (but: You have 3.8 or 3.6 before)
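The cp1250 observation can be checked in isolation. The sketch below uses a hypothetical helper, can_encode, to show that the Hungarian letters from the report are representable in cp1250 while the arrow from the traceback (U+2192) is not, which matches the UnicodeEncodeError in Case 2.

```python
hungarian = "éáűőúöüóí"
arrow = "\u2192"  # the "→" character from the traceback


def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


results = {
    "hungarian/cp1250": can_encode(hungarian, "cp1250"),
    "arrow/cp1250": can_encode(arrow, "cp1250"),
    "both/utf-8": can_encode(hungarian + arrow, "utf-8"),
}
```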

OTHERWISE:
@y4rk4s What kind of behave version are you using: 0.98 (really) ?
There is no such. behave version (AFAIK).

@jenisys jenisys changed the title behave_html_formatter:HTMLFormatter UTF-8 issues behave_html_formatter:HTMLFormatter: Unicode/UTF-8 issues Dec 12, 2021
@y4rk4s
Author

y4rk4s commented Jan 3, 2022

Thank you for the reply.
python version is 3.8.2
behave version is 1.2.6
behave-html-formatter version is 0.98

We are forced to use PowerShell, with PYTHONIOENCODING set to UTF-8. The terminal output is fine; the problem only appears in the saved HTML.
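A possible explanation for "terminal fine, file broken": PYTHONIOENCODING only overrides the encoding of stdin, stdout and stderr. A file opened without an explicit encoding uses the locale's preferred encoding instead, which is typically an ANSI code page on Windows. Whether the formatter's output file takes that path is an assumption here. The helper names below are hypothetical; the sketch only reports what Python would pick on the current machine.

```python
import locale
import os
import sys
import tempfile


def describe_encodings():
    """Report the encodings Python uses for the console and for the locale."""
    return {
        # Influenced by PYTHONIOENCODING.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Not influenced by PYTHONIOENCODING.
        "locale": locale.getpreferredencoding(False),
    }


def default_file_encoding():
    """Return the encoding open() chooses when none is passed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as handle:
            return handle.encoding
    finally:
        os.remove(path)
```

Running both functions in the same PowerShell session as behave would show whether the console and the file defaults disagree.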

@jenisys
Member

jenisys commented Jan 3, 2022

HTML formatter works fine for me with special non-ASCII characters (Sonderzeichen).

I am using:

python version is 3.9 (on platform: macOS)
behave version is 1.2.6
behave-html-formatter version is 0.9.8 (which is probably what you meant above)

I run (using a feature-file w/ French language keywords, using accents, etc.):

$ behave -f html -o behave.html -f pretty tools/test-features/french.feature
Fonctionnalité: testing stuff # tools/test-features/french.feature:2

  Scénario: test stuff             # tools/test-features/french.feature:4
    Etant donné I am testing stuff # tools/test-features/steps/steps.py:11 0.000s
    Quand I exercise it work       # tools/test-features/steps/steps.py:24 0.000s
    Alors it will work             # tools/test-features/steps/steps.py:29 0.000s

  Scénario: test more stuff        # tools/test-features/french.feature:9
    Etant donné I am testing stuff # tools/test-features/steps/steps.py:11 0.000s
    Alors it will work             # tools/test-features/steps/steps.py:29 0.000s

1 feature passed, 0 failed, 0 skipped
2 scenarios passed, 0 failed, 0 skipped
5 steps passed, 0 failed, 0 skipped, 0 undefined

The HTML report shows:
image

@y4rk4s NEXT STEPS:

  • Try to replicate my results with the existing feature file (mentioned above).
  • As noted above, your stack trace shows that your encoding is "cp1250", not "UTF-8".
  • Try to adapt the "ET_tostring()" function that @bittner mentioned above.
    It uses the encoding "unicode" for python.version >= 3.0.
    I changed it to "UTF-8" without significant effects. Just the embedded style sheet is rendered differently and some HTML attributes have another ordering.
  • Try to patch behave.textutil.select_best_encoding(). Determine which encoding is used in your case.
    Try to force it to "UTF-8" if this is not the case (for example).
    NOTE: This encoding is then used by the behave.formatter.base.StringOpener class via the Formatter base class that is used by the HTMLFormatter class.
  • Try to install the newest "behave" from the GitHub repository and check if you get any improvements.
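To illustrate why the stream encoding matters more than the ET_tostring() setting, here is a rough standard-library sketch. It serializes a tree to a str (the "unicode" encoding) and writes it through a text stream with a chosen encoding, standing in for whatever stream the formatter receives. The function render_report is hypothetical and does not reproduce behave's StreamOpener or the formatter's own code.

```python
import io
import xml.etree.ElementTree as ET


def render_report(stream_encoding):
    """Serialize a tiny HTML tree and write it through an encoded text stream."""
    html = ET.Element("html")
    ET.SubElement(html, "body").text = "Forgatókönyv → ok"
    # encoding="unicode" yields a str; no bytes have been produced yet.
    text = ET.tostring(html, encoding="unicode", method="html")

    raw = io.BytesIO()
    with io.TextIOWrapper(raw, encoding=stream_encoding) as stream:
        # The str is turned into bytes here, using the stream's encoding.
        stream.write(text)
        stream.flush()
        return raw.getvalue()
```

Calling render_report("utf-8") returns UTF-8 bytes, while render_report("cp1250") raises UnicodeEncodeError on the arrow, much like the traceback in the original report. Forcing the stream encoding to UTF-8 is therefore the part worth testing first.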

@y4rk4s
Author

y4rk4s commented Jan 5, 2022

I forgot to mention that I am working on Windows 10. I tried the following with this command:
behave -f behave_html_formatter:HTMLFormatter -o behave.html -f pretty french.feature
I also had to change "Fonctionnalité" to "Feature" because it was not recognized (step not found).

  • I tried to upgrade behave and behave-html-formatter: everything is up to date

  • I ran the command above in both cmd and PowerShell

  • I tried forcing the PowerShell encoding to various values (utf-8, cp1250)

  • The terminal output was fine

  • The pretty output was also fine

  • The saved HTML was in ANSI

  • I also tried changing the encoding of the feature file: same results

  • Changes made in the HTML did not help (it is still saved as ANSI)

  • Modifying the formatter package code worked for us

image

@y4rk4s
Author

y4rk4s commented Jan 11, 2022

I modified the line in html.py to the following:

{"http-equiv": "Content-Type"},
Now it's working fine.

@bittner
Member

bittner commented Jun 22, 2022

This problem likely still needs to be fixed.

Related code: html.py:128-135

@y4rk4s
Author

y4rk4s commented Jun 24, 2022

Thank you for the whole project and the hard work.
Currently we are still using a modified version of your code.
