Unicode, e.g. crescent moon 🌙 & beers 🍻 breaks under Python 3 #159


pzrq
Contributor

@pzrq pzrq commented Jan 21, 2016

As suggested by @peterbe on #157; please see that issue for more details.

Not sure how to proceed so if anyone can make the test pass, go for it and good on you 👍

@pzrq
Contributor Author

pzrq commented Jan 21, 2016

@peterbe Now I get what you mean by "feels like an environment issue".

On my OSX El Capitan dev machine:

(premailer)pzrq@Peters-Mac-mini:~/Projects/premailer$ python
Python 2.7.10 (default, Aug 22 2015, 20:33:39) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> etree.LXML_VERSION
(3, 5, 0, 0)
>>> etree.LIBXML_VERSION
(2, 9, 2)
>>> etree.LIBXML_COMPILED_VERSION
(2, 9, 2)
>>> etree.LIBXSLT_VERSION
(1, 1, 28)
>>> etree.LIBXSLT_COMPILED_VERSION
(1, 1, 28)
>>> 
(premailer_py3)pzrq@Peters-Mac-mini:~/Projects/premailer$ python
Python 3.5.0 (default, Sep 23 2015, 04:41:38) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> etree.LXML_VERSION
(3, 5, 0, 0)
>>> etree.LIBXML_VERSION
(2, 9, 2)
>>> etree.LIBXML_COMPILED_VERSION
(2, 9, 2)
>>> etree.LIBXSLT_VERSION
(1, 1, 28)
>>> etree.LIBXSLT_COMPILED_VERSION
(1, 1, 28)

The part I don't yet understand is that the staging environment which triggered all this is an Ubuntu box...

@peterbe
Owner

peterbe commented Jan 27, 2016

Sorry I haven't had time to review. Still on my todo list.

pzrq added 3 commits January 28, 2016 20:02
To see if it's reproducible on Travis, which, if true, may make the underlying issue(s) easier to debug.
https://docs.travis-ci.com/user/multi-os/
To see whether only Python 3.5 fails at the "building environment" stage under OSX, in which case we can perhaps get 3.3 or 3.4 out.

However this is not very inspiring...
apache/libcloud@95338d8
@pzrq
Contributor Author

pzrq commented Jan 28, 2016

@peterbe No worries, it looks to be OSX-specific at this point.

I was able to reproduce the test failure in this PR on a different OSX El Capitan box but not on an Ubuntu 14.04 LTS box, which suggests it is OSX-specific, or at least that building lxml.etree or one of its dependencies (IIRC libxml2, libxslt) under OSX is done in such a way that this can happen.

The original issue which triggered this appears to be a completely different Django issue with the console backend, which explains why it failed on the staging environment while emails with 🌙 🍻 in production are sending without issue.

Stacktrace (most recent call last):

  File "celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "newrelic/hooks/application_celery.py", line 66, in wrapper
    return wrapped(*args, **kwargs)
  File "celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)

{removed_frames_which_just_create_and_send_an_email, i.e.
    message = EmailMultiAlternatives(
        subject,
        body=text_content,
        to=new_recipients,
        from_email=from_email
    )
    message.attach_alternative(html_content, 'text/html')
    message.send()
}
  File "newrelic/api/function_trace.py", line 110, in literal_wrapper
    return wrapped(*args, **kwargs)
  File "django/core/mail/message.py", line 303, in send
    return self.get_connection(fail_silently).send_messages([self])
  File "django/core/mail/backends/console.py", line 36, in send_messages
    self.write_message(message)
  File "django/core/mail/backends/console.py", line 23, in write_message
    self.stream.write('%s\n' % msg_data)

From Sentry, formatted for readability:

Task {removed_task_path}[c2b3b243-6e57-4122-9200-ac5cda6bdec8] raised unexpected:
UnicodeEncodeError(
    'ascii',
    'Content-Type: multipart/alternative;\n
        boundary="===============8707286802097042810=="\n
        MIME-Version: 1.0\n
        Subject: You have 1 task due tomorrow\n
        From: {removed_from_email}\n
        To: {removed_to_email}\n
        Date: Thu, 14 Jan 2016 17:00:24 -0000\n
        Message-ID: {removed_message_id}\n
        \n
        --===============8707286802097042810==\n
        MIME-Version: 1.0\n
        Content-Type: text/plain; charset="utf-8"\n
        Content-Transfer-Encoding: 7bit\n
        \n
        To view this message, please use an HTML-compatible email viewer.\n
        --===============8707286802097042810==\n
        Content-Type: text/html; charset="utf-8"\n
        MIME-Version: 1.0\n
        Content-Transfer-Encoding: 8bit\n
        \n
        <html xmlns="http://www.w3.org/1999/xhtml">\n
        <head>\n<meta content="text/html; charset=utf-8" http-equiv="Content-Type">\n
        <title></title>\n
        <style {removed_style_info}
        Dear 🌙,
        {removed_email_body}
        </body>\n
        </html>\n
        \n
        --===============8707286802097042810==--\n
        \n',
    2874,  # 🌙 is the character at message[2874]
    2875,
    'ordinal not in range(128)'
)
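
For anyone skimming: the error above is what you would expect when text containing 🌙 is written to a stream whose encoding is ASCII. Below is a minimal Python 3 sketch of that failure mode using only the standard library. It is not the Django console backend code; the helper name `write_like_console_backend` and the in-memory stream are made up for illustration.

```python
import io


def write_like_console_backend(text, encoding):
    """Write text to an in-memory text stream and return the encode error, if any.

    Hypothetical helper: it only mimics the `stream.write('%s\n' % msg_data)`
    call from the traceback, with the stream encoding chosen by the caller.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write("%s\n" % text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return None


# An ASCII stream cannot represent the emoji; a UTF-8 stream can.
ascii_error = write_like_console_backend("Dear 🌙,", "ascii")
utf8_error = write_like_console_backend("Dear 🌙,", "utf-8")

print(type(ascii_error).__name__, ascii_error.start, ascii_error.end)
print(utf8_error)
```

If that reading is right, the staging failure depends on the encoding of the stream the console backend writes to, not on premailer or lxml.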

@peterbe
Owner

peterbe commented Jan 29, 2016

I'm trying to wrap my head around what's going on.

I checked out your branch and verified that all is well under Python 2.7. By the way, I have OSX too.

(premailer):~/dev/PYTHON/premailer (mathspace-unicode-crescent-moon)$ python
Python 2.7.10 (default, Aug 26 2015, 15:56:25)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> html = u'''<html xmlns="http://www.w3.org/1999/xhtml">  <body>      Dear Skyelar 🌙, </body> </html>'''
>>> from premailer import Premailer
>>> p=Premailer(html)
>>> nhtml=p.transform()
>>> print nhtml
<html xmlns="http://www.w3.org/1999/xhtml">  <head></head>
<body>      Dear Skyelar 🌙, </body> </html>

Cool. Then I created a fresh virtualenv based on Python 3.5. It installed:

cssselect==0.9.1
cssutils==1.0.1
lxml==3.5.0

Yeah, there it gets weird:

(premailer-py3.5) :~/dev/PYTHON/premailer (mathspace-unicode-crescent-moon)$ python
Python 3.5.1 (default, Jan 20 2016, 12:32:13)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from premailer import Premailer
>>> html ='''<html xmlns="http://www.w3.org/1999/xhtml">  <body>      Dear Skyelar 🌙, </body> </html>'''
>>> p=Premailer(html)
>>> nhtml=p.transform()
>>> print(nhtml)
<html>
<head></head>
<body><p>h   t   m   l       x   m   l   n   s   =   "   h   t   t   p   :   /   /   w   w   w   .   w   3   .   o   r   g   /   1   9   9   9   /   x   h   t   m   l   "   &gt;           </p></body>
</html>
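
A side observation on the garbled output above, offered as an unconfirmed guess and not as a diagnosis: the spacing (three blanks after every character, seven where the original had a space) looks like 4-byte-per-character text being read one byte at a time. The sketch below reproduces just that spacing pattern with the standard library; the function name is made up, and nothing here shows what lxml or libxml2 actually do internally.

```python
def misread_utf32_as_single_bytes(text):
    """Encode text as UTF-32 (little endian), then decode it byte by byte.

    Each character becomes itself followed by three NUL bytes, which are
    replaced by spaces here so the effect is visible when printed.
    """
    raw = text.encode("utf-32-le")
    return raw.decode("latin-1").replace("\x00", " ")


garbled = misread_utf32_as_single_bytes('html xmlns="http')
print(garbled)
```

Whether this is what happens inside the parser on the affected machines is an assumption; it only shows that the symptom is consistent with a 4-byte encoding being misinterpreted.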

So, are you saying that this error does NOT happen if you run Python 3 on Linux?
At the moment, I don't have a working Python 3 on Linux. (If time allowed I'd spin up a fresh and temporary EC2 Ubuntu image.)

So the problem only manifests itself on OSX, in Python 3.x? Is that right?

@peterbe
Owner

peterbe commented Jan 29, 2016

Another thing I don't understand is that this works on the master branch:

(premailer):~/dev/PYTHON/premailer (master %)$ python
Python 2.7.10 (default, Aug 26 2015, 15:56:25)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import premailer
>>> html = u'''<html xmlns="http://www.w3.org/1999/xhtml">  <body>      Dear Skyelar 🌙, </body> </html>'''
>>> p=premailer.Premailer(html)
>>> nhtml=p.transform()
>>> print nhtml
<html xmlns="http://www.w3.org/1999/xhtml">  <head></head>
<body>      Dear Skyelar 🌙, </body> </html>

@almost

almost commented Mar 16, 2022

Did anyone ever work out what's going on here? I've encountered this after moving to a new laptop. The old and new laptops are both running the newest macOS and all the same library versions (installed with Poetry, so there's a lock file).

I found that passing UTF-8 encoded bytes to etree.fromstring works fine. And since premailer allows a preparsed lxml tree to be passed in, I can do that. But I just hate not understanding what's going wrong!
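
A rough sketch of the workaround described above, for reference. It assumes lxml is installed; the choice of `etree.HTMLParser(encoding="utf-8")` is an assumption, since the comment does not say which parser was used. The sample HTML is the one from earlier in this thread.

```python
from lxml import etree

html = '''<html xmlns="http://www.w3.org/1999/xhtml">  <body>      Dear Skyelar 🌙, </body> </html>'''

# Encode to UTF-8 bytes first and tell the parser which encoding to expect,
# instead of handing lxml a Python str.
parser = etree.HTMLParser(encoding="utf-8")  # assumed parser choice
root = etree.fromstring(html.encode("utf-8"), parser)

# The emoji should survive the round trip through the parser.
body_text = root.find("body").text
print(body_text)
```

The resulting `root` is the preparsed tree that, per the comment above, can then be passed to premailer in place of the HTML string.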

@peterbe
Owner

peterbe commented Mar 18, 2022

So it's still a bug after 6 years of Python and lxml life?
@almost would you mind taking a stab at looking at the diff in this PR and copying the new test into the current tests? When this PR was created, it was created with TravisCI changes in mind, but since then the testing has moved to GitHub Actions.

@peterbe
Owner

peterbe commented Mar 18, 2022

Well, I guess GitHub Actions just runs tox and tox ultimately runs nosetests.
