Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

08-screen.t fails with PERL_UNICODE and 5.22.1 #32

Open
autarch opened this issue Nov 14, 2016 · 11 comments · May be fixed by #68
Open

08-screen.t fails with PERL_UNICODE and 5.22.1 #32

autarch opened this issue Nov 14, 2016 · 11 comments · May be fixed by #68

Comments

@autarch
Copy link
Member

autarch commented Nov 14, 2016

Migrated from rt.cpan.org #111143 (status was 'open')

Requestors:

Attachments:

From [email protected] (@madsen) on 2016-01-13 05:38:40:

With Perl 5.22.1, LANG=en_US.UTF-8, and PERL_UNICODE=SAL, 08-screen.t fails. It succeeds if I set either LANG=C or PERL_UNICODE=0.

With Perl 5.20.2, the test passes even with PERL_UNICODE=SAL.

I've attached the result of running

perl5.22.1 t/00-report-prereqs.t &>/tmp/00-report-prereqs.txt

along with the results of 00-report-prereqs.t for both 5.20.2 and 5.22.1.

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@madsen) on 2016-01-13 05:39:29:

1..1
# 
# Versions for all modules listed in static metadata (including optional ones):
# 
# === Configure Requires ===
# 
#     Module               Want Have
#     -------------------- ---- ----
#     Dist::CheckConflicts 0.02 0.11
#     ExtUtils::MakeMaker   any 6.98
# 
# === Test Requires ===
# 
#     Module              Want     Have
#     ------------------- ---- --------
#     Data::Dumper         any    2.154
#     Exporter             any     5.71
#     ExtUtils::MakeMaker  any     6.98
#     File::Spec           any  3.48_01
#     File::Temp           any   0.2304
#     FindBin              any     1.51
#     Getopt::Long         any     2.42
#     IO::File             any     1.16
#     IPC::Run3            any    0.048
#     PerlIO               any     1.09
#     Test::Fatal          any    0.014
#     Test::More          0.96 1.001014
#     Test::Requires       any     0.08
#     lib                  any     0.63
#     utf8                 any  1.13_01
# 
# === Test Recommends ===
# 
#     Module         Want     Have
#     ---------- -------- --------
#     CPAN::Meta 2.120900 2.143240
# 
# === Runtime Requires ===
# 
#     Module                   Want   Have
#     ------------------------ ---- ------
#     Carp                      any 1.3301
#     Devel::GlobalDestruction  any   0.13
#     Dist::CheckConflicts     0.02   0.11
#     Encode                    any   2.70
#     Fcntl                     any   1.11
#     IO::Handle                any   1.35
#     Module::Runtime           any  0.014
#     Params::Validate         1.03   1.18
#     Scalar::Util              any   1.41
#     Sys::Syslog              0.28   0.33
#     base                      any   2.22
#     strict                    any   1.08
#     warnings                  any   1.23
# 
ok 1

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@madsen) on 2016-01-13 06:09:39:

On Wed, Jan 13, 2016 12:00:55 AM, ETHER wrote:

relevant: http://www.dagolden.com/index.php/1771/why-perl_unicode-makes-me-sad/

Possibly, although I don't use D, just S(tandard IO handles) and A(rgv). (L means that SA applies only if the locale is UTF-8, which is why LANG=C makes the test pass.)

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@autarch) on 2016-01-13 06:16:27:

On Wed Jan 13 01:09:39 2016, CJM wrote:

On Wed, Jan 13, 2016 12:00:55 AM, ETHER wrote:

relevant: http://www.dagolden.com/index.php/1771/why-perl_unicode-
makes-me-sad/

Possibly, although I don't use D, just S(tandard IO handles) and
A(rgv). (L means that SA applies only if the locale is UTF-8, which
is why LANG=C makes the test pass.)

It'd probably the S causing the problem.

That said, I don't think running arbitrary tests with env vars that change the Perl interpreter's behavior should be expected to work.

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@madsen) on 2016-01-13 06:26:59:

As xdg mentions in that article, it's useful to test modules the way they'll be used.

I'm of the opinion that a test should clean up its environment if it's doing something tricky that breaks if certain variables are set. In this case, the attached patch will work.

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@dagolden) on 2016-01-13 15:08:48:

On Wed Jan 13 01:16:27 2016, DROLSKY wrote:

That said, I don't think running arbitrary tests with env vars that
change the Perl interpreter's behavior should be expected to work.

People put all sorts of stuff in their environment, so tests ought to reset any that cause problems. This is no different than PERL5LIB, PERL5OPT, etc. Or setting up a temporary HOME directly to avoid automatically picking up config files.

PERL_UNICODE is just a particularly uncommon one. :-/

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@autarch) on 2016-01-13 15:11:05:

On Wed Jan 13 10:08:48 2016, DAGOLDEN wrote:

On Wed Jan 13 01:16:27 2016, DROLSKY wrote:

That said, I don't think running arbitrary tests with env vars that
change the Perl interpreter's behavior should be expected to work.

People put all sorts of stuff in their environment, so tests ought to
reset any that cause problems. This is no different than PERL5LIB,
PERL5OPT, etc. Or setting up a temporary HOME directly to avoid
automatically picking up config files.

PERL_UNICODE is just a particularly uncommon one. :-/

This is something that should be done in Test::More or something like that, then. Do we really expect every CPAN module to add boilerplate to every test to reset every possible env var that someone might set?

I'm of course fine with patching this test now that we know the issue and the simple fix, but as a general rule I think that if you run tests with random perl env vars set, you should expect random failures.

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@karenetheridge) on 2016-01-13 17:50:36:

On 2016-01-13 07:11:05, DROLSKY wrote:

This is something that should be done in Test::More or something like
that, then. Do we really expect every CPAN module to add boilerplate
to every test to reset every possible env var that someone might set?

I'm of course fine with patching this test now that we know the issue
and the simple fix, but as a general rule I think that if you run
tests with random perl env vars set, you should expect random
failures.

Having Test::More reset the variables entirely defeats the point of
being able to test a CPAN module working with particular variables set --
which is something an application may wish to do (providing, of course,
that all the CPAN modules it uses actually does work with those variables,
which is what this patch is about).

This isn't a random variable -- it's one that affects the behaviour of
filehandle operations -- and so it's not unreasonable to try to make this
module work when the variable is set to various values.

I see it as similar to making a module work in various locales, or with
different filesystems. It may be an implementation variant that the author
hadn't thought of, or he doesn't have configured by default on his development
system, but that's why there are smokers, as well as a large contingent of
helpful users that send patches when edge cases are discovered :D

(I'm also now pondering which of my modules do interesting things with filehandles
that might need testing with PERL_UNICODE...)

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@autarch) on 2016-01-13 21:16:21:

On Wed Jan 13 12:50:36 2016, ETHER wrote:

Having Test::More reset the variables entirely defeats the point of being able to test a CPAN module working with particular variables set -- which is something an application may wish to do (providing, of
course, that all the CPAN modules it uses actually does work with those
variables, which is what this patch is about).

Good point, except that this patch explicitly breaks your ability to test this module with the env var in question! In fact, it will mislead you into thinking that this will work when it fact it will not.

This isn't a random variable -- it's one that affects the behaviour of filehandle operations -- and so it's not unreasonable to try to make this module work when the variable is set to various values.

There's a limit to how many different environments a module can be expected to work in. An environment that globally messes with every single filehandle is likely to break a lot of CPAN modules. This seems like one of those "use at your own risk or with lots of testing things".

That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.

Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?

@autarch
Copy link
Member Author

autarch commented Nov 14, 2016

From [email protected] (@mauke) on 2016-02-27 16:52:21:

On Wed Jan 13 16:16:21 2016, DROLSKY wrote:

On Wed Jan 13 12:50:36 2016, ETHER wrote:

Having Test::More reset the variables entirely defeats the point of being able to test a CPAN module working with particular variables set -- which is something an application may wish to do (providing, of
course, that all the CPAN modules it uses actually does work with those variables, which is what this patch is about).

Good point, except that this patch explicitly breaks your ability to test this module with the env var in question! In fact, it will mislead you into thinking that this will work when it fact it will not.

This isn't a random variable -- it's one that affects the behaviour of filehandle operations -- and so it's not unreasonable to try to make this module work when the variable is set to various values.

There's a limit to how many different environments a module can be expected to work in. An environment that globally messes with every single filehandle is likely to break a lot of CPAN modules. This seems like one of those "use at your own risk or with lots of testing things".

That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.

Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?

I don't know about this problem but there's a kind of similar bug in DBI:
https://rt.cpan.org/Ticket/Display.html?id=71341

That one is caused by sending binary data over STDIN/STDOUT without calling binmode() on them. Which breaks with PERL_UNICODE=S because that makes the STD* handles start out as ":utf8".

mauke added a commit to mauke/Log-Dispatch that referenced this issue Apr 6, 2023
The STD{IN,OUT,ERR} default handles start out in some unspecified
(platform-specific) text encoding. The 'utf8' option of
Log::Dispatch::Screen manually encodes the logged messages as bytes in
UTF-8 format before writing them to the output handle. Thus, by default,
the UTF-8 bytes get re-encoded by whatever the native text encoding is
on the platform.

The correct way to handle this is to either not set 'utf8' (and rely on
the encoding layer of the handle) or set 'utf8' and call binmode() on
the handle (or otherwise apply a ':raw' layer) to ensure that bytes are
written as is (without further re-encoding).

Fixes houseabsolute#32.
@mauke mauke linked a pull request Apr 6, 2023 that will close this issue
@mauke
Copy link

mauke commented Apr 18, 2024

Eight years later ...

Log::Dispatch tests are still failing (and writing mojibake to the screen):

t/binmode.t ................................ ok
t/close-after-write.t ...................... ok
t/email-exit.t ............................. skipped: This test only runs for the maintainer
t/file-locked.t ............................ ok
t/lazy-open.t .............................. ok

    #   Failed test 'got expected stdout from Screen output'
    #   at t/screen.t line 95.
    #          got: 'test message - á½ '
    #     expected: 'test message - á½ '
    # Looks like you failed 1 test of 2.

#   Failed test 'stderr = 0, utf8 = 1'
#   at t/screen.t line 104.

    #   Failed test 'got expected stderr from Screen output'
    #   at t/screen.t line 101.
    #          got: 'test message - á½ '
    #     expected: 'test message - á½ '
    # Looks like you failed 1 test of 2.

#   Failed test 'stderr = 1, utf8 = 1'
#   at t/screen.t line 104.
# Looks like you failed 2 tests of 5.
t/screen.t ................................. 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests 
t/short-syntax.t ........................... ok
t/syslog-lock-without-preloaded-threads.t .. ok
t/syslog-threads.t ......................... ok
t/syslog.t ................................. ok

Test Summary Report
-------------------
t/screen.t                               (Wstat: 512 (exited 2) Tests: 5 Failed: 2)
  Failed tests:  4-5
  Non-zero exit status: 2
Files=13, Tests=84,  4 wallclock secs ( 0.05 usr  0.01 sys +  2.82 cusr  1.31 csys =  4.19 CPU)
Result: FAIL
Failed 1/13 test programs. 2/84 subtests failed.

The underlying cause of the problem is now understood: The test writes binary data to a file handle (STDOUT/STDERR) without switching it to binary mode first (with binmode()), which implicitly leaves the handle in text mode. Thus (re-)encoding happens on some platforms.

Meanwhile, the DBI issue I mentioned above did turn out to be a very similar problem. A patch adding the missing binmode() calls was applied (perl5-dbi/dbi#32) and has been working fine since DBI 1.635 (released in April 2016).


That said, if someone wants to patch the module (not the test) to make it actually do the right thing here, I would happily apply such a patch. I'm not sure exactly what the right thing is or why it breaks though. I'm guessing there some sort of double encoding issue going on.

Hard to do without breaking the existing documented behavior:

  • utf8 (0 or 1)

    If this is true, then the output will be encoded using the UTF-8 encoding. If you have already applied an encoding layer to the relevant filehandle, "STDOUT" or "STDERR", then your output will end up double-encoded if this is true.

    This defaults to false.

Morally, the right thing to do is to call binmode(STDOUT) or binmode(STDERR) in Log::Dispatch::Screen if utf8 mode is enabled because then the module tries to write binary data to a handle, so the handle better be in binary mode.

Also, this is testing a "utf8" option that you would want to not pass if you set this global env var in all your code. So maybe just a doc patch?

It's an environment variable, so people set it in their environment, not in code. In particular, this issue affects third-party applications that use Log::Dispatch internally, not just my own code.

Would you be willing to accept a patch that calls binmode in Log::Dispatch::Screen? If not, I feel the right thing to do is document that setting utf8 => 1 with Log::Dispatch::Screen requires the programmer to call binmode manually beforehand in any code that uses the option (and then do just that in t/screen.t).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants