Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-UTF-8 character sets #18

Merged
merged 4 commits into from
Feb 28, 2018

Conversation

StefansM
Copy link
Contributor

This commit modifies cpipes to use iconv to convert the internally-encoded pipe character lists, which are in UTF-8, to the user's locale (#9). As a bonus, it includes functions to convert from the user's locale into UTF-8, so it is now trivial to add custom lists of characters.

The implementation details are described in the comment at the head of pipes.c, so I won't re-hash them here.

As an example of this in action, here is the previous version (before this commit) running in urxvt in a GB18030 locale:
gb18030-broken
Very matrix-esque, but not what we're looking for.

After this commit, it looks like this, as intended:
gb18030-working

There are some problems with this, which I think are mostly caused by multi-column characters. For example, xterm is still quite broken:
gb18030-xterm-broken

Interestingly, increasing the number of pipes fixes some of the alignment problems in xterm:
gb18030-xterm-less-broken
This leads me to think it may be related to ncurses' local move optimisations, as having two pipes will mean ncurses prefers to use cup rather than local moves like \b\b\b. I don't know whether the fault lies in ncurses (unlikely), terminfo (possible) or xterm (possible), and I don't care nearly enough to find out.

This commit also adds a unit test for the locale conversion. I've included an update Travis CI config that runs the unit test using prove as a test runner, which should be present on most dev machines.

This commit introduces character handling for locales other than UTF-8,
including support for multicolumn characters.
Also update Travis CI config to use the new test.
Use POSIX-2008 features consistently, rather than defining feature test
macros piecemeal across source files.
OSX does not include iconv in the C library, so we need to explicitly
find it.
@livibetter
Copy link
Contributor

I found the results from non-UTF8-unsupported st and tmux interesting:

  • xterm/tmux:

    2018-02-26--10 40 32
    (strangely, it looks like a maze game.)

  • st:

    2018-02-26--10 40 45

  • st/tmux:

    2018-02-26--10 40 52


Nice work by the way.

@StefansM
Copy link
Contributor Author

StefansM commented Feb 28, 2018

@livibetter, it looks kind of cool -- definitely going to need to add the option for custom pipe characters. Happily, that's made much easier by this PR.

I'm not fully satisfied with this commit for a couple of reasons:

  1. The crude character parser doesn't understand combining codepoints or zero-width codepoints.

  2. It's broken on xterm in non-UTF-8 locales.

Point 1 isn't really an issue: this isn't the next LaTeX, so having some constraints on the allowed input is acceptable.

Point 2 is more frustrating, but in practice anyone using a non-UTF-8 locale is going to be used to weird character-encoding errors. I'm not even sure if xterm is supposed to support non-UTF-8 locales. After a bit of time narrowing down the problem, I discovered that xterm resizes each character cell after it has been written to by a multi-column character.

So, if on line one you write some multi-column characters and do nothing to line two, the column numbering looks like this:

1 2 3 4 5...
123456789...

Then, if you write multi-column chars to line 2, it resizes the cells!

1 2 3 4 5...
1 2 3 4 5...

It might be possible to hack around this, but I don't think it's possible to do so in a portable manner, unless there is a weird_multicolumn_resizing terminfo entry I'm missing. I think it's more likely that xterm just doesn't support non-UTF-8 locales, though I can't seem to find a definitive yes or no.

TL;DR: I'm merging this :)

@StefansM StefansM merged commit 53fb4a0 into pipeseroni:master Feb 28, 2018
@StefansM StefansM deleted the feature-unicode branch February 28, 2018 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants