literal[*] does not work with Unicode characters #91

Open
loudaslife opened this issue Nov 21, 2016 · 5 comments

@loudaslife

Using the literal[*] command with a trigger consisting of a Unicode character will create a .txt file link that returns a 404 error. Reproduced on both Firefox and Chromium.

<loudaslife> literal[*] ☃
<Bucket> loudaslife: Here's the full list (3): http://carabiner.peeron.com/xkcd/bucket/literal_%E2%98%83.txt

Firefox and Chromium both resolve %E2%98%83 to ☃ in the address bar automatically, but the 404 page contains a URL with completely different characters.

The requested URL /xkcd/bucket/literal_â˜ƒ.txt was not found on this server.
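
For reference, a minimal Python sketch (not part of Bucket, which is written in Perl) that checks the percent-encoded name from the link in the report. It only shows that the escapes in the URL are the UTF-8 bytes of the snowman; it says nothing about what the server has on disk.

```python
from urllib.parse import quote, unquote

# Path segment copied from the link Bucket posted in the report above.
encoded = "literal_%E2%98%83.txt"

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)  # literal_☃.txt

# Re-encoding the decoded name yields the same escapes, so the link is a
# well-formed UTF-8 percent-encoding of U+2603 SNOWMAN.
assert quote(decoded) == encoded
```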

This is also reproducible with non-snowman characters, like .

@dgw
Collaborator

dgw commented Nov 21, 2016

I suspect this is down to the configuration of the webserver that handles carabiner.peeron.com, rather than an issue with Bucket itself. If the link Bucket generates resolves to the correct character, then the problem is with the webserver's interpretation of it.

@loudaslife
Author

loudaslife commented Nov 21, 2016

Some discussion in #xkcd suggests that bucket's logs actually have the same encoding error. One of the Ops copied and pasted a snippet of the log:

<barometz> loudaslife asked in loudaslife to dump out âH^

âH^ is the same incorrect parsing that the 404 page gives when you ask for .

EDIT: I actually just realized that âH^ is NOT the same incorrect parsing as before, it's completely different than either of the characters I tried. So either bucket's log transcoding problem is different than that of the webserver, or something was messed up on barometz's end.

@loudaslife
Author

http://string-functions.com/encodingerror.aspx is a nifty little tool, and it determines that the webserver problem is UTF-8 being read as Windows-1252. It does not come up with a possible encoding error for the âH^ bucket log string that was pasted in #xkcd.
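
The "UTF-8 read as Windows-1252" diagnosis can be reproduced locally. This is a small Python sketch of that misreading for the snowman only; the variable names are illustrative.

```python
snowman = "\u2603"                    # ☃
utf8_bytes = snowman.encode("utf-8")  # b'\xe2\x98\x83'

# Misreading: treat each UTF-8 byte as one Windows-1252 character.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â˜ƒ

# Reversing the misreading recovers the original character.
recovered = mojibake.encode("cp1252").decode("utf-8")
assert recovered == snowman
```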

@dgw
Collaborator

dgw commented Nov 21, 2016

Text encoding is a real bane… Would be interesting to know if barometz got that log line from Bucket's log file on the server or from their own client. That's the sort of thing I see happen a lot to HexChat users who haven't configured their setup correctly (charset/fonts). That really is a nifty little tool, but I had the same lack of results you did with finding a path from either or ☃ to âH^.

I added a test factoid to my own instance and generated a literal dump. Same issue. Carabiner uses Apache2, and I'm using nginx, so I actually don't think it's the webserver. ls shows a pretty useless filename (literal_â??.txt). vi literal_â��.txt (the filename that results from pasting â�� and tabbing) says I'm editing "literal_â<98><83>.txt".
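
One reading that would be consistent with the `ls` and `vi` output above is that the filename gets encoded twice: the UTF-8 bytes of ☃ are treated as Latin-1 characters and then encoded to UTF-8 again. This is an assumption, not something confirmed in this thread, and the Python sketch below only illustrates the byte sequences involved, not Bucket's actual Perl code.

```python
# Bytes a correctly named file would have on a UTF-8 filesystem.
intended = "literal_\u2603.txt".encode("utf-8")

# Hypothetical failure mode (assumption): the UTF-8 bytes are taken as
# Latin-1 characters and then encoded to UTF-8 a second time.
double_encoded = intended.decode("latin-1").encode("utf-8")

print(intended)        # b'literal_\xe2\x98\x83.txt'
print(double_encoded)  # b'literal_\xc3\xa2\xc2\x98\xc2\x83.txt'

# Read back as UTF-8, the second name is "â" followed by the control
# characters U+0098 and U+0083, which a terminal cannot print.
shown = double_encoded.decode("utf-8")
assert shown == "literal_\u00e2\u0098\u0083.txt"
```

If that is what happens, a request for the correctly encoded name would not match the file on disk, which would explain the 404 on two different webservers.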

At this point, it seems that I was wrong before about it not being a Bucket issue. I did vi literal_☃.txt in the literal directory and ended up with a [New File]. Adding text and saving, then visiting the link that Bucket generated, showed the file just fine in my browser—no 404 error any more.

More investigation into Bucket's order of operations is warranted, but for now that's what I've found.

@dgw
Collaborator

dgw commented Nov 21, 2016

Furthermore, I just played around in Perl's debugger mode performing a trivial open-and-append on a file named literal_☃.txt, with no issues. It even shows up correctly in ls output, unlike the one generated by Bucket.

Curiouser and curiouser… It's probably worth pointing out that Perl's documentation says, "There are still several places where Unicode isn't fully supported, such as in filenames." (perlunicode docs) … But if it works in the debugger, shouldn't it work in Bucket?
