GAP potentially puts linebreaks between the bytes forming a UTF-8 character #5544
Comments
Technically, I don't think GAP promises to use UTF-8; someone could be using Latin-1, for example. So there are various things to decide: do we want to change printing based on the terminal configuration, or just decide that nowadays everyone wants UTF-8?
Just some ideas: maybe an efficient solution would be to not insert linebreaks at all if a string contains any characters outside the range of printable ASCII characters. Or, as a partial solution, linebreaks could be restricted to positions between two printable ASCII characters. But maybe that would lead to too many inconsistencies :/
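As a rough illustration of these two ideas, here is a minimal Python sketch operating on the raw bytes of a string. The function names are hypothetical and this is not GAP's actual printing code. It shows the rule "only break between two printable ASCII bytes (0x20 to 0x7E)", which can never split a multi-byte character in any encoding, and the stricter all-or-nothing variant.

```python
def is_printable_ascii(b: int) -> bool:
    # Printable ASCII runs from 0x20 (space) to 0x7E (tilde).
    return 0x20 <= b <= 0x7E


def allowed_break_positions(data: bytes) -> list:
    """Return offsets i where a line break may be inserted before data[i].

    A break is allowed only when the bytes on both sides are printable
    ASCII, so it can never land inside a multi-byte character.
    """
    return [
        i
        for i in range(1, len(data))
        if is_printable_ascii(data[i - 1]) and is_printable_ascii(data[i])
    ]


def may_break_anywhere(data: bytes) -> bool:
    # Stricter variant: allow breaks only if the whole string is printable ASCII.
    return all(is_printable_ascii(b) for b in data)


if __name__ == "__main__":
    print(allowed_break_positions(b"abc"))
    print(allowed_break_positions("a\u2192b".encode("utf-8")))
```

Note the trade-off mentioned above: with the partial rule, a long string containing a single `→` can still be broken elsewhere, but never next to that character.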
This has reminded me of a PR I never got around to finishing (I've just looked at resurrecting it; it will need some poking). It disables GAP's linebreaks entirely. This is a bit less trivial than you might think, because GAP combines line breaks with indentation; personally, I never want GAP to line-break, but I always want it to indent. I'm going to work on polishing it up over the next few days; then we can see whether it solves this problem, and maybe write some docs for it.
Ah, I wasn't aware of that PR. I like the idea very much, this would also solve other issues I have. |
I have now updated #5140, so it applies to master and has some basic documentation. You should be able to run … I'd be interested to hear whether this handles UTF-8 well, or if there are any unexpected issues.
Very nice, thanks a lot! I just tried out the PR: in a terminal I do not see problems with UTF-8 characters anymore :-) In a …
Consider the following situation:
Observed behaviour
GAP puts a linebreak between the bytes forming the UTF-8 character →. In particular, if this happens inside the output in a .tst file, the file is not a valid UTF-8 file anymore.
Expected behaviour
The linebreak is inserted before or after the UTF-8 character.
I expect that this is a known bug, but I could not find an open issue for this.
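The expected behaviour can be achieved without giving up line breaks for non-ASCII text, assuming the output really is UTF-8 (which, as noted in the discussion, GAP does not promise). In UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a proposed break offset can be moved backwards until it no longer points at a continuation byte. The following is a minimal Python sketch with hypothetical function names; it measures line width in bytes, whereas a real implementation would presumably count displayed characters.

```python
def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return b & 0xC0 == 0x80


def snap_to_char_boundary(data: bytes, pos: int) -> int:
    """Move a proposed break offset backwards so it does not split a character."""
    pos = max(0, min(pos, len(data)))
    while 0 < pos < len(data) and is_continuation(data[pos]):
        pos -= 1
    return pos


def wrap_utf8(data: bytes, width: int) -> list:
    """Split data into chunks of at most `width` bytes, never inside a character.

    A single character wider than `width` is emitted whole.
    """
    lines = []
    start = 0
    while len(data) - start > width:
        cut = snap_to_char_boundary(data, start + width)
        if cut == start:
            # The character at `start` does not fit: take it whole to make progress.
            cut = start + 1
            while cut < len(data) and is_continuation(data[cut]):
                cut += 1
        lines.append(data[start:cut])
        start = cut
    if start < len(data):
        lines.append(data[start:])
    return lines


if __name__ == "__main__":
    for chunk in wrap_utf8("ab\u2192cd".encode("utf-8"), 3):
        print(chunk.decode("utf-8"))  # every chunk decodes cleanly
```

Here the break that would have fallen inside `→` is moved before it, so every emitted line is valid UTF-8 on its own.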
Copy and paste GAP banner (to tell us about your setup)