-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Arabic/bidirectional text #1093
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
InfoTeddy
suggested changes
Jan 4, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just writing down what I reviewed and discussed on Discord:
- U+FEFF is the Byte Order Mark and shouldn't be considered an Arabic character.
- Confusing wording in documentation: In "so the left border is 320 is the right border is 0", the second "is" should be "and". This mistake has been copy-pasted in several places.
PR_RTL_AXIS
should be renamed toPR_RTL_XFLIP
.- This warning:
/home/user/Programming/VVVVVV/desktop_version/src/FontBidi.cpp:186:1: warning: missing initializer for member ‘font::ArabicLetter::final’ [-Wmissing-field-initializers] 186 | }; | ^
- Menu options in the list of custom levels are not reversed, except for the last three options, despite the left/right keybindings still being reversed.
Just force-pushed a whole bunch of fixes and improvements - everything mentioned in the comment above and on Discord, as well as some other things:
|
This was referenced Jan 6, 2024
Closed
I'm going to give it a shot to use this for bidi text support, it looks like it's a pretty lightweight, compatible and low-dependency library which is definitely a plus. We'll still need to do reshaping ourselves, but that's the easy part compared to bidi.
Not much to see here, just making sure SheenBidi is compiled with the game and we can include its headers.
I'm now using SheenBidi to reorder RTL and bidirectional text properly at text rendering time! For Arabic this is still missing reshaping, but everything's looking really promising now! The code changes are really non-invasive. The changes to Font.cpp are absolutely minimal: 1305+ if (bidi_should_transform(text)) 1306+ { 1307+ text = bidi_transform(text); 1308+ } There's now a FontBidi.cpp, which implements these two functions, notably bidi_transform(), which takes a UTF-8 encoded string and returns another UTF-8 encoded string that has bidi reorderings and reshapings applied. In that function, SheenBidi gives us information about where in the input string runs start and end, and on a basic level, all we need to do there is to concatenate the parts together in the order that we're given them, and to reverse the RTL runs (recognizable by odd levels). As this is a proof-of-concept, bidi_should_transform() still always returns true, applying the bidi algorithm to all languages and all strings. I'm thinking of enabling bidi only when the language/font metadata enables RTL (which could be for the interface or for a custom level), or outside of that, at least when RTL characters are detected (such as Arabic or Hebrew Unicode blocks).
This adds the lookup table and an accompanying hashmap that will be used for reshaping Arabic - it's not yet used though.
This code can probably be polished a bit more, but the hard part is over! This part was written with guidance of this code: https://github.com/TerryCavanagh/hx_arabic_shaper
The Arabic part was made by Montassar Ghanmi, it didn't have its own Latin part so I just copied 00-FF from Space Station.
Montassar prepared a list of all the ligatures that needed to be supported, which was a simple A+B->C table, so that one was not too difficult either!
This now returns true if any of the characters in the text belong to the Arabic or Hebrew alphabet, or are one of the Unicode directional formatting characters. This is just so the bidi machinery doesn't have to run 100% of the time for 100% of the languages. I will also make it so the Arabic language pack, as well as custom levels, have an RTL attribute that always enables bidi (and does things like right-alignment in textboxes and other design-flipping)
They shouldn't be looked up in the font and displayed under any circumstances.
This doesn't have an effect yet, but it'll do things like right-alignment in textboxes and other design-flipping.
Again, the RTL property controls whether textboxes will be right-aligned, and that kind of stuff. It can't be font-bound, since Space Station supports Hebrew characters and we want to be able to support, say, a Hebrew translation or Hebrew levels in the future without having to make a dedicated (or duplicated) font for it. Therefore it's a property of both the language pack as well as custom levels - like custom levels already had a <font> tag, they now also have an <rtl> tag that sets this property. Right now, we'll have to hardcode it so the menu option for the Arabic font sets the <rtl> property to 1, and all the other options set it to 0. But it's future-proof in that we can later decide to split the option for Space Station into an LTR option and an RTL option (so both "english/..." and "עברית" would select Space Station, but one sets the RTL property to 0 and the other sets it to 1).
With the <font> tag (which doesn't indicate RTL-ness as explained), we've had a setfont(font) scripting command. Now we have an <rtl> tag, so we need a setrtl(on/off) command too to control that.
This will return if the given flags indicate RTL properties (such as textboxes being right-aligned).
Most of this diff is just moving some existing code around, and changing inline things to variables that could be changed more easily.
This lets you mirror the X axis specifically in RTL languages, so the left border is 320 and the right border is 0, and invert the meaning of PR_LEFT (0) and PR_RIGHT. Most of the time this is not necessary, it's just for stuff where a label is followed by a different print, like "Font: " followed by the font name, time trial time displays, etc
Okay, the "Font:" thing needed some local code after all, because both the interface font as well as the level font are used there. But it's good enough - all the other places can just use the flag. Notably, I also used this for the menus, since the existing ones are kinda LTR-oriented, and it's something that we don't *really* have to do, but I think it shows we care!
Instead of just up/down, you can also control menus with left/right. Which is illogical in Arabic... No big deal, I imagined this code to become much worse than it did. (And action sets is probably gonna refactor the whole thing anyway)
Stuff like centertext="1" and padtowidth="264" in cutscene translations looked wrong in RTL mode, both with Arabic and English text. For Arabic text, I could easily fix the problem by not counting the number of codepoints (and assuming they all have the same glyph width), but by instead taking the width of the string as reported for the font, and dividing it by the glyph width. This leaves English text still looking weird in RTL mode. But this shouldn't be a problem either: the Arabic translations will probably be in Arabic (where the problem doesn't happen), and I can get English text to show up fine by wrapping it in U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE. So it looks like an inherent quirk of bidi, that translators familiar with bidi can easily grasp and fix. This is main-game only functionality, so it shouldn't break existing custom levels. We should just make sure textboxes in other languages aren't broken, but from my testing, it's completely fine - in fact, it should've improved if it was broken.
I forgot to add the PR_RTL_XFLIP flag to these menu options, so they were always left-aligned, no matter what. What actually took me a bit to figure out was how to make the level completion stars work regardless of the contents of the title - the stars should always be to the left of the title in an LTR language, and always to the right of the title in an RTL language. Level titles can contain bidi characters regardless of the level's rtl flag being set, so I just let bidi handle all the level menu options, with some control characters to make sure everything always appears in the correct order.
The slider itself was getting mirrored, but not the labels (Low/Medium/High). This fixes that.
This has a lot of reading-orientation stuff on it like "Key: value", so easiest is to just flip the whole design of the screen rather than trying to flip individual strings.
Obvious "Key: value" things here, this one was easy.
If you copy-paste a newline character where it's not interpreted, such as in a level title, the print function wouldn't treat it any special. font::print_wrap() would, but that's not used here. However, now that bidi is involved, the newline is passed straight to SheenBidi which interprets it as a new line (which would need a new SBLine to be created, or maybe even a new SBParagraph if there's two). All while we're still treating it as a single line. This means the text would just stop being displayed after the first newline. This is now fixed by treating all newlines as spaces.
They're invisible in font::print(), but they were still considered characters with widths in the width function. This change made the levels screen look better in RTL too - I was wondering why the level options were too far left.
Spaces on the left and right would end up on the other side in RTL, which made the "You have rescued a crewmate!" text overlap with the crewmate sprite, and makes the [C[C[C[C[Captain!] dialogs have spaces on the left instead of on the right. So, best thing is to just swap the directions so that they match.
If text is set to be centered, but is so long that it starts running offscreen on both sides, the print function instead makes the text start no further left than the left border of the screen (x=0). This is because text running offscreen at the end only is more readable and looks less sloppy than running offscreen at both sides. For RTL, the opposite applies, so it now also works oppositely for RTL prints, where centered strings will only run offscreen on the left side of the screen.
InfoTeddy
approved these changes
Jan 9, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes:
As mentioned in #829 (comment), we're aiming to get Arabic in for 2.4.0, which is a language that needs some significant technical changes! Luckily those happened fairly quickly and painlessly.
<rtl>
property set<rtl>
property set<rtl>
properties also control things like right-alignment in textboxesfont_ar
(this is an 8x10 font, and it doesn't have Latin characters itself, so I copied 00-FF from Space Station just so we don't duplicate everything, just the important stuff)PR_FONT_IDX(idx)
now has two arguments:PR_FONT_IDX(idx,rtl)
. SincePR_FONT_INTERFACE
(0) sets the RTL printflag based on the current language,PR_FONT_LEVEL
sets it based on the current level, and PR_FONT_8X8 sets it disabled,PR_FONT_IDX
requires you to provide whether RTL mode should be enabled. This was always just a simple filling in based on information we had at hand anyway since we got the font index from either the language metadata or level metadata.PR_RTL_FORCE
exists now (it's just set internally byPR_FONT_IDX
)PR_RTL_XFLIP
lets you mirror the X axis specifically in RTL languages, so the left border is 320 and the right border is 0, and invert the meaning ofPR_LEFT
(0) andPR_RIGHT
. Most of the time this is not necessary, it's just for stuff where a label is followed by a different print, like "Font: " followed by the font name, time trial time displays, etc. This is also used for the menus.setfont(font)
scripting command was added to change<font>
during playtime, in parallel, now that<rtl>
was added, I added the scripting commandsetrtl(on/off)
.Whew, that was everything I think! Time for some screenshots:
A random Arabic sentence:
data:image/s3,"s3://crabby-images/9e083/9e083d1fa216f5139ded6a5699b7dcbb739d3332" alt="Viridian saying أحتاج مثالا فيه جملة طويلة جدا لا تتسع لها النافذة وتفيض من الصندوق."
Lots of ligatures and right-alignment:
data:image/s3,"s3://crabby-images/430e9/430e95cd0d9d66c9165f0cfd30d644522eb6ca04" alt="Viridian saying أنا هنا آمنا. وئام لئام لا كلام لأحزان زالت وراء شراء ناي بناء يا عيال الخير للطير مللت مالي باءت القباء للوباء للا للأ للإ للآ لا لأ لإ لآ."
A right-to-left menu, and user input:
2024-01-03_23-52-07.mp4
Legal Stuff:
By submitting this pull request, I confirm that...
CONTRIBUTORS
file and the "GitHub Friends"section of the credits for all of said releases, but will NOT be compensated
for these changes