Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Arabic/bidirectional text #1093

Merged
merged 29 commits into from
Jan 9, 2024
Merged

Conversation

Daaaav
Copy link
Contributor

@Daaaav Daaaav commented Jan 3, 2024

Changes:

As mentioned in #829 (comment), we're aiming to get Arabic in for 2.4.0, which is a language that needs some significant technical changes! Luckily those happened fairly quickly and painlessly.

  • This adds a new submodule, SheenBidi, which is a pretty lightweight, compatible (C90 even) and low-dependency library for bidirectional text support.
  • Arabic letter reshaping was written with guidance of this code: https://github.com/TerryCavanagh/hx_arabic_shaper
  • Arabic ligatures are also converted, from a table given by Montassar (one of our Arabic translators)
  • Automatic wordwrapping works properly without further changes (Arabic uses spaces, and we do convert the logical ordering to visual ordering)
  • To provide a fast path when printing text for LTR languages (and less chance of unexpected breakage of existing content), all the bidi and reshaping machinery will only run for a printed string if one of the following is true:
    • Printing interface text while the current language has the <rtl> property set
    • Printing custom level-specific text while the level has the <rtl> property set
    • The string contains characters in Unicode blocks for the Arabic or Hebrew alphabets, or directional control characters (like the famous U+202E).
  • The mentioned <rtl> properties also control things like right-alignment in textboxes
  • An Arabic font (made by Montassar) is added as font_ar (this is an 8x10 font, and it doesn't have Latin characters itself, so I copied 00-FF from Space Station just so we don't duplicate everything, just the important stuff)
  • Changes made to print flags:
    • PR_FONT_IDX(idx) now has two arguments: PR_FONT_IDX(idx,rtl). Since PR_FONT_INTERFACE (0) sets the RTL printflag based on the current language, PR_FONT_LEVEL sets it based on the current level, and PR_FONT_8X8 sets it disabled, PR_FONT_IDX requires you to provide whether RTL mode should be enabled. This was always just a simple filling in based on information we had at hand anyway since we got the font index from either the language metadata or level metadata.
    • PR_RTL_FORCE exists now (it's just set internally by PR_FONT_IDX)
    • PR_RTL_XFLIP lets you mirror the X axis specifically in RTL languages, so the left border is 320 and the right border is 0, and invert the meaning of PR_LEFT (0) and PR_RIGHT. Most of the time this is not necessary, it's just for stuff where a label is followed by a different print, like "Font: " followed by the font name, time trial time displays, etc. This is also used for the menus.
  • Just like the setfont(font) scripting command was added to change <font> during playtime, in parallel, now that <rtl> was added, I added the scripting command setrtl(on/off).

Whew, that was everything I think! Time for some screenshots:

A random Arabic sentence:
Viridian saying أحتاج مثالا فيه جملة طويلة جدا لا تتسع لها النافذة وتفيض من الصندوق.

Lots of ligatures and right-alignment:
Viridian saying أنا هنا آمنا. وئام لئام لا كلام لأحزان زالت وراء شراء ناي بناء يا عيال الخير للطير مللت مالي باءت القباء للوباء للا للأ للإ للآ لا لأ لإ لآ.

A right-to-left menu, and user input:

2024-01-03_23-52-07.mp4

Legal Stuff:

By submitting this pull request, I confirm that...

  • My changes may be used in a future commercial release of VVVVVV
  • I will be credited in a CONTRIBUTORS file and the "GitHub Friends"
    section of the credits for all of said releases, but will NOT be compensated
    for these changes

@InfoTeddy InfoTeddy self-assigned this Jan 3, 2024
@InfoTeddy InfoTeddy added the localization Implies 'required for 2.4' unless otherwise noted label Jan 3, 2024
Copy link
Contributor

@InfoTeddy InfoTeddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just writing down what I reviewed and discussed on Discord:

  • U+FEFF is the Byte Order Mark and shouldn't be considered an Arabic character.
  • Confusing wording in documentation: In "so the left border is 320 is the right border is 0", the second "is" should be "and". This mistake has been copy-pasted in several places.
  • PR_RTL_AXIS should be renamed to PR_RTL_XFLIP.
  • This warning:
    /home/user/Programming/VVVVVV/desktop_version/src/FontBidi.cpp:186:1: warning: missing initializer for member ‘font::ArabicLetter::final’ [-Wmissing-field-initializers]
      186 | };
          | ^
  • Menu options in the list of custom levels are not reversed, except for the last three options, despite the left/right keybindings still being reversed.

@Daaaav
Copy link
Contributor Author

Daaaav commented Jan 5, 2024

Just force-pushed a whole bunch of fixes and improvements - everything mentioned in the comment above and on Discord, as well as some other things:

  • Fixed controller button directions in menus
  • Fixed reshaping and ligatures not always working properly because I forgot to include run offsets
  • Fixed centering and padding in textboxes
  • Mirrored some more interfaces that needed to be (like time trial results and game complete)

Daaaav added 23 commits January 9, 2024 00:11
I'm going to give it a shot to use this for bidi text support, it looks
like it's a pretty lightweight, compatible and low-dependency library
which is definitely a plus. We'll still need to do reshaping ourselves,
but that's the easy part compared to bidi.
Not much to see here, just making sure SheenBidi is compiled with the
game and we can include its headers.
I'm now using SheenBidi to reorder RTL and bidirectional text properly
at text rendering time! For Arabic this is still missing reshaping, but
everything's looking really promising now!

The code changes are really non-invasive. The changes to Font.cpp are
absolutely minimal:

     1305+    if (bidi_should_transform(text))
     1306+    {
     1307+        text = bidi_transform(text);
     1308+    }

There's now a FontBidi.cpp, which implements these two functions,
notably bidi_transform(), which takes a UTF-8 encoded string and
returns another UTF-8 encoded string that has bidi reorderings and
reshapings applied.

In that function, SheenBidi gives us information about where in the
input string runs start and end, and on a basic level, all we need to
do there is to concatenate the parts together in the order that we're
given them, and to reverse the RTL runs (recognizable by odd levels).

As this is a proof-of-concept, bidi_should_transform() still always
returns true, applying the bidi algorithm to all languages and all
strings. I'm thinking of enabling bidi only when the language/font
metadata enables RTL (which could be for the interface or for a custom
level), or outside of that, at least when RTL characters are detected
(such as Arabic or Hebrew Unicode blocks).
This adds the lookup table and an accompanying hashmap that will be
used for reshaping Arabic - it's not yet used though.
This code can probably be polished a bit more, but the hard part is
over!

This part was written with guidance of this code:
https://github.com/TerryCavanagh/hx_arabic_shaper
The Arabic part was made by Montassar Ghanmi, it didn't have its own
Latin part so I just copied 00-FF from Space Station.
Montassar prepared a list of all the ligatures that needed to be
supported, which was a simple A+B->C table, so that one was not too
difficult either!
This now returns true if any of the characters in the text belong to
the Arabic or Hebrew alphabet, or are one of the Unicode directional
formatting characters. This is just so the bidi machinery doesn't have
to run 100% of the time for 100% of the languages. I will also make it
so the Arabic language pack, as well as custom levels, have an RTL
attribute that always enables bidi (and does things like
right-alignment in textboxes and other design-flipping)
They shouldn't be looked up in the font and displayed under any
circumstances.
This doesn't have an effect yet, but it'll do things like
right-alignment in textboxes and other design-flipping.
Again, the RTL property controls whether textboxes will be
right-aligned, and that kind of stuff. It can't be font-bound, since
Space Station supports Hebrew characters and we want to be able to
support, say, a Hebrew translation or Hebrew levels in the future
without having to make a dedicated (or duplicated) font for it.
Therefore it's a property of both the language pack as well as custom
levels - like custom levels already had a <font> tag, they now also
have an <rtl> tag that sets this property.

Right now, we'll have to hardcode it so the menu option for the Arabic
font sets the <rtl> property to 1, and all the other options set it to
0. But it's future-proof in that we can later decide to split the
option for Space Station into an LTR option and an RTL option (so both
"english/..." and "עברית" would select Space Station, but one sets the
RTL property to 0 and the other sets it to 1).
With the <font> tag (which doesn't indicate RTL-ness as explained),
we've had a setfont(font) scripting command. Now we have an <rtl>
tag, so we need a setrtl(on/off) command too to control that.
This will return if the given flags indicate RTL properties
(such as textboxes being right-aligned).
Most of this diff is just moving some existing code around, and
changing inline things to variables that could be changed more easily.
This lets you mirror the X axis specifically in RTL languages, so the
left border is 320 and the right border is 0, and invert the meaning of
PR_LEFT (0) and PR_RIGHT. Most of the time this is not necessary,
it's just for stuff where a label is followed by a different print,
like "Font: " followed by the font name, time trial time displays, etc
Okay, the "Font:" thing needed some local code after all, because both
the interface font as well as the level font are used there. But it's
good enough - all the other places can just use the flag.

Notably, I also used this for the menus, since the existing ones are
kinda LTR-oriented, and it's something that we don't *really* have to
do, but I think it shows we care!
Instead of just up/down, you can also control menus with left/right.
Which is illogical in Arabic... No big deal, I imagined this code
to become much worse than it did. (And action sets is probably gonna
refactor the whole thing anyway)
Stuff like centertext="1" and padtowidth="264" in cutscene translations
looked wrong in RTL mode, both with Arabic and English text. For Arabic
text, I could easily fix the problem by not counting the number of
codepoints (and assuming they all have the same glyph width), but by
instead taking the width of the string as reported for the font, and
dividing it by the glyph width. This leaves English text still looking
weird in RTL mode. But this shouldn't be a problem either: the Arabic
translations will probably be in Arabic (where the problem doesn't
happen), and I can get English text to show up fine by wrapping it in
U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE. So it
looks like an inherent quirk of bidi, that translators familiar with
bidi can easily grasp and fix.

This is main-game only functionality, so it shouldn't break existing
custom levels. We should just make sure textboxes in other languages
aren't broken, but from my testing, it's completely fine - in fact, it
should've improved if it was broken.
I forgot to add the PR_RTL_XFLIP flag to these menu options, so they
were always left-aligned, no matter what.

What actually took me a bit to figure out was how to make the level
completion stars work regardless of the contents of the title - the
stars should always be to the left of the title in an LTR language, and
always to the right of the title in an RTL language. Level titles can
contain bidi characters regardless of the level's rtl flag being set,
so I just let bidi handle all the level menu options, with some control
characters to make sure everything always appears in the correct order.
The slider itself was getting mirrored, but not the labels
(Low/Medium/High). This fixes that.
This has a lot of reading-orientation stuff on it like "Key: value",
so easiest is to just flip the whole design of the screen rather than
trying to flip individual strings.
Obvious "Key: value" things here, this one was easy.
If you copy-paste a newline character where it's not interpreted, such
as in a level title, the print function wouldn't treat it any special.
font::print_wrap() would, but that's not used here.

However, now that bidi is involved, the newline is passed straight to
SheenBidi which interprets it as a new line (which would need a new
SBLine to be created, or maybe even a new SBParagraph if there's two).
All while we're still treating it as a single line. This means the text
would just stop being displayed after the first newline. This is now
fixed by treating all newlines as spaces.
Daaaav added 6 commits January 9, 2024 00:11
They're invisible in font::print(), but they were still considered
characters with widths in the width function. This change made the
levels screen look better in RTL too - I was wondering why the level
options were too far left.
Spaces on the left and right would end up on the other side in RTL,
which made the "You have rescued a crewmate!" text overlap with the
crewmate sprite, and makes the [C[C[C[C[Captain!] dialogs have spaces
on the left instead of on the right. So, best thing is to just swap
the directions so that they match.
If text is set to be centered, but is so long that it starts running
offscreen on both sides, the print function instead makes the text
start no further left than the left border of the screen (x=0).
This is because text running offscreen at the end only is more readable
and looks less sloppy than running offscreen at both sides.

For RTL, the opposite applies, so it now also works oppositely for RTL
prints, where centered strings will only run offscreen on the left side
of the screen.
@InfoTeddy InfoTeddy merged commit f846ba5 into TerryCavanagh:master Jan 9, 2024
4 checks passed
@Daaaav Daaaav deleted the bidi branch February 3, 2024 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
localization Implies 'required for 2.4' unless otherwise noted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants