Special considerations for languages with a Right to Left writing system? #153

Open
DavidHaslam opened this issue Nov 23, 2024 · 1 comment

DavidHaslam commented Nov 23, 2024

For a Bible translation in a language that uses a Right to Left writing system (such as Arabic, Urdu or Farsi), are there any special considerations as to whether USFM parameters should sometimes contain any of these special Unicode codepoints:

  • U+200E LEFT-TO-RIGHT MARK [LRM]
  • U+200F RIGHT-TO-LEFT MARK [RLM]
  • U+202A LEFT-TO-RIGHT EMBEDDING [LRE]
  • U+202B RIGHT-TO-LEFT EMBEDDING [RLE]
  • U+202C POP DIRECTIONAL FORMATTING [PDF]
  • U+202D LEFT-TO-RIGHT OVERRIDE [LRO]
  • U+202E RIGHT-TO-LEFT OVERRIDE [RLO]

I was thinking in particular about target references within the scope of cross-reference markers, but there may be several other potential use cases, such as the verse marker with a verse range, as in \v 11-12.

How does Paratext display a project that uses a Right to Left writing system?

  • Is the whole work displayed with a Right to Left screen layout?
  • If so, how does this affect all the USFM markers, which are defined using letters from the Latin alphabet?

DavidHaslam commented Nov 23, 2024

This issue was prompted by an ongoing discussion in the sword-devel mailing list of the CrossWire Bible Society. Of particular concern is how this affects the scripted conversion from USFM to OSIS XML as a step towards making a SWORD module.

The specific project is a Bible translation in Urdu that the translator maintains in USFM format, though without using Paratext.

A particular example that gave rise to conversion errors is a combined verse range of two consecutive verses in Genesis 2, in which the RLM is used.

Here's what his latest USFM file contains:
\v 11‏-12 پہلی شاخ کا نام فیسون ہے۔ وہ ملکِ حویلہ کو گھیرے ہوئے بہتی ہے جہاں خالص سونا، گُوگل کا گوند اور عقیقِ احمر\f + \fr 2:11‏-12 \fk عقیقِ احمر: \ft carnelian \f* پائے جاتے ہیں۔

By comparison, the previous version had this:
\v 11-12 پہلی شاخ کا نام فیسون ہے۔ وہ ملکِ حویلہ کو گھیرے ہوئے بہتی ہے جہاں خالص سونا، گُوگل کا گوند اور عقیقِ احمر\f + \fr ‏2‏:11‏-12 \fk عقیقِ احمر: \ft carnelian \f* پائے جاتے ہیں۔

Which of these two is right?
Or are there USFM errors in both of them?
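
Independently of which form turns out to be valid USFM, a conversion script could tolerate both by removing the directional codepoints from the arguments of \v and \fr before parsing them. The following is only a minimal sketch of that workaround, not a statement of what USFM requires: the function name strip_bidi_in_refs and the pattern are hypothetical, and the pattern assumes the arguments consist only of digits, ':', ',', '-' and the seven codepoints listed above. Directional codepoints in the running text are left untouched.

```python
import re

# The seven Unicode directional formatting codepoints listed in this issue:
# LRM, RLM, LRE, RLE, PDF, LRO, RLO.
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e"

# Translation table that deletes every directional codepoint.
_DELETE = {ord(ch): None for ch in BIDI_CONTROLS}

# Hypothetical pattern: the argument that follows "\v " or "\fr ".
# Assumes the argument holds only digits, ':', ',', '-' and bidi controls.
_ARG = re.compile(r"(\\(?:v|fr) )([0-9:,\-" + BIDI_CONTROLS + r"]+)")


def strip_bidi_in_refs(line: str) -> str:
    """Remove directional codepoints from \\v and \\fr arguments only."""
    def clean(match: re.Match) -> str:
        marker, argument = match.group(1), match.group(2)
        return marker + argument.translate(_DELETE)

    return _ARG.sub(clean, line)


if __name__ == "__main__":
    # Shortened stand-in for the lines quoted above, with RLM written as \u200f.
    sample = "\\v 11\u200f-12 text\\f + \\fr \u200f2\u200f:11\u200f-12 \\fk x"
    print(ascii(strip_bidi_in_refs(sample)))
```

Whether the stripped form should also be written back to the USFM source is a separate question; this sketch only normalises what the converter sees.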

NB. GitHub doesn't show the U+200F RIGHT-TO-LEFT MARK [RLM] codepoints.
You may need to paste the code snippets into an editor that can display them.
Windows users might make good use of BabelPad.
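
For anyone without such an editor, a few lines of Python can make the invisible codepoints visible. This is a minimal sketch; the function name reveal_bidi and the angle-bracket tag format are my own choices for illustration, and the abbreviations are the ones used in the list at the top of this issue.

```python
# Abbreviations for the seven directional codepoints listed in this issue.
BIDI_NAMES = {
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
}


def reveal_bidi(text: str) -> str:
    """Replace each directional codepoint with a visible tag such as <RLM>."""
    return "".join(
        f"<{BIDI_NAMES[ch]}>" if ch in BIDI_NAMES else ch
        for ch in text
    )


if __name__ == "__main__":
    # A verse marker with an RLM between "11" and "-12", as in the first snippet.
    print(reveal_bidi("\\v 11\u200f-12"))  # prints \v 11<RLM>-12
```

Pasting a line of the USFM file into reveal_bidi shows exactly where each mark sits relative to the digits, colon and hyphen.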
