adjustments to the (Un/)aspirated Voiceless Consonants rules #7

Open
5 tasks
eroux opened this issue Jan 13, 2021 · 2 comments

eroux commented Jan 13, 2021

The rules like

SUBSTITUTE ("གཐ(.*)"r) ("གཏ$1"v) TARGET (σ);

etc. seem fine, but there seem to be a few exceptions (especially to this one). For instance, here is what I find in the Monlam dictionary:

  • གཐུམ -> འཐུམ (not གཏུམ)
  • གཐོས -> ཐོས (not གཏོས)
  • གཐྲ -> གཏར (not གཏྲ, which would be invalid)
  • གཆེག -> ཙེག (not གཅེད)
  • གཐང -> ཐང (not གཏང) (note that I'm not so sure about this one)

I think this should be double-checked, but I just want to record it here to start the discussion.

FChrispz commented Mar 11, 2022

@eroux this one is not very clear to me. We actually want to change གཐོས -> གཏོས, and the same for the other cases you listed. In fact, in the grammar we said:

Aspirated Voiceless Consonants
Background: In Tibetan, aspirated and unaspirated voiceless consonants can appear only at syllable onset. In Old Tibetan, aspirated and unaspirated voiceless initials do not have a phonemic distinction.
Rule: At the beginning of the syllable make the following changes:

SUBSTITUTE ("མཀ(.*)"r) ("མཁ$1"v) TARGET (σ);
SUBSTITUTE ("མཅ(.*)"r) ("མཆ$1"v) TARGET (σ);
SUBSTITUTE ("མཏ(.*)"r) ("མཐ$1"v) TARGET (σ);
SUBSTITUTE ("མཙ(.*)"r) ("མཚ$1"v) TARGET (σ);
SUBSTITUTE ("འཀ(.*)"r) ("འཁ$1"v) TARGET (σ);
SUBSTITUTE ("འཅ(.*)"r) ("འཆ$1"v) TARGET (σ);
SUBSTITUTE ("འཏ(.*)"r) ("འཐ$1"v) TARGET (σ);
SUBSTITUTE ("འཔ(.*)"r) ("འཕ$1"v) TARGET (σ);
SUBSTITUTE ("འཙ(.*)"r) ("འཚ$1"v) TARGET (σ);
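
For readers who want to try the effect of these nine substitutions outside the grammar, here is a minimal Python sketch. It is not the project's implementation: the names `ASPIRATION_PAIRS` and `aspirate_onset` are made up for illustration, and it assumes the input is a single syllable with the prefix letter at its very start, as the "at the beginning of the syllable" wording suggests.

```python
# Prefix + unaspirated consonant -> prefix + aspirated consonant,
# copied from the nine SUBSTITUTE rules quoted above.
ASPIRATION_PAIRS = [
    ("མཀ", "མཁ"),
    ("མཅ", "མཆ"),
    ("མཏ", "མཐ"),
    ("མཙ", "མཚ"),
    ("འཀ", "འཁ"),
    ("འཅ", "འཆ"),
    ("འཏ", "འཐ"),
    ("འཔ", "འཕ"),
    ("འཙ", "འཚ"),
]


def aspirate_onset(syllable: str) -> str:
    """Rewrite the onset of one syllable; leave the rest untouched.

    Hypothetical helper: only the first matching pair is applied, and
    syllables that start with none of the listed onsets are returned as-is.
    """
    for source, target in ASPIRATION_PAIRS:
        if syllable.startswith(source):
            return target + syllable[len(source):]
    return syllable
```

Matching with `startswith` rather than a free regex search keeps the rewrite anchored to the onset, so a sequence such as མཀ occurring later in a longer string would not be touched.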

eroux commented Mar 15, 2022

What I mean is that, in order to lemmatize better, you should first apply the rules indicated above. Otherwise your rules will replace གཐུམ with གཏུམ, but in that case the Classical Tibetan equivalent is འཐུམ, not གཏུམ.
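
The ordering described here (exceptions first, general rule second) could be sketched roughly as follows in Python. This is only an illustration under assumptions: `EXCEPTIONS` and `normalize` are hypothetical names, the table holds only three of the dictionary forms quoted at the top of this issue (which still need double-checking), and the fallback is a plain-regex rendering of the གཐ -> གཏ rule.

```python
import re

# Whole-syllable exceptions reported from the Monlam dictionary in this
# issue. Unverified: the thread asks for them to be double-checked.
EXCEPTIONS = {
    "གཐུམ": "འཐུམ",
    "གཐོས": "ཐོས",
    "གཐྲ": "གཏར",
}


def normalize(syllable: str) -> str:
    """Apply the exception table first, then the general rule.

    Hypothetical helper: an exact-match lookup wins over the regex, so
    a listed syllable never reaches the general substitution.
    """
    if syllable in EXCEPTIONS:
        return EXCEPTIONS[syllable]
    # General rule from the issue: གཐ(.*) -> གཏ$1, anchored at the onset.
    return re.sub(r"^གཐ(.*)", r"གཏ\1", syllable)
```

With this ordering, གཐུམ maps to འཐུམ through the table, while any other syllable starting with གཐ still falls through to the general rule.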
