Cantonese Jyutping Support #25

Open: wants to merge 11 commits into main


TTWNO-zz

For implementation in dragonmapper...


tony commented May 27, 2017

Rebase against latest. Can you squash where necessary?

Can you take out the version part so it can be changed when the release is cut?

tsroten (Owner) commented May 28, 2017

As you can tell, I don't spend a lot of time on my Chinese GitHub projects anymore. Thanks for contributing!

This is a great start. It will need some work before it's ready to merge. I'm happy to work with you on the things that need to be changed.

  1. You have over 2000 Cantonese syllables in this pull request. I believe that's too many.
  2. You're missing some important syllables like jyut from Jyut-ping ;)
  3. This doesn't follow the behavior of other modules in Zhon. You'll probably want to mimic the way that the zhuyin module does it.
    • Use a regular expression pattern of initials and finals, not a list of syllables (you can use a list of syllables for the tests).
    • For constants like tones (call the constant marks instead), please mimic the constants in Python's standard library string module, which are strings, not lists (for one thing, lists are mutable).
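As a rough illustration of the two sub-points above, here is a minimal sketch that builds the pattern from separate initial and final pieces and keeps the tone marks in a plain string. The names `_initials`, `_finals` and `_test_syllables` are hypothetical, and the final inventory is deliberately tiny (just aa, ai and the finals needed for the word Jyutping itself, jyut6 ping3); a real module would take the full set of finals from a chart.

```python
import re

# Tone marks as an immutable str, in the style of string.digits.
marks = '123456'

# Hypothetical, deliberately incomplete inventories for illustration only.
# Two-letter initials are listed first so the alternation tries them first.
_initials = '(?:gw|kw|ng|[bpmfdtnlgkhwzcsj])'
_finals = '(?:aa|ai|yut|ing)'

# Optional initial, then a final, then exactly one tone mark.
syllable = '(?:{i}?{f})[{m}]'.format(i=_initials, f=_finals, m=marks)

# A plain list of syllables is fine for tests, just not for the pattern.
_test_syllables = ['jyut6', 'ping3', 'ngaa4', 'gwai1']

for s in _test_syllables:
    assert re.fullmatch(syllable, s), s
```

The syllable list only appears in the test loop, which keeps the pattern itself compact and driven by initials and finals.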

I'd suggest using a resource like these three charts to create a regular expression pattern like the Zhuyin and Pinyin ones in Zhon.

Here is a short example of jyutping.py using the syllables that end in aa and ai:

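```python
# Letters used in Jyutping (lowercase), kept as a str constant.
characters = 'bpmfdtnlgkhwzcsjaeiouy'

# Tone marks 1-6.
marks = '123456'

# One syllable: optional initial, a final (only aa or ai here), then a tone mark.
# Two-letter initials (gw, kw, ng) come before single letters so they are tried
# first; ng is allowed before aa too, since syllables such as ngaa4 exist.
syl = syllable = (
    '(?:'
    '(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?aa)|'
    '(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?ai)'
    ')[{marks}]'
).format(marks=marks)
```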
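To show how a pattern like this might be used, the sketch below compiles it, checks single syllables with `re.fullmatch`, and extracts syllables from space-separated text with `re.findall`. It repeats the aa/ai pattern so it can run on its own, with `ng` allowed before aa as well (ngaa4 is a valid syllable); `syllable_re` and `is_syllable` are hypothetical helper names, not part of Zhon.

```python
import re

marks = '123456'

syllable = (
    '(?:'
    '(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?aa)|'
    '(?:(?:[gk]w|ng|[bpmfdtnlzcsgkhwj])?ai)'
    ')[{marks}]'
).format(marks=marks)

# Compile once and reuse.
syllable_re = re.compile(syllable)


def is_syllable(s):
    """Return True if s is exactly one aa/ai syllable followed by a tone mark."""
    return syllable_re.fullmatch(s) is not None


# findall pulls every matching syllable out of a longer string.
text = 'gaa1 ngaa4 gwai3 mai5'
found = syllable_re.findall(text)
```

Because every group in the pattern is non-capturing, `findall` returns the whole matched syllables rather than group contents. Without word boundaries it can also match inside a longer run of letters, so scanning free text may need extra anchoring.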

TTWNO-zz (Author)

Awesome! Thanks for the help!

Will get to it when I can :-)

4 participants