Disambiguation of glosses by tone and part of speech finalized #48

Open
wants to merge 64 commits into
base: master

Conversation

vieenrose
Contributor

The disambiguation of glosses by tone and part of speech is finalized, after some adjustments made during testing.

vieenrose added 21 commits May 16, 2017 17:26
…rity for tone learning, which in its latest modification required data structures better suited than those inherited from the NLTK source code. In the present state, we verify that part-of-speech learning and disambiguation have not been affected by the start of this architectural change.
…because tokens made up of several syllables were sent to the labeler one syllable at a time, which had led to poor accuracy.
…form in differential_tone_coding.py and an adjustment for reordering lists.
…al to the code separator '_', the encoder's code checker will detect an error; for the moment we exclude this character from the encoding process that generates the training data, so that the encoder does not fail.
… can now be coded as a character to be deleted or inserted thanks to the implementation of a split2 function.
…tone learning and prediction based on a subset of characters (only diacritic characters or only non-diacritic characters).
… added options. In the previous commit, learning still used all characters even with the --diacritic_only or --non_diacritic_only option.
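
The commit message above about the code separator '_' can be illustrated with a toy example. This is only a sketch under assumptions: the code format (`position:character` fields joined by '_') and the names `encode`, `decode`, and `drop_separator` are hypothetical and are not taken from differential_tone_coding.py. It shows why a character equal to the separator makes a checker reject the code, and how excluding that character before encoding avoids the problem.

```python
SEP = "_"  # the code separator named in the commit message


def encode(ops):
    """Join (position, character) insertions into one code string (toy format)."""
    return SEP.join(f"{pos}:{ch}" for pos, ch in ops)


def decode(code):
    """Parse a code string back into operations, checking every field."""
    if not code:
        return []
    ops = []
    for field in code.split(SEP):
        pos, colon, ch = field.partition(":")
        # A separator used as a payload character splits one field in two,
        # so the check below fails.
        if not colon or not pos.isdigit() or len(ch) != 1:
            raise ValueError(f"malformed field {field!r}")
        ops.append((int(pos), ch))
    return ops


def drop_separator(token):
    """Exclude the separator character from a token before encoding it."""
    return token.replace(SEP, "")
```

With this toy format, `decode(encode([(2, "_")]))` raises `ValueError`, while tokens passed through `drop_separator` first can never produce such a code.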
@vieenrose
Contributor Author

vieenrose commented May 30, 2017

In addition to some bug fixes, this pull request adds two options for specifying two partial tone models: one that covers only the diacritic characters, and another that covers only the non-diacritic characters.
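
As a rough illustration of what restricting a model to one of these two character subsets could look like, here is a minimal sketch. Only the flag names `--diacritic_only` and `--non_diacritic_only` come from the commit messages; the helper names, the use of Unicode NFD decomposition to separate diacritics from base characters, and the choice to make the two flags mutually exclusive are assumptions made for this example, not the pull request's actual code.

```python
import argparse
import unicodedata


def split_chars(token):
    """Return (base characters, combining diacritics) of a token after NFD."""
    decomposed = unicodedata.normalize("NFD", token)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    marks = [c for c in decomposed if unicodedata.combining(c)]
    return base, marks


def build_parser():
    """Hypothetical command line exposing the two subset options."""
    parser = argparse.ArgumentParser(description="partial tone modeling (sketch)")
    group = parser.add_mutually_exclusive_group()  # assumption: flags exclude each other
    group.add_argument("--diacritic_only", action="store_true")
    group.add_argument("--non_diacritic_only", action="store_true")
    return parser


def select_chars(token, args):
    """Keep only the characters the chosen option asks the model to look at."""
    base, marks = split_chars(token)
    if args.diacritic_only:
        return marks
    if args.non_diacritic_only:
        return base
    # No option given: use every character of the decomposed token.
    return list(unicodedata.normalize("NFD", token))
```

For example, `select_chars("bálà", build_parser().parse_args(["--diacritic_only"]))` keeps only the two combining accents, and the same call with `--non_diacritic_only` keeps the four base letters.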

vieenrose added 29 commits July 1, 2017 00:51
…nor filter nor edit operation decomposition
…trée en list(list([token_non_marked, token_marked]))"

This reverts commit e3c523e.
This reverts commit 5efe5f4.
This reverts commit e3f45b3.
@vieenrose
Contributor Author

Minor bug fixes


1 participant