Disambiguation of glosses by tone and part of speech finalized #48

vieenrose · 2017-05-19T14:14:08Z

Disambiguation of glosses by tone and part of speech is finalized, after some adjustments made during testing.

…rity for the learning of the tone which, in its latest modification, required data structures better suited than those inherited from the NLTK source code. In the present state, we have verified that part-of-speech learning and disambiguation have not been altered by the start of this architectural change.

…al to the code separator '_', the encoder's code checker will detect an error. For the moment, we exclude this character from the encoding process that generates the learning data, so as not to upset the encoder.

vieenrose · 2017-05-30T08:35:25Z

In addition to some bug fixes, this pull request adds two options for specifying two partial tone models: one covering only the diacritic characters, and another covering only the non-diacritic characters.

vieenrose · 2017-08-03T12:44:14Z

Minor bug fixes

vieenrose added 21 commits May 16, 2017 17:26

Restructuring the code to prepare tone disambiguation

80f57a8

Bugfix: the previous training did not take the context into account …

69bbe87

…because the compound tokens of syllables were sent individually to the labeler, which had led to poor accuracy.

Improved Readability

84f4253

Improved Readability 2

5052773

Add a convenient accuracy calculation function

a6c5682

Improvement of export module of the labeling result

b307f4c

Adaptation after the code restructuring

5f52600

Disambiguation (for tones) in development

57e41b2

Move the tool functions for learning and disambiguation by the tonal …

ec140ed

…form in differential_tone_coding.py and an adjustment for reordering lists.

minor bugfix and removal of debug lines

091cac4

The character '_' (which is also chosen as the separator in the code)…

26b2899

… can now be coded as a character to be deleted or inserted thanks to the implementation of a split2 function.

Added two model files generated from disambiguated part of Corbama co…

eb36fe8

…rpus.

Corrected the behavior of the archiving of the tone models, which are…

ed9f92c

… 4, using the zipfile module.

Bugfix: correct option -e by specifying the type of its argumen…

a95b9c2

…t as float
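The fix described in this commit amounts to declaring the option's type, so that the value is parsed as a number instead of being kept as a string. A minimal sketch with `argparse`, assuming the script uses that module; pairing `-e` with the long name `--evalsize`, the default value, and the help text are illustrative assumptions, not taken from the source:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a float-typed option")
    # type=float makes argparse convert "12.5" to 12.5 and reject
    # non-numeric input with a usage error.
    # Pairing -e with --evalsize and the default of 10.0 are assumptions.
    parser.add_argument("-e", "--evalsize", type=float, default=10.0,
                        help="evaluation set size (hypothetical meaning)")
    return parser


args = build_parser().parse_args(["-e", "12.5"])
print(type(args.evalsize).__name__, args.evalsize)
```

Without `type=float`, `args.evalsize` would be the string `"12.5"`, and any arithmetic on it would fail or behave unexpectedly.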

Add two options --diacritic_only and --non_diacritic_only allowing a …

a8cc8d3

…tone learning and prediction based on a subset of characters (only on diacritic characters or only on non-diacritic characters).
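One way to picture the two subsets is to decompose each token and separate combining marks from base characters. The sketch below is only an illustration under that assumption: the helper names are hypothetical, and the project may define "diacritic characters" differently (for example, with an explicit list of tone marks).

```python
import unicodedata


def split_marks(token):
    """Return (base characters, combining marks) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", token)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks


def select_characters(token, diacritic_only=False, non_diacritic_only=False):
    """Keep only the character subset selected by the (hypothetical) flags."""
    base, marks = split_marks(token)
    if diacritic_only:
        return marks
    if non_diacritic_only:
        return base
    # No flag set: keep every character of the decomposed token.
    return unicodedata.normalize("NFD", token)


print(select_characters("b\u00e1", non_diacritic_only=True))
```

Here `unicodedata.combining` returns a non-zero class for combining marks such as U+0301 (combining acute accent), which is what separates the two subsets.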

Bugfix for a8cc8d3: partial learning is now corrected for the 2 newly…

475ec86

… added options. In the previous commit, learning still used all the characters even with the --diacritic_only or --non_diacritic_only option.

Remove the debug configuration R = 0.01, which allowed learning only …

9f02301

…on 1% of the total corpus.

Bugfix for code_resort, which did not sort according to the operation mode

7db5646

Adjustment for partial learning cases with --diacritic_only or --non_…

155e602

…diacritic_only

vieenrose added 8 commits June 26, 2017 15:33

Add chunkmode option, which specifies the segmentation width when the v…

fe94029

…alue is positive, and syllabification when the value is negative. Otherwise, the segmentation is bypassed.
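The three cases described in this commit message can be sketched as a single dispatch on the sign of the option value. This is an illustration only: `chunk_token` and `naive_syllabify` are hypothetical names, and the regular expression is a crude stand-in for the project's real syllabification.

```python
import re


def naive_syllabify(token):
    """Crude placeholder: split after each vowel group (not the real syllabifier)."""
    parts = re.findall(r"[^aeiou]*[aeiou]+|[^aeiou]+$", token)
    return parts or [token]


def chunk_token(token, chunkmode):
    """Segment a token according to the sign of chunkmode."""
    if chunkmode > 0:
        # Positive value: fixed-width segments of `chunkmode` characters.
        return [token[i:i + chunkmode] for i in range(0, len(token), chunkmode)]
    if chunkmode < 0:
        # Negative value: syllabification.
        return naive_syllabify(token)
    # Zero: segmentation is bypassed and the token is kept whole.
    return [token]


for mode in (4, -1, 0):
    print(mode, chunk_token("bamako", mode))
```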

Bugfix for fe94029

d66b511

Bugfix for grammar rule file

3614381

Add filtering option for specifying a fine filter on edit operations (…

fe18dc0

…it's like a corpus cleaning function)

Revert the debug setting R = 0.01 to R = 1 so that we apply training …

37f83cd

…on the portion of the corpus specified with --evalsize

Add non_coding option to run the original learning and prediction

aa851df

Add option no_decomposition to disable the edit operation

d75d817

decomposition

Revert R = 0.1 to R = 1

742ee41

vieenrose added 29 commits July 1, 2017 00:51

split exp.sh in 4 files

2e3adaa

bugfix

72df7a8

Adjust evalsize in experiment script

3b52f53

Rename the script for the accuracy vs evalsize experiment

e09add1

Add a launch-all-experiments script

c5a70c1

Add kill all experiment script and a bugfix for launch all experiment…

5d8d5b3

… script

launch_all_exps.sh update

7b62250

accuracy_vs_evalsize.sh bugfix

bf4989f

evalsize = 50, sleep before tail

a91f585

Add python in addition to Python in the tokill list

d84eb70

Add experiment scripts for the cases of no filter, of no coding, and of …

9ce77a1

…neither filter nor edit operation decomposition

Merge branch 'master' of https://github.com/vieenrose/daba

64b5068

In the exported result, a space is inserted between syllables to make…

31f42ee

… post-processing easier

Add error mining script

55fbecb

remove silence from the confusion matrix calc.

92404e7

Add matrix printing in print_cnt function and some display adjustment

38a9570

first commit

e3f45b3

initialisation

5efe5f4

Remove *.pyc and *.zip files from git

6d1d5c8

A plain-text reader passes the text it receives as input as …

e3c523e

…list(list([token_non_marked, token_marked]))
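The nested shape named in this commit message, `list(list([token_non_marked, token_marked]))`, can be read as a list of sentences, each a list of two-element pairs. The sketch below shows one possible reading, assuming that "non-marked" means the token with its combining tone marks removed and that each input line is one sentence; both are assumptions, and the function names are hypothetical.

```python
import unicodedata


def strip_marks(token):
    """Remove combining marks (assumed here to carry the tone marking)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def read_plain_text(text):
    """Return [[[token_non_marked, token_marked], ...], ...], one inner list per line."""
    sentences = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:  # skip empty lines
            sentences.append([[strip_marks(t), t] for t in tokens])
    return sentences


print(read_plain_text("b\u00e1 ko\n\nn\u00e0"))
```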

Revert "A plain-text reader passes the text it receives as input as …

1ee2e83

…list(list([token_non_marked, token_marked]))" This reverts commit e3c523e.

Revert "add matrix printing in print_cnt function and some display a…

7d0299b

…djustment" This reverts commit 38a9570.

Revert "Revert "add matrix printing in print_cnt function and some d…

0e0d652

…isplay adjustment"" This reverts commit 7d0299b.

Revert "Remove *.pyc and *.zip files from git"

68d5a60

This reverts commit 6d1d5c8.

Revert "initialisation"

319e918

This reverts commit 5efe5f4.

Revert "first commit"

9a4d942

This reverts commit e3f45b3.

bugfix for file scanning method

a7ce305

bugfix for disambiguation

56ebca3

Merge branch 'master' of https://github.com/vieenrose/daba

cdc48f6
