Disambiguation of glosses by tone and part of speech finalized #48

Open
wants to merge 64 commits into
base: master
3c95ea8
We restructure the core of the code for the visibility and its mudula…
vieenrose May 16, 2017
80f57a8
Restructuring the code to prepare tone disambiguation
vieenrose May 17, 2017
69bbe87
bugfix : The previous training did not take the context into account …
vieenrose May 18, 2017
84f4253
Improved Readability
vieenrose May 18, 2017
5052773
Improved Readability 2
vieenrose May 18, 2017
a6c5682
Add a convenient accuracy calculation function
vieenrose May 18, 2017
b307f4c
Improvement of export module of the labeling result
vieenrose May 18, 2017
5f52600
Adaptation after the code restructuring
vieenrose May 18, 2017
57e41b2
Disambiguation (for tones) in development
vieenrose May 18, 2017
ec140ed
Move the tool functions for learning and disambiguation by the tonal …
vieenrose May 19, 2017
091cac4
minor bugfix and removal of debug lines
vieenrose May 19, 2017
5916418
When a character is considered by the tone encoder, and it is identic…
vieenrose May 19, 2017
26b2899
The character '_' (which is also chosen as the separator in the code)…
vieenrose May 19, 2017
eb36fe8
Added two model files generated from disambiguated part of Corbama co…
vieenrose May 21, 2017
ed9f92c
Corrected the behavior of the archiving of the tone models, which are…
vieenrose May 22, 2017
a95b9c2
bugfix : correction of option -e by specifying the type of its argumen…
vieenrose May 23, 2017
a8cc8d3
Add two options --diacritic_only and --non_diacritic_only allowing a …
vieenrose May 24, 2017
475ec86
Bugfix for a8cc8d30dd05ab18855422c3609a84f8c8061a25, partial learning…
vieenrose May 24, 2017
9f02301
Removal of the debug configuration : R = 0.01 allowing learning only …
vieenrose May 24, 2017
7db5646
Bugfix for code_resort, which did not sort according to operation mode
vieenrose May 29, 2017
155e602
Adjustment for partial learning cases with --diacritic_only or --non_…
vieenrose May 29, 2017
fe94029
Add chunkmode option which specifies the segmentation width when the v…
vieenrose Jun 26, 2017
d66b511
Bugfix for fe9402902ac80f565bbc8b5a7f7f2972866e4199
vieenrose Jun 26, 2017
3614381
Bugfix for grammar rule file
vieenrose Jun 27, 2017
fe18dc0
Add filtering option for specifying a fine filter on edit operations (…
vieenrose Jun 27, 2017
37f83cd
Revert the debug setting R = 0.01 to R = 1 so that we apply training …
vieenrose Jun 27, 2017
aa851df
Add non_coding option for making the original learning and prediction
vieenrose Jun 28, 2017
d75d817
Add option no_decomposition allowing to disable the edit operation
vieenrose Jun 28, 2017
742ee41
Revert R = 0.1 to R = 1
vieenrose Jun 28, 2017
86f721c
Bugfix
vieenrose Jun 28, 2017
d2cb787
Bugfix for no_decomposition
vieenrose Jun 29, 2017
618a99b
Add exp.sh for experiments
vieenrose Jun 29, 2017
7257ff6
add stdbuf support for exp.sh
vieenrose Jun 29, 2017
b82b4db
Test script update
vieenrose Jun 29, 2017
42e5a08
improve gawk pattern and revert evalsize to 10 (meaning 10 percent)
vieenrose Jun 30, 2017
2e3adaa
split exp.sh in 4 files
vieenrose Jun 30, 2017
72df7a8
bugfix
vieenrose Jun 30, 2017
3b52f53
Adjust evalsize in experiment script
vieenrose Jul 1, 2017
e09add1
rename the script for making experiment about accuracy vs evalsize
vieenrose Jul 1, 2017
c5a70c1
Add a launch-all-experiments script
vieenrose Jul 1, 2017
5d8d5b3
Add kill all experiment script and a bugfix for launch all experiment…
vieenrose Jul 1, 2017
7b62250
launch_all_exps.sh update
vieenrose Jul 1, 2017
bf4989f
accuracy_vs_evalsize.sh bugfix
vieenrose Jul 1, 2017
a91f585
evalsize = 50, sleep before tail
vieenrose Jul 1, 2017
d84eb70
add python in addition to Python in the tokill list
vieenrose Jul 1, 2017
9ce77a1
add experiment script for the case of no filter, of no coding and of …
vieenrose Jul 1, 2017
64b5068
Merge branch 'master' of https://github.com/vieenrose/daba
vieenrose Jul 1, 2017
31f42ee
In exported result, a space is inserted between syllables for making…
vieenrose Jul 2, 2017
55fbecb
add error mining script
vieenrose Jul 2, 2017
92404e7
remove silence from the confusion matrix calc.
vieenrose Jul 2, 2017
38a9570
add matrix printing in print_cnt function and some display adjustment
vieenrose Jul 3, 2017
e3f45b3
first commit
vieenrose Jul 18, 2017
5efe5f4
initialisation
vieenrose Jul 18, 2017
6d1d5c8
Remove the *.pyc and *.zip files from git
vieenrose Jul 18, 2017
e3c523e
A plain-text reader passes on the text it receives as input in …
vieenrose Jul 19, 2017
1ee2e83
Revert "A plain-text reader passes on the text it receives as in…
vieenrose Jul 19, 2017
7d0299b
Revert "add matrix printing in print_cnt function and some display a…
vieenrose Jul 19, 2017
0e0d652
Revert "Revert "add matrix printing in print_cnt function and some d…
vieenrose Jul 19, 2017
68d5a60
Revert "Remove the *.pyc and *.zip files from git"
vieenrose Jul 19, 2017
319e918
Revert "initialisation"
vieenrose Jul 19, 2017
9a4d942
Revert "first commit"
vieenrose Jul 19, 2017
a7ce305
bugfix for file scanning method
vieenrose Jul 19, 2017
56ebca3
bugfix for disambiguation
vieenrose Jul 21, 2017
cdc48f6
Merge branch 'master' of https://github.com/vieenrose/daba
vieenrose Jul 21, 2017
In the exported result, a space is inserted between syllables to make post-processing easier
vieenrose committed Jul 2, 2017
commit 31f42ee32978824fa445854d554a6e15468e76cd
16 changes: 11 additions & 5 deletions differential_tone_coding.py
@@ -249,11 +249,17 @@ def csv_export(filename, gold_set, test_set, is_tone_mode = False):
         test_form = ''
         token = ''
         for gold_syllabe, test_syllabe in zip(gold_token, test_token) :
-            token += gold_syllabe[0]
-            gold_code += gold_syllabe[1]
-            test_code += test_syllabe[1]
-            gold_form += enc.differential_decode(gold_syllabe[0], gold_syllabe[1].decode('utf-8'))
-            test_form += enc.differential_decode(gold_syllabe[0], test_syllabe[1].decode('utf-8'))
+            token += gold_syllabe[0] + ' '
+            if gold_syllabe[1] :
+                gold_code += gold_syllabe[1] + ' '
+            else :
+                gold_code += 'NULL' + ' '
+            if test_syllabe[1] :
+                test_code += test_syllabe[1] + ' '
+            else :
+                test_code += 'NULL' + ' '
+            gold_form += enc.differential_decode(gold_syllabe[0], gold_syllabe[1].decode('utf-8')) + ' '
+            test_form += enc.differential_decode(gold_syllabe[0], test_syllabe[1].decode('utf-8')) + ' '
         sameCodes = (gold_code == test_code)
         sameForms = (gold_form == test_form)
         sameCodes = (gold_code == test_code)
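
The effect of this change on post-processing can be illustrated with a minimal sketch. It is written in Python 3 (the diff itself is Python 2), and the names `NULL_MARK`, `export_field` and `parse_field` are hypothetical helpers introduced here for illustration; they are not part of differential_tone_coding.py. The sketch assumes that codes never contain whitespace themselves.

```python
# Hypothetical placeholder, mirroring the 'NULL' literal used in the diff.
NULL_MARK = 'NULL'

def export_field(codes):
    """Build one exported field: each syllable code followed by a space,
    with NULL_MARK standing in for an empty code (as in the diff)."""
    field = ''
    for code in codes:
        field += (code if code else NULL_MARK) + ' '
    return field

def parse_field(field):
    """Recover the per-syllable codes from an exported field.
    str.split() with no argument ignores the trailing space."""
    return ['' if part == NULL_MARK else part for part in field.split()]

codes = ['+1a', '', '-0b']          # made-up codes, middle syllable unchanged
field = export_field(codes)
assert parse_field(field) == codes  # syllable alignment survives the round trip
```

Without the separator, the codes of adjacent syllables would run together, and without the placeholder an empty code would leave no trace, so a consumer of the exported file could not tell which syllable each code belongs to.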