mergeAlignment sometimes fails... #2

Open
ksudoh opened this issue Oct 23, 2015 · 0 comments
ksudoh commented Oct 23, 2015

Hi,

I've just started using SyMGIZA++ and found a problem in mergeAlignment.cpp.
It sometimes fails to retrieve a few sentences at the end of the dataset, even though they were generated correctly in .A3.final.part.
Here's my patch to fix this problem; it seems to work fine in my environment.

Best,

--- src/mergeAlignment.cpp.org  2015-10-23 16:09:57.000000000 +0900
+++ src/mergeAlignment.cpp  2015-10-23 16:10:33.000000000 +0900
@@ -121,7 +121,9 @@
 int last[number];
 for(int i=0;i<number;i++) last[i]=0;

-while(added<all){
+bool found_sent = true;
+while(found_sent){
+   found_sent = false;
    for(int d=0;d<number;d++)
     for(int i=last[d]; i<sent[d].size();i++){
        if (sent[d][i] != "" ) if(strContains(sent[d][i], prefixLine)){
@@ -143,6 +145,7 @@
                sent[d][i] = "";
                sent[d][i+1] = "";
                sent[d][i+2] = "";
+           found_sent = true;
            }
            if(iii > (added+1)) {
                if (i>6) last[d] = i-6;
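
For readers skimming the patch: it swaps the count-based loop condition (`added<all`) for a "keep sweeping until a full pass finds nothing" condition. Below is a minimal sketch of that pattern only, not the actual mergeAlignment.cpp code. The function name `mergeUntilNoProgress` and the use of positive integer ids (with 0 meaning "already consumed") are assumptions made for illustration; the real code works on three-line string blocks.

```cpp
#include <vector>

// Hypothetical stand-in for the merge loop: each part holds positive
// sentence ids, and 0 marks an entry that has already been consumed.
// All parts are swept repeatedly, taking the next expected id wherever
// it appears, until one complete sweep finds nothing new.
std::vector<int> mergeUntilNoProgress(std::vector<std::vector<int>> parts) {
    std::vector<int> merged;
    int next = 1;            // id of the sentence expected next
    bool found_sent = true;  // same role as found_sent in the patch
    while (found_sent) {
        found_sent = false;
        for (auto& part : parts) {
            for (int& id : part) {
                if (id != 0 && id == next) {
                    merged.push_back(id);
                    id = 0;          // mark as consumed
                    ++next;
                    found_sent = true;
                }
            }
        }
    }
    return merged;
}
```

With parts `{1, 4}`, `{2, 5}` and `{3}`, the first sweep collects ids 1 to 3, the second collects 4 and 5, and the third finds nothing and ends the loop. Termination does not depend on a sentence count computed in advance, because every productive sweep consumes at least one entry.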

UPDATE: I found another problem in endSymetrize.cpp, which wrongly removed every word whose surface form is "NULL".

--- src/endSymetrize.cpp.org    2015-10-26 13:10:13.000000000 +0900
+++ src/endSymetrize.cpp    2015-10-23 18:53:47.000000000 +0900
@@ -34,6 +34,7 @@
{
    tInput.replace( uPos, uFindLen, tReplace );
    uPos += uReplaceLen;
+       if (tFind == "NULL") break;
}
return tInput;

@@ -79,11 +80,16 @@
 //int a_size=0;
 //int b_size=0;

+unsigned int state = 0;
+unsigned int numline = 0;
+
 while (inp) {
 getline(inp,s1);
 getline(inpInv,s2);
+if (++state == 3) { state = 0; }

-if(s1.find("NULL")!= string::npos){
+//if(s1.find("NULL")!= string::npos){
+if (state == 0) {
 const char *str1 = s1.c_str();
 //a_size=0;
 a.clear();
@@ -248,6 +254,7 @@
            bsp++;
        }

+       numline++;
        if((a.size()+1) == asp && (b.size()+1) == bsp ){

        out << "1" << endl;
@@ -259,7 +266,7 @@
        for(int i=0; i<a.size();i++) out    << " " << a[i];
        out << endl;
        } else{
-           cout << "Error in sentence "<< endl;
+           cout << "Error in sentence " << numline << " source_length:" << a.size() << " target_length:" << b.size() << endl;
        }
 }
 }
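
The second hunk above stops detecting alignment lines by searching for the substring "NULL" and instead selects every third line with a counter. The sketch below contrasts the two approaches, assuming three-line records in which the third line is the alignment line (this layout is inferred from the patch, and the sample lines are made up). The helper names `alignmentLinesByPosition` and `alignmentLinesByContent` are hypothetical and do not appear in endSymetrize.cpp.

```cpp
#include <string>
#include <vector>

// Select lines 3, 6, 9, ... by position, mirroring the patch's
// `if (++state == 3) { state = 0; }` counter.
std::vector<std::string> alignmentLinesByPosition(
        const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    unsigned int state = 0;
    for (const std::string& line : lines) {
        if (++state == 3) { state = 0; }
        if (state == 0) out.push_back(line);
    }
    return out;
}

// Select any line containing "NULL", as the pre-patch condition did.
// A sentence line that happens to contain the word "NULL" also matches.
std::vector<std::string> alignmentLinesByContent(
        const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        if (line.find("NULL") != std::string::npos) out.push_back(line);
    }
    return out;
}
```

For a record whose sentence line is `the pointer is NULL`, the content-based check selects two lines while the position-based one selects only the third, which is the behaviour the patch relies on.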