</sentence> sentid="" output when assume_input_is_tokenized=on #7

Open
oktaal opened this issue Feb 4, 2021 · 0 comments
oktaal commented Feb 4, 2021

When I modify the Makefile.start_server script, changing the line

assume_input_is_tokenized=off\

to assume_input_is_tokenized=on, the output becomes malformed.

For example:

$ make -f Makefile.start_server 
PROLOGMAXSIZE=1500M /opt/Alpino-git233/bin/Alpino -notk -veryfast user_max=20000\
            server_kind=parse\
            server_port=42424\
            assume_input_is_tokenized=on\
            debug=1\
            -init_dict_p\
            batch_command=alpino_server\
            2> /alpino_server.log &

$ telnet localhost 42424
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hallo wereld .
top/top|top/hd|hallo/[0,1]|127.0.0.1
hallo/[0,1]|tag/nucl|wereld/[1,2]|127.0.0.1
/[2,3]|127.0.0.1app|.
<?xml version="1.0" encoding="UTF-8"?>
<alpino_ds version="1.6">
  <parser build="Alpino-x86_64-linux-glibc2.5-git233-sicstus" date="2021-02-04T16:52" cats="1" skips="0" />
  <node begin="0" cat="top" end="3" id="0" rel="top">
    <node begin="0" cat="du" end="3" id="1" rel="--">
      <node begin="0" end="1" frame="tag" his="normal" his_1="normal" id="2" lcat="advp" lemma="hallo" pos="tag" postag="TSW()" pt="tsw" rel="tag" root="hallo" sense="hallo" word="hallo"/>
      <node begin="1" cat="np" end="3" id="3" rel="nucl">
        <node begin="1" end="2" frame="noun(de,count,sg)" gen="de" genus="zijd" getal="ev" graad="basis" his="normal" his_1="normal" id="4" lcat="np" lemma="wereld" naamval="stan" ntype="soort" num="sg" pos="noun" postag="N(soort,ev,basis,zijd,stan)" pt="n" rel="hd" rnum="sg" root="wereld" sense="wereld" word="wereld"/>
"/>pecial="hoofd" word=".m" positie="vrij" postag="TW(hoofd,vrij)" pt="tw" rel="app" root=".ssion" id="5" infl="both" lcat="detp" lemma=".
      </node>
    </node>
  </node>
</sentence> sentid="127.0.0.1">hallo wereld .
</alpino_ds>
Connection closed by foreign host.

Keeping assume_input_is_tokenized=off gives a correctly formatted sentence element: <sentence sentid="127.0.0.1">hallo wereld .</sentence>.
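The garbled lines in the transcript are consistent with a carriage return ("\r") ending up inside the last token. This is an assumption, not something confirmed in the Alpino code: telnet terminates each line it sends with CR LF, and with assume_input_is_tokenized=on the "\r" may stay attached to the final "." instead of being discarded by the tokenizer. A terminal prints "\r" by moving the cursor back to column 0, so the rest of the line overwrites its beginning. The hypothetical helper render_like_terminal below is a minimal sketch of that display behaviour; fed an assumed raw sentence line with a "\r" before the closing tag, it reproduces the "</sentence> sentid=..." line seen above.

```python
def render_like_terminal(line: str) -> str:
    """Approximate how a terminal displays a line containing bare "\\r".

    Each carriage return moves the cursor back to column 0, so the text
    after it overwrites the start of what was already printed.
    """
    screen = ""
    for segment in line.split("\r"):
        # Overwrite the first len(segment) columns, keep the remainder.
        screen = segment + screen[len(segment):]
    return screen


# Hypothetical raw line: assumes the tokenized input kept telnet's "\r".
raw = '  <sentence sentid="127.0.0.1">hallo wereld .\r</sentence>'
print(render_like_terminal(raw))
# prints: </sentence> sentid="127.0.0.1">hallo wereld .
```

If this assumption holds, the bytes the server sends may well be well-formed XML, with only the terminal rendering garbled; inspecting the raw reply (for example with a hex dump) would settle it.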

I have to implement a workaround anyway to support older Alpino versions, so this is not a blocker for me. Still, is there a setting I'm missing that would prevent this? I couldn't figure out where in the Alpino code this goes wrong.
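One possible workaround, assuming the garbling comes from the carriage return that telnet appends to each line (not verified against the Alpino source), is to talk to the server from a small client that sends a bare "\n" instead of CR LF. The sketch below uses a hypothetical function parse_tokenized and assumes a protocol inferred only from the telnet session: the server reads one line, writes its reply (possibly preceded by debug lines when debug=1), and then closes the connection. The host and port defaults mirror the server_port=42424 setting above.

```python
import socket
import xml.etree.ElementTree as ET


def parse_tokenized(sentence: str, host: str = "localhost", port: int = 42424,
                    timeout: float = 60.0) -> ET.Element:
    """Send one pre-tokenized sentence to an Alpino server, return the XML root.

    Assumed protocol (inferred from the telnet session, not from Alpino docs):
    the server reads one line, writes its reply, then closes the connection.
    Debug output (debug=1) may precede the XML declaration.
    """
    # Drop any trailing CR/LF and terminate with a bare LF, so that no "\r"
    # can end up attached to the last token.
    line = sentence.rstrip("\r\n") + "\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    reply = b"".join(chunks)
    # Skip any debug lines printed before the XML document.
    start = reply.find(b"<?xml")
    if start < 0:
        raise ValueError("no XML document found in the server reply")
    # Raises ET.ParseError if the XML itself is malformed.
    return ET.fromstring(reply[start:])
```

With a server running as started above, parse_tokenized("hallo wereld .").findtext("sentence") should then return "hallo wereld ." if the assumptions hold; an ET.ParseError instead would indicate that the XML is malformed at the byte level, not just in the terminal.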
