version 1.42 versus 1.40 #1181
Replies: 10 comments
-
Would you give us a language and sample text to run?
On Wed, Jan 18, 2023, 6:44 AM robertobartolini wrote:
Hi,
I'm using Stanza NLP in a Python project. When I upgraded from version 1.4.0 to
1.4.2, I realized that some tokens consist of multiple words, which
causes problems when I transform the output into XML. Here is an example
("Donno Esposito"):
225 , , PUNCT FF _ 221 punct _ start_char=1209|end_char=1210 -
226 Donno Esposito Donno Esposito PROPN SP _ 1 _ start_char=1221|end_char=1235 B-PER
227 Giuseppe Giuseppe PROPN SP _ 226 flat:name _ start_char=1236|end_char=1244 E-PER
228 , , PUNCT FF _ 229 punct _ start_char=1244|end_char=1245 -
Is this behavior correct?
If so, can it be disabled from Python?
Best,
Roberto.
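One way to locate the affected rows before converting to XML is to scan the tabular output for a word form that contains whitespace. The following is only a sketch under assumptions: the function name `forms_with_spaces` is hypothetical, the rows are assumed to be tab-separated with the ID in the first column and the form in the second, and the sample is an abbreviated reconstruction (first five columns only) of the example above, since the original separators were lost.

```python
def forms_with_spaces(conllu_text):
    """Return (token_id, form) pairs whose form column contains whitespace."""
    hits = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue
        token_id, form = cols[0], cols[1]
        if any(ch.isspace() for ch in form):
            hits.append((token_id, form))
    return hits


# Hypothetical, abbreviated rows modelled on the example in the post
# (ID, form, lemma, UPOS, XPOS); the tabs are an assumption.
sample = "\n".join([
    "225\t,\t,\tPUNCT\tFF",
    "226\tDonno Esposito\tDonno Esposito\tPROPN\tSP",
    "227\tGiuseppe\tGiuseppe\tPROPN\tSP",
    "228\t,\t,\tPUNCT\tFF",
])

print(forms_with_spaces(sample))
```

This only detects the rows; whether to split them or escape the space in the XML is a separate decision.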
-
Hi,
-
I was not able to trigger this with the current dev branch of Stanza. I did the following:
and there were no results. If I search for
So, hopefully the models on the dev branch are a bit better, unless I was not replicating the issue correctly.
-
When I used Stanza 1.4.0, the behavior was different: "Donno Esposito" was split into 2 tokens.
-
How are you creating the "nlp" object?
On Fri, Jan 20, 2023, 4:41 AM robertobartolini wrote:
When I used Stanza 1.4.0, the behavior was different: "Donno Esposito"
was split into 2 tokens.
-
nlp = stanza.Pipeline('it', download_method=DownloadMethod.REUSE_RESOURCES)
I also tested without the "download_method" argument.
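To check whether a given pipeline produces such tokens, the parsed document can be scanned directly. This is a minimal sketch, not a confirmed reproduction of the issue: the helper name `tokens_with_spaces` is hypothetical, and the stand-in document below is built from `SimpleNamespace` objects so the snippet runs without Stanza or its models. It relies only on the `sentences`, `tokens`, and `text` attributes that Stanza documents expose, so the result of `nlp(text)` could be passed in its place.

```python
from types import SimpleNamespace


def tokens_with_spaces(doc):
    """Collect the text of every token that contains whitespace."""
    return [
        token.text
        for sentence in doc.sentences
        for token in sentence.tokens
        if any(ch.isspace() for ch in token.text)
    ]


# Stand-in for a parsed document (an assumption for illustration only).
# With Stanza installed, `doc = nlp(text)` would be used instead.
fake_doc = SimpleNamespace(sentences=[
    SimpleNamespace(tokens=[
        SimpleNamespace(text=","),
        SimpleNamespace(text="Donno Esposito"),
        SimpleNamespace(text="Giuseppe"),
    ])
])

print(tokens_with_spaces(fake_doc))
```

Running the same check under both installed versions would show whether the difference comes from the models or from the input text.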
-
What if you install the dev branch instead?
On Fri, Jan 20, 2023, 8:01 AM robertobartolini wrote:
nlp = stanza.Pipeline('it', download_method=DownloadMethod.REUSE_RESOURCES)
I also tested without the "download_method" argument.
-
I'll try as soon as I can and let you know.
-
ok,