a problem with tag() return type #23

Open
ghost opened this issue Jun 27, 2019 · 11 comments

Comments

@ghost

ghost commented Jun 27, 2019

Hi, you might recognize this code from your README. Sorry to bother you if my question is a silly one, and thank you for the work you have provided.

Here is the code I use:

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger=r'C:\TreeTagger')

tmp = tt.tag('What is the airspeed of an unladen swallow?')
print(tmp)

With or without the tmp variable and the print call, I get the same return value: "[['Usage: tag-english file {file}']]".

I should get something different; in your example you get a proper answer. I ran some tests and added some print statements to the tag function, and they all look fine. I don't understand the problem and would be grateful if you could help me.

The tt.get_installed_lang() method works just fine, and so does TreeTagger on its own when I call it with a .txt file.

Many thanks again for your work.

PS: I'm sorry for the many mistakes you will find in this text; English is not my mother tongue.

@ghost
Author

ghost commented Jun 28, 2019

And, in case it helps, here is the output from cmd (I'm working on Windows) when running TreeTagger on TEST.txt, which contains "What is the airspeed of an unladen swallow?".

C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.
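
For comparison with the list of lists shown in the README, the console output above can be converted by hand. This is only a minimal sketch, not code from treetagger.py: the name parse_cli_output and the set of status lines to skip are assumptions taken from the transcript above.

```python
# Status lines that appear in the transcript above and carry no tagging data.
STATUS_LINES = {"reading parameters ...", "tagging ...", "finished."}


def parse_cli_output(raw):
    """Turn tag-english console output into [token, tag, lemma] lists."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line in STATUS_LINES:
            continue
        fields = line.split()
        if len(fields) == 3:  # keep only token / tag / lemma rows
            rows.append(fields)
    return rows


sample = """reading parameters ...
tagging ...
What WP what
is VBZ be
? SENT ?
finished."""

print(parse_cli_output(sample))
# [['What', 'WP', 'what'], ['is', 'VBZ', 'be'], ['?', 'SENT', '?']]
```

The result has the same shape as the README example, which suggests the tagger itself is fine and the difference lies in how it is invoked from Python.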

@miotto
Owner

miotto commented Jul 1, 2019

Hi Izgeg,

please have a look at the following code fragment.

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='/path/to/your/TreeTagger/', language='german')
tt.tag('Das Haus hat einen großen hübschen Garten.')

You can specify a second parameter for the language when instantiating the TreeTagger class. Any value returned by get_installed_lang() can be used there. The code fragment above, for example, selects German for a German sentence.
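
Choosing the language from the installed ones could look roughly like this. It is a sketch under assumptions: pick_language is a hypothetical helper, and it assumes get_installed_lang() returns an iterable of language names as strings.

```python
def pick_language(installed, wanted, fallback="english"):
    """Return wanted if it is installed, else fallback if installed, else raise."""
    available = [lang.lower() for lang in installed]
    for candidate in (wanted.lower(), fallback.lower()):
        if candidate in available:
            return candidate
    raise ValueError(f"Neither {wanted!r} nor {fallback!r} is installed: {available}")


# Hypothetical usage; needs the treetagger package and a local TreeTagger install:
# from treetagger import TreeTagger
# probe = TreeTagger(path_to_treetagger='/path/to/your/TreeTagger/')
# language = pick_language(probe.get_installed_lang(), 'german')
# tt = TreeTagger(path_to_treetagger='/path/to/your/TreeTagger/', language=language)

print(pick_language(["english", "german"], "German"))  # german
```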

Does this answer the question?

Cheers

@ghost
Author

ghost commented Jul 2, 2019

Hi again,

I'm sorry if I wasn't clear about what I meant.

The problem is pretty simple: I do not get the expected return value.

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger=r'C:\TreeTagger')
print(tt.tag("What is the airspeed of an unladen swallow?"))

For this code, which you give in the README, you get the following return value:

[['What', 'WP', 'what'],
['is', 'VBZ', 'be'],
['the', 'DT', 'the'],
['airspeed', 'NN', 'airspeed'],
['of', 'IN', 'of'],
['an', 'DT', 'an'],
['unladen', 'JJ', '<unknown>'],
['swallow', 'NN', 'swallow'],
['?', 'SENT', '?']]

For the same code tested on my computer, I get:

[['Usage: tag-english file {file}']]

However, TreeTagger works properly on my computer from cmd: I get what I should get, i.e. the proper tokenization of the sentence.

C:\TreeTagger>tag-english TEST.txt
reading parameters ...
tagging ...
What WP what
is VBZ be
the DT the
airspeed NN airspeed
of IN of
an DT an
unladen JJ unladen
swallow NN swallow
? SENT ?
finished.
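
One way to tell the two outcomes shown above apart in a script is to check whether the result is the tagger's usage text instead of tag rows. This is a minimal sketch; is_usage_message is a hypothetical helper, not part of the treetagger module.

```python
def is_usage_message(result):
    """Return True if a tag() result is the tagger's usage text rather than tags."""
    # A real result is a list of [token, tag, lemma] rows; the failure case
    # reported in this thread is a single row holding one 'Usage: ...' string.
    return (
        len(result) == 1
        and len(result[0]) == 1
        and str(result[0][0]).startswith("Usage:")
    )


failed = [['Usage: tag-english file {file}']]
expected = [['What', 'WP', 'what'], ['is', 'VBZ', 'be']]

print(is_usage_message(failed))    # True
print(is_usage_message(expected))  # False
```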

So I believe there might be a problem in the treetagger.py file. If not, I would appreciate your help with using your files properly.

Many thanks again, and sorry if my English is unclear.

@miotto
Owner

miotto commented Jul 2, 2019

Please try the treetagger.py file from the branch windows_test (704f7e9). I changed the call to the TreeTagger program.
I don't have Windows, so I can't test it there.

@ghost
Author

ghost commented Jul 4, 2019

NLTK was unable to find the TreeTagger bin!
Traceback (most recent call last):
  File ".\test2.py", line 4, in <module>
    print(str(tt.tag('What is the airspeed of an unladen swallow?')))
  File "C:\Path\To\\treetagger.py", line 160, in tag
    p = Popen([self._treetagger_bin],
AttributeError: 'TreeTagger' object has no attribute '_treetagger_bin'

Hi, this is the error I get when I use the branch. I tried some modifications, hoping to give you something that works under Windows, but couldn't make it work.
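
The traceback is consistent with a constructor that only prints a warning when it cannot locate the executable and never assigns self._treetagger_bin, so the first later use fails with AttributeError. The following is a hypothetical sketch of that pattern next to a stricter alternative. The class names, the find_binary helper, and the file name tag-english.bat are assumptions for illustration, not the code in the windows_test branch.

```python
import os


def find_binary(directory, names):
    """Return the first existing file among names inside directory, or None."""
    for name in names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


class LenientTagger:
    """Prints a warning and leaves the attribute unset (the failure pattern)."""

    def __init__(self, directory, names):
        binary = find_binary(directory, names)
        if binary is None:
            print("Unable to find the TreeTagger bin!")
        else:
            self._treetagger_bin = binary


class StrictTagger:
    """Fails at construction time with a message that names the search path."""

    def __init__(self, directory, names):
        binary = find_binary(directory, names)
        if binary is None:
            raise FileNotFoundError(
                f"None of {list(names)} found in {directory!r}"
            )
        self._treetagger_bin = binary


lenient = LenientTagger("no-such-directory", ["tag-english.bat"])
print(hasattr(lenient, "_treetagger_bin"))  # False: later use raises AttributeError
```

Raising at construction time points at the real cause (the search path) rather than at the line that happens to use the missing attribute first.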

@miotto
Owner

miotto commented Jul 20, 2019

I can't help you because I don't have a Windows computer. You could try it under Linux. Instructions for installing Linux under Windows, e.g. in VirtualBox, can be found on the Internet.

@simog-dev

Same problem for me, also a Windows user. Any solution?

@miotto
Owner

miotto commented Jun 22, 2021

Have you been able to import the TreeTagger program into Python as follows?
from treetagger import TreeTagger

Have you been able to create a new instance?
tt = TreeTagger(path_to_treetagger='/path/to/treet-tagger')

If so, what is the output of the following command? Does the path point to the TreeTagger executable?
tt.get_treetagger_path()

@simog-dev

simog-dev commented Jun 23, 2021

Everything seems to work fine.

If I print the result of
tt.get_treetagger_path()
I get
Environment variable 'TREETAGGER_HOME' is C:/TreeTagger/ Path to TreeTagger is C:/TreeTagger/ None

but when I print the result of
print(tt.tag('What is the airspeed of an unladen swallow?'))
I get
[['Usage: tag-english file {file}']]

The full code is the following:

from treetagger import TreeTagger
tt = TreeTagger(path_to_treetagger='C:/TreeTagger/', language='english')
#print(tt.get_installed_lang())
print(tt.get_treetagger_path())
print(tt.tag('What is the airspeed of an unladen swallow?'))

From the command line, everything works!
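
The trailing None in the get_treetagger_path() output above is most likely just what Python prints when a function writes its report to the console itself and returns nothing; on its own it does not indicate a wrong path. A minimal illustration with a hypothetical stand-in function (report_path is not part of the treetagger module):

```python
def report_path(path):
    """Stand-in for a method that prints its findings and returns None."""
    print(f"Path to TreeTagger is {path}")


result = report_path("C:/TreeTagger/")  # prints the report line
print(result)                           # prints: None
```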

@simog-dev

Looking at the "tag" function in treetagger.py, it seems that the problem arises at the line
(stdout, stderr) = p.communicate(str(_input).encode('utf-8'))
There the "stdout" variable gets the value [['Usage: tag-english file {file}']], as if the string passed were not a valid argument.
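
Since the command-line script accepts a file name but answers with its usage text when the sentence is piped in, one possible workaround is to write the text to a temporary file and pass that file as an argument. This is a sketch under assumptions, not the fix in the repository: tag_via_file, its command argument, and the injectable run parameter are hypothetical, the tagged rows are assumed to be tab-separated UTF-8, and whether the script can be launched this way depends on the local installation.

```python
import os
import subprocess
import tempfile


def tag_via_file(text, command, run=subprocess.run):
    """Write text to a temp file, run `command <file>`, return tab-split rows."""
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", encoding="utf-8", delete=False
    )
    try:
        handle.write(text)
        handle.close()  # close first so the child process can open it on Windows
        completed = run(
            [command, handle.name],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    finally:
        handle.close()
        os.unlink(handle.name)  # always remove the temporary file
    output = completed.stdout.decode("utf-8", errors="replace")
    # Keep only rows that contain tabs: token, tag, lemma (assumed layout).
    return [line.split("\t") for line in output.splitlines() if "\t" in line]


# Hypothetical usage; the script path is an assumption about the local install:
# rows = tag_via_file('What is the airspeed of an unladen swallow?',
#                     r'C:\TreeTagger\bin\tag-english.bat')
```

The run parameter exists only so the file handling can be exercised without TreeTagger installed; in normal use the default subprocess.run is enough.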

@miotto
Owner

miotto commented Jun 25, 2021

Apparently the TreeTagger programme must now be executed differently. I have changed the code; please test it.

You can also run the Python doctest. To do this, set the environment variable in the Windows command line:
SET TREETAGGER_HOME=C:\TreeTagger
and then execute the following:
python treetagger.py -v
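
Before running the doctest, it may help to confirm from Python that the variable is actually visible to the interpreter. A minimal sketch; check_treetagger_home is a hypothetical helper, and only the variable name TREETAGGER_HOME comes from this thread.

```python
import os


def check_treetagger_home(environ=os.environ):
    """Return (ok, message) describing the TREETAGGER_HOME setting."""
    home = environ.get("TREETAGGER_HOME")
    if not home:
        return False, "TREETAGGER_HOME is not set"
    if not os.path.isdir(home):
        return False, f"TREETAGGER_HOME points to a missing directory: {home}"
    return True, f"TREETAGGER_HOME is {home}"


ok, message = check_treetagger_home()
print(message)
```

Note that SET only affects the current command prompt session, so the check should be run from the same window.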
