Setup doesn't work #15

Closed
rosner opened this issue Oct 22, 2012 · 6 comments

Comments

@rosner

rosner commented Oct 22, 2012

Hey folks,

I just wanted to try out your tagger, but I can't get it to run. First off, I tried following your hacking.txt, but without success.

Also, the project structure is unusual for a Java project. So I have some questions about this project:

  • Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?
  • The same goes for the gnu trove jar that you provide. Any changes made to the library?
  • Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?
  • Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources or are they created by you? If so, why are they in the lib directory?
  • Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?

Since I want to use/try/evaluate it, I'm very interested in your project. I'm also experienced with Maven, Java, and Eclipse, so I could help you with restructuring this.

@brendano
Owner

Does "mvn package" work for you? Does it succeed in building the final jar file?

If you want to use/try/evaluate the system, what's stopping you? These file/directory structuring things would be nice to have, but are they actually stopping you from getting work done?

We don't know much about the right way to structure java projects, so help will be appreciated.

Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?

That was before my time, I don't know

The same goes for the gnu trove jar that you provide. Any changes made to the library?

I doubt it

Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?

It seemed easier. I hate the way maven nests the src folder really deep by default, but figured that for resources we might as well use maven's default.

Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources or are they created by you? If so, why are they in the lib directory?

Created by us. They should be in resources/, that would be better.

Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?

It was sent to us via email, I believe, but that was before my time. (See the licensing file.) I don't like using it for this reason: it's not directly available online to the public anywhere I know of, though many parts of it are included in various Berkeley NLP software on that page.

@rosner
Author

rosner commented Oct 22, 2012

You're right: the jar builds with "mvn package". I can use it now, since I trained the model, so everything is fine. The reason I started digging around in the project itself was that the tagger's help text says it uses an internal model. As I read it, that could be either a file or a resource that ships inside the jar. But the default model that is hardcoded in the RunTagger class is not included in the jar.

Thanks!
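
To illustrate the "file or resource" ambiguity described above, here is a minimal sketch of a lookup that tries the filesystem first and then falls back to the classpath. ModelLocator and its open method are hypothetical names used only for illustration; they are not taken from RunTagger, and the tagger's actual lookup order may differ.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ark-tweet-nlp: resolves a model name
// either as a regular file on disk or as a resource on the classpath
// (for example, one bundled inside the jar).
final class ModelLocator {

    private ModelLocator() {}

    // Opens the model as a stream, preferring a file on disk over a resource.
    static InputStream open(String nameOrPath) throws IOException {
        Path path = Paths.get(nameOrPath);
        if (Files.isRegularFile(path)) {
            return Files.newInputStream(path);
        }
        // Class.getResourceAsStream treats names with a leading "/" as absolute.
        String resource = nameOrPath.startsWith("/") ? nameOrPath : "/" + nameOrPath;
        InputStream in = ModelLocator.class.getResourceAsStream(resource);
        if (in != null) {
            return in;
        }
        // Fail early with a message that names both places that were checked.
        throw new FileNotFoundException(
            "Model '" + nameOrPath + "' is neither a file nor a classpath resource; "
            + "it may need to be downloaded separately.");
    }
}
```

A caller could pass either a path to a downloaded model or the name of a resource bundled in the jar. If neither exists, the exception says so up front instead of leaving the user to guess which of the two was expected.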

@brendano
Owner

Yeah, the model can be downloaded from the website. It's the only resource that's not checked in.


@brendano
Owner

Do you have any suggestions for how to make the process less painful? I added a note about the model in particular to docs/hacking.txt.

@rosner
Author

rosner commented Oct 23, 2012

First, I recommend using the standard Maven project structure, even though you don't like the deep nesting. Everyone who uses Maven is used to that structure, and it should also simplify the pom.xml.
Second, I believe the jargs dependency is not used at all in the project, so it could be removed: the RunTagger and Train classes parse their arguments manually. The GNU Trove dependency could also be fetched from a repository. As it turns out, only one class (OWLQN) uses Trove's THashSet.
Third, I would keep the build artifacts out of the repo itself. Instead, I would use Maven to upload the arktweetnlp artifact to the repo's Downloads section, so that it stays out of the repo but is still accessible to people who have problems building it.
Fourth, the shell scripts that run the tagger or the tokenizer could be removed or edited so that they don't confuse people when they can't be run successfully. Also, I don't understand the java.sh in the scripts directory. I assume you use it for setting up your dev environment (IDE and so on)?

What do you think? I could work on a PR if you need help. Let me know.

@brendano
Owner

Thanks for looking into this.

FYI, java.sh is as I described in hacking.txt: it just makes it easy to run the tagger on the command line when developing in an IDE, by using the version of the .class files that (e.g.) Eclipse is auto-compiling. This is very helpful for quick development -- this is how we can do things like fix #14 so fast :)

On Trove and OWLQN: so it's only a training-time dependency.

I don't understand the proposal to edit the runTagger and twokenize scripts -- are we talking about comments in them, or something?

@brendano brendano closed this as completed May 3, 2015