
Looks great; maybe a Python version would be more competitive and attractive #89

Closed
SeekPoint opened this issue Oct 25, 2018 · 6 comments

@SeekPoint

No description provided.

@lfoppiano
Collaborator

Dear @lovejasmine,
I personally don't think the language makes the tool more or less competitive. However, could you tell us what your needs are? Why would you need a Python version?

@SeekPoint
Author

I believe Python has better support for many ML/DL packages, such as TensorFlow, Torch, scikit-learn, etc.
I think neural network approaches to NER and entity linking will become more prominent.
I would like to be a contributor to this project if it were Python-based.

@lfoppiano
Collaborator

All good reasons. However, can't deep learning models be developed in Python and used in Java?

@SeekPoint
Author

SeekPoint commented Oct 25, 2018

Sure, that works with TensorFlow: you build and train the model in Python and decode with Java. I'm not sure whether Torch supports this.

@kermitt2
Owner

kermitt2 commented Oct 25, 2018

Hello!

@lovejasmine, you have seen the DeLFT repository in Python, so the fact that this particular project, entity-fishing, is written in Java is not an accident. Java is better suited to manipulating large datasets (see Hadoop or Spark), which is the purpose of entity-fishing, whose knowledge base contains billions of objects; Python PDF parsers are also 20-50 times slower, etc.

The ML part might rely on some Java libraries for now because it is quite basic, but you can see in DeLFT that I've built far superior DL NER models in Python (state of the art, actually), under some constraints related to size, embeddings, etc., with the idea of calling these models, saved in TF format, from Java (although running them as a service in a Docker container would also be a solution, I think). Embeddings are managed in exactly the same way in DeLFT and in entity-fishing for this purpose (because the inputs have to be similar), and the DL models are kept very small with that same objective in mind.
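Why the embedding handling has to be identical on both sides can be illustrated with a minimal sketch. It assumes a simple word-to-vector lookup table, zero vectors for unknown words, and right-padding to a fixed length; the function name `build_input`, the toy embedding table and the padding convention are hypothetical and are not DeLFT's or entity-fishing's actual code. Whatever the real rules are, the Java side calling the saved model would have to reproduce exactly the same mapping, otherwise the model receives inputs it was never trained on.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_input(tokens: Sequence[str],
                embeddings: Dict[str, Vector],
                dim: int,
                max_len: int) -> List[Vector]:
    """Map tokens to a fixed-size (max_len x dim) input matrix.

    Hypothetical convention: unknown tokens get a zero vector, sequences are
    truncated to max_len and right-padded with zero vectors. Any client
    (Python or Java) feeding the same model must apply exactly these rules.
    """
    zero = [0.0] * dim
    rows: List[Vector] = []
    for tok in tokens[:max_len]:
        vec = embeddings.get(tok, zero)  # zero vector for out-of-vocabulary tokens
        if len(vec) != dim:
            raise ValueError(f"embedding for {tok!r} has size {len(vec)}, expected {dim}")
        rows.append(list(vec))
    # Right-pad with zero vectors so every input has the same shape.
    rows.extend([list(zero) for _ in range(max_len - len(rows))])
    return rows


# Usage with a toy two-dimensional embedding table:
emb = {"Paris": [0.1, 0.2], "is": [0.0, 0.5]}
matrix = build_input(["Paris", "is", "nice"], emb, dim=2, max_len=4)
print(matrix)  # [[0.1, 0.2], [0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
```

Keeping these rules in one small, explicit function makes it easier to port them verbatim to the Java side and to test both implementations against the same examples.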

If you are interested in contributing to entity-fishing, that's really great, and the best way would actually be to contribute to DeLFT on the DL-related parts. New NER models have been created in DeLFT, and I've started working on an implementation of https://github.com/openai/deeptype with DeLFT, just for the final biLSTM-CRF labelling, after the type system has been generated.
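For readers unfamiliar with the biLSTM-CRF labelling step: the biLSTM produces a score for each (token, label) pair, and the CRF layer picks the best label sequence by Viterbi decoding over those scores plus label-transition scores. The framework-free sketch below shows that textbook decoding step only; the label set and the score values are made up for illustration, and this is not DeLFT's implementation.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]: score of label j at token t (e.g. from a biLSTM).
    transitions[i][j]: score of moving from label i to label j.
    Scores are assumed to be in log space, so they are added.
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best path score ending in each label at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for reaching label j at token t.
            i_best = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example: three labels, three tokens; O -> I-PER is heavily penalised.
labels = ["O", "B-PER", "I-PER"]
emissions = [[0.0, 2.0, 1.5], [0.5, 0.2, 1.0], [2.0, 0.1, 0.3]]
transitions = [[0.0, 0.0, -10.0],   # from O
               [0.0, 0.0, 1.0],     # from B-PER
               [0.0, 0.0, 0.5]]     # from I-PER
print([labels[i] for i in viterbi_decode(emissions, transitions)])  # ['B-PER', 'I-PER', 'O']
```

The transition scores are what lets the CRF enforce label-sequence constraints (such as discouraging `I-PER` right after `O`) that independent per-token predictions cannot.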

Finally, the plan is also to create a Python wrapper/client so that entity-fishing can be used transparently from Python, similarly to NLP libraries like spaCy (as you know, ML modules are never natively written in Python).
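A Python client of this kind could be a thin wrapper over the entity-fishing service. The sketch below assumes entity-fishing is running as a local REST service; the base URL, the `/disambiguate` path, the JSON body with `text` and `language` fields, and the class name `EntityFishingClient` are all assumptions for illustration, so the actual request format should be checked against the entity-fishing documentation. It uses only the standard library, and request building is kept separate from sending so it can be inspected without a running server.

```python
import json
import urllib.request
from typing import Any, Dict


class EntityFishingClient:
    """Hypothetical minimal client for an entity-fishing REST service.

    The endpoint path and payload layout are assumptions, not the documented API.
    """

    def __init__(self, base_url: str = "http://localhost:8090/service") -> None:
        self.base_url = base_url.rstrip("/")

    def build_request(self, text: str, lang: str = "en") -> urllib.request.Request:
        # Assumed payload: the text to annotate plus its language.
        payload = {"text": text, "language": {"lang": lang}}
        return urllib.request.Request(
            f"{self.base_url}/disambiguate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def disambiguate(self, text: str, lang: str = "en",
                     timeout: float = 30.0) -> Dict[str, Any]:
        # Sends the request and decodes the JSON response (needs a running server).
        with urllib.request.urlopen(self.build_request(text, lang), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))


# Building (not sending) a request:
req = EntityFishingClient().build_request("Paris is the capital of France.")
print(req.get_method(), req.full_url)  # POST http://localhost:8090/service/disambiguate
```

With a server running, `EntityFishingClient().disambiguate("...")` would return the parsed JSON response, which a spaCy-style wrapper could then map onto its own document and entity objects.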

@SeekPoint
Author

Great, @kermitt2!


3 participants