This repository has been archived by the owner on May 22, 2019. It is now read-only.

Request: Training #2

Open
Salakar opened this issue Jan 10, 2016 · 3 comments

Comments

@Salakar

Salakar commented Jan 10, 2016

Hello!

Have you done any work on training, adding entities and such?

I can help; I just need the base structure of it in place, as my C++ is a little poor, hah.

I can do the stemmers and such. I was also thinking about how the training instances would look. It should be possible to do entity tagging within the string, something like:

const instance = 'This library by {bhelx}=PERSON is cool';

Or some other form of syntactic sugar that finds the tagged spans in the string and adds them as entities automatically, rather than having to provide the stemmed entity word positions manually, one by one. Having options for both would be good too.
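To make the inline-tagging idea concrete, here is a minimal sketch of such a parser. Everything in it is an assumption for illustration: the function name `parseTaggedInstance` is hypothetical, the `{text}=LABEL` syntax is just the one proposed above, and tokens are split on whitespace only, which is not necessarily how the underlying tokenizer would split them. It strips the tags and reports each entity as a token start and length.

```javascript
// Hypothetical helper (not part of any existing API): turns inline
// `{text}=LABEL` tags into a clean token list plus token-based entity spans.
// Assumption: tokens are whitespace-separated; a real trainer's tokenizer may differ.
function parseTaggedInstance(tagged) {
  const tokens = [];
  const entities = [];
  const tagPattern = /\{([^{}]+)\}=([A-Za-z_]+)/g;
  let cursor = 0;
  let match;

  // Append the non-empty whitespace-separated pieces of `text` to `tokens`.
  const pushTokens = (text) => {
    for (const tok of text.split(/\s+/)) {
      if (tok.length > 0) tokens.push(tok);
    }
  };

  while ((match = tagPattern.exec(tagged)) !== null) {
    // Plain text before the tag.
    pushTokens(tagged.slice(cursor, match.index));
    // The tagged text itself; remember where it starts in the token list.
    const start = tokens.length;
    pushTokens(match[1]);
    const length = tokens.length - start;
    if (length > 0) {
      entities.push({ start, length, label: match[2] });
    }
    cursor = match.index + match[0].length;
  }
  // Plain text after the last tag.
  pushTokens(tagged.slice(cursor));

  return { text: tokens.join(' '), tokens, entities };
}

const parsed = parseTaggedInstance('This library by {bhelx}=PERSON is cool');
console.log(parsed.tokens);   // tokens: This, library, by, bhelx, is, cool
console.log(parsed.entities); // one entity: start 3, length 1, label PERSON
```

Because the output is expressed as token positions, the same structure could also be filled in by hand, which would cover the "options for both" case.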

Thoughts?

@bhelx
Owner

bhelx commented Jan 10, 2016

Hey @Salakar

I haven't investigated custom training yet, but it was on my TODO list. I assume we'd want to follow the underlying C library and just pass in the token locations the way the C API does. Here is a Python example:

https://github.com/mit-nlp/MITIE/blob/master/examples/python/train_ner.py

Copying that Python API would probably be the most straightforward approach. We probably wouldn't want to alter the source text, because the trainer needs to know the token locations, but I'd need to know more about how the parser would work.
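As a rough illustration of the token-location approach, a training instance on the JavaScript side might look something like the sketch below. None of these names exist in this library; `NerTrainingInstance`, `addEntity`, and `overlapsAnyEntity` are hypothetical, and the range and overlap checks are assumptions about what a trainer would want rather than a description of the C API. The source tokens are left untouched and entities are recorded purely as token start and length.

```javascript
// Hypothetical shape for a training instance (illustrative only; this is
// plain JavaScript and does not call into any native binding).
class NerTrainingInstance {
  constructor(tokens) {
    this.tokens = tokens.slice(); // copy so the caller's array is not shared
    this.entities = [];
  }

  // True if the token range [start, start + length) intersects an existing entity.
  overlapsAnyEntity(start, length) {
    return this.entities.some(
      (e) => start < e.start + e.length && e.start < start + length
    );
  }

  // Marks tokens [start, start + length) as an entity with the given label.
  addEntity(start, length, label) {
    const validRange =
      Number.isInteger(start) && Number.isInteger(length) &&
      start >= 0 && length >= 1 && start + length <= this.tokens.length;
    if (!validRange) {
      throw new RangeError('entity range is outside the token list');
    }
    if (this.overlapsAnyEntity(start, length)) {
      throw new Error('entity overlaps an existing entity');
    }
    this.entities.push({ start, length, label });
    return this; // allow chaining
  }

  // For inspection: the words covered by each entity, with their labels.
  entityTexts() {
    return this.entities.map((e) => ({
      label: e.label,
      text: this.tokens.slice(e.start, e.start + e.length).join(' '),
    }));
  }
}

const sample = new NerTrainingInstance(['This', 'library', 'by', 'bhelx', 'is', 'cool']);
sample.addEntity(3, 1, 'PERSON');
console.log(sample.entityTexts());
```

A tagged-string parser could sit on top of this and simply call `addEntity` for each span it finds, so the trainer would only ever see token positions.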

@RahulPol

Is this done?

@bhelx
Owner

bhelx commented May 18, 2017

@RahulPol I'm not working on it. I'm not sure about @Salakar.
