Documentation to implement NER #10

Open
koushikram3420 opened this issue Jan 31, 2021 · 2 comments

Comments

koushikram3420 commented Jan 31, 2021

Hey,
I tried using IndicBERT NER for news article clustering with transformers. During tokenization, some of the tokens get split up. Is there any way to avoid this?
Also, when I ran the same example given in your documentation, I got different results.
[Screenshot: brisbane (2)]
[Screenshot: chanakya (2)]
Kindly help me understand why the tokens are not being recognized properly. When I give custom inputs in the same format the tokenizer expects, the tokens are not recognized and are encoded as 1, even with add_special_tokens.
[Screenshot: ss4]
It would be helpful if you could share some NER implementations.
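
Editorial sketch on the token-splitting question: subword tokenizers split words by design, so a common workaround is to keep the split and map the per-token predictions back to whole words. The sketch below assumes you already have a `word_ids` list (Hugging Face fast tokenizers expose one through `BatchEncoding.word_ids()`) and one predicted label per token. The function name `first_subword_labels`, the sample split, and the sample labels are all hypothetical; they are not real IndicBERT output.

```python
from typing import List, Optional


def first_subword_labels(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Collapse per-token labels to per-word labels.

    Each word takes the label predicted for its first sub-token.
    Entries with word_id None (special tokens) are skipped.
    """
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have the same length")

    word_labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            continue  # special token such as [CLS] or [SEP]
        if word_id != previous:
            word_labels.append(label)  # first sub-token of a new word
        previous = word_id
    return word_labels


# Hypothetical input: three words, two of them split into several sub-tokens.
words = ["Chanakya", "visited", "Brisbane"]
word_ids = [None, 0, 0, 0, 1, 2, 2, None]
token_labels = ["O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]

word_labels = first_subword_labels(word_ids, token_labels)
for word, label in zip(words, word_labels):
    print(word, label)
```

Taking the first sub-token's label is only one convention; a majority vote over a word's sub-tokens is another reasonable choice.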

Kritz23 commented Jul 12, 2021

Can you please share your notebook?
Thanks in advance.

yashsinglatimes commented Feb 3, 2022

Has anybody been able to create an NER example for an Indian language using IndicBERT? That would be very helpful. @koushikram3420, which model did you use? I think that if you had used IndicBERT, then according to your process its label size should be 768, whereas in your case the label size is 9.
[Screenshot from 2022-02-03 22-31-55]
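
Editorial sketch on the 768 versus 9 question: the screenshot is not available here, so this is only one possible reading. In a token-classification setup, the encoder emits one hidden vector per token (768 wide in the comment above), and a linear head projects that vector down to one score per label (9 in the comment above). Both numbers can therefore belong to the same model. In Hugging Face transformers the head width is set through `num_labels`; the pure-Python code below only illustrates the shapes, with random weights and a hypothetical helper named `linear_head`.

```python
import random

HIDDEN_SIZE = 768  # encoder output width, as quoted in the thread
NUM_LABELS = 9     # label count, as quoted in the thread


def linear_head(hidden, weights, bias):
    """Project one hidden vector to one logit per label (logits = W h + b)."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One token's hidden state, standing in for the encoder output.
hidden = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]

# Randomly initialised head: NUM_LABELS rows, each HIDDEN_SIZE wide.
weights = [
    [rng.uniform(-0.01, 0.01) for _ in range(HIDDEN_SIZE)]
    for _ in range(NUM_LABELS)
]
bias = [0.0] * NUM_LABELS

logits = linear_head(hidden, weights, bias)
predicted = max(range(NUM_LABELS), key=logits.__getitem__)

print(len(hidden), "->", len(logits))
```

With an untrained head like this one, the predicted label index is meaningless; only the shapes carry over to a real model.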
