Error training for instances with only numbers. #46
Hi @romualdoalan, could you provide a script or a notebook to reproduce this error, please?
Hello @fabiocapsouza, I cloned a clean copy of the repo on my machine to track down these errors, and things changed. When I train with all my data, I have problems with this type of instance:
The logged error that I receive is:
I commented out this assert and voilà, it works, but then it breaks on the first example mentioned in this issue:
and I get this error:
I couldn't put together a notebook for you, but I attached the files ("dataset-traina.json" and "classes-total.txt"), which I run with the command below in the same repository.
When I remove the second example, the training works. 🥵 Any idea what it could be? I really tried to change things, but with the default repo I don't know why this is happening. I appreciate your time. Thank you!
Hi @romualdoalan, this entity
probably fails because this implementation of NER is not able to handle entities that are shorter than 1 "word". The reason is that the tokenization process first tokenizes the text into "words" by splitting at whitespace and punctuation, and then applies WordPiece tokenization to each word. In this example, we have only 1 word
For the second problem with the longer number, it could be an edge case due to the entity comprising the whole
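To illustrate the word-level constraint described above, here is a minimal sketch that checks whether an entity's character span starts and ends on "word" boundaries. The regex, the function names, and the sample sentence are assumptions made for illustration only. They approximate a whitespace-and-punctuation splitter and are not the repository's actual tokenizer.

```python
import re

# Hypothetical approximation of a whitespace/punctuation pre-tokenizer:
# a run of word characters is one "word", each punctuation mark is its own "word".
WORD_RE = re.compile(r"\w+|[^\w\s]")


def word_spans(text):
    """Return the (start, end) character offsets of each word in text."""
    return [m.span() for m in WORD_RE.finditer(text)]


def entity_is_word_aligned(text, start, end):
    """True if the entity [start, end) begins and ends on word boundaries."""
    spans = word_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


# Made-up sample sentence, not taken from the attached dataset.
text = "Processo 12345 de 2020."
print(entity_is_word_aligned(text, 9, 14))  # entity covers the whole word "12345"
print(entity_is_word_aligned(text, 9, 11))  # entity "12" covers only part of a word
```

Running a check like this over a dataset before training could help flag annotations that end inside a word, which is the kind of entity this comment suggests the implementation cannot represent.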
I had to carry on despite these difficulties. I will keep looking for solutions and attach them here. Something was broken, but I managed to continue without these samples. Thank you very much for your time!
I found an error in the code that is related to an output length issue in the get_example_output function in the postprocessing.py file. The specific error is an AssertionError that occurs when the code tries to verify whether the length of the output (complete_output) matches the length of the document tokens for an example in which I only have numbers.
The assertion fails only for instances like that.
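For debugging, a length check of this kind could be made more informative than a bare assert. The sketch below is a hypothetical helper, not code from postprocessing.py. The name `check_output_length` and the parameter `doc_tokens` are assumptions, and only `complete_output` comes from the description above.

```python
def check_output_length(complete_output, doc_tokens):
    """Raise a descriptive error when the two sequences differ in length.

    Hypothetical helper for debugging; it only compares lengths and does
    not attempt to repair the mismatch.
    """
    if len(complete_output) != len(doc_tokens):
        raise ValueError(
            f"output has {len(complete_output)} items but the document has "
            f"{len(doc_tokens)} tokens; first tokens: {list(doc_tokens[:5])}"
        )


# Matching lengths pass silently.
check_output_length(["O", "B-NUM"], ["Processo", "12345"])
```

Printing the first few tokens alongside the two lengths makes it easier to see which example triggers the mismatch.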
Maybe you can give me some insight. Thank you.
The error:
@fabiocapsouza @rodrigonogueira4