Different numbers not splitting on space (REGEX bug) #19

Open
pedronogs opened this issue Jan 25, 2021 · 3 comments

Comments

@pedronogs

pedronogs commented Jan 25, 2021

Issue

I'm using the UDT (Universal Data Tool) application to annotate documents for NER. I found that when two numbers are next to each other, separated only by a space, they get annotated together as a single token, which makes it impossible to label them as two different entity types. I traced this to react-nlp-annotate. It may be intended behavior, but maybe there is some kind of fix.

I tried to fix this behavior as shown below.

Example

Before

split_to_fix

After

split_fixed

I opened a pull request (the first one of my life) to show how I fixed this, but I'm not very familiar with this codebase, so I'll gladly accept any suggestions!

@seveibar
Collaborator

seveibar commented Feb 4, 2021

You'll need to use the custom regex feature.

@seveibar
Collaborator

seveibar commented Feb 4, 2021

Screenshot_20210203-212838.png

See the seperatorRegex prop?

@seveibar
Collaborator

seveibar commented Feb 4, 2021

Oh, you're using it through UDT. In that case, look at the UDT format's wordSplitRegex: https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/text_entity_recognition.md
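This thread doesn't show the default regex that react-nlp-annotate uses. It also doesn't say whether `seperatorRegex` / `wordSplitRegex` is treated as a split pattern or a token pattern. So the following is only a hypothetical sketch of why the choice of regex matters. The helper `splitWords` and both example patterns are illustrative assumptions, not the library's code. One pattern refuses to split whitespace that sits between two digits, which reproduces the "numbers stuck together" symptom. The other splits on any whitespace run.

```typescript
// Hypothetical helper (not part of react-nlp-annotate or UDT): splits a
// document into word tokens using a separator regex.
function splitWords(text: string, separator: RegExp): string[] {
  // String.prototype.split accepts a RegExp; drop empty strings left by
  // leading/trailing separators.
  return text.split(separator).filter((token) => token.length > 0);
}

const doc = "Invoice 123 456";

// Illustrative separator (an assumption, not the library default) that does
// NOT split whitespace sitting between two digits, so "123 456" stays one token.
const keepsDigitRunsTogether = new RegExp("(?<!\\d)\\s+|\\s+(?!\\d)");

// Splitting on any run of whitespace keeps adjacent numbers as separate tokens.
const splitOnWhitespace = /\s+/;

console.log(splitWords(doc, keepsDigitRunsTogether)); // [ 'Invoice', '123 456' ]
console.log(splitWords(doc, splitOnWhitespace)); // [ 'Invoice', '123', '456' ]
```

With a whitespace-based pattern, "123" and "456" become separate words, so each can carry its own entity label. Check the UDT format docs linked above for the exact type and semantics that wordSplitRegex expects before copying a pattern over.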
