Kinda failures #16

Open
everythingability opened this issue Sep 29, 2021 · 3 comments

@everythingability

Hello, I'm parsing an academic book (with Flair NER) that contains "dates" such as:

  • more than 500 years ago
  • the fourteenth century
  • the Middle Ages
  • the sixth century
  • the year 105 CE
  • 1962
  • 1967
  • 1952
  • the eighteenth century
  • 1887
  • 1993
  • 1925
  • the century
  • today
  • 1952
  • 1979
  • around 5000 years old
  • the thirty-second century
  • a thousand or more years
  • the first century
  • the sixth century BCE
  • 1996
  • 1998
  • 1999
  • 2017
  • 1542
  • 6th century
  • the 2nd century
  • the 5th century BCE
  • the 7th century CE
  • the last decade
  • 1473
  • 1500
  • 1599
  • the 14th century
  • 1894
  • 1989
  • 1990
  • 9 April 1945
  • July
  • March 1944
  • May 1933
  • the first months
  • Today
  • fifteen hundred years ago
  • 1925
  • at least 3000 years
  • the 3rd century BCE in Qing Dynasty
  • 5000 years ago
  • seventy years
  • 2015
  • the 21st century
  • 1814
  • 2000 BCE
  • 1815
  • Christmas Eve, 1851
  • today
  • 1998
  • 2014
  • 2025
  • those days
  • 1983
  • 1991
  • 1996
  • twenty years
  • 1555
  • the twelfth century
  • today
  • 1998
  • 2017
  • twenty years from 1998
  • 1969
  • 1974
  • 1998
  • 2001
  • April 2002
  • every year
  • monthly
  • 1991
  • 2002
  • 2003
  • 2004
  • 2014
  • four years
  • 1269
  • a hundred years
  • seventh century
  • the 9th century
  • the Yuan dynasty
  • 2015
  • today

Ideally I want a "close enough for jazz" year (or nothing) from each parse, but many dates like these fail, even ones that seem easy to get, like "2000 BCE".
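For the easy cases in the list above (bare years, full dates, and years tagged BCE/CE), a plain regular expression may already get a "close enough" year. The sketch below uses a hypothetical helper, `approx_year`, which is not part of flexidate's API. It assumes negative numbers stand for BCE years and deliberately skips numbers followed by "year(s)", so durations such as "around 5000 years old" are not mistaken for calendar years.

```python
import re
from typing import Optional

# Hypothetical helper (not part of flexidate): a 3- or 4-digit year,
# optionally followed by an era marker, and NOT followed by "year(s)".
YEAR_RE = re.compile(
    r"\b(\d{3,4})(?:\s*(BCE|BC|CE|AD))?\b(?!\s*years?\b)",
    re.IGNORECASE,
)


def approx_year(text: str) -> Optional[int]:
    """Return an approximate year (negative for BCE/BC), or None if none is found."""
    match = YEAR_RE.search(text)
    if match is None:
        return None
    year = int(match.group(1))
    era = (match.group(2) or "").upper()
    # BCE/BC years are returned as negative numbers.
    return -year if era in ("BCE", "BC") else year


for phrase in ["2000 BCE", "9 April 1945", "the year 105 CE", "around 5000 years old"]:
    print(phrase, "->", approx_year(phrase))
```

Anything this returns `None` for (centuries, relative phrases, named eras like "the Middle Ages") would still need separate handling.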

I even tried the following to help it out a bit:

```python
# Normalize era and century suffixes before handing the string to flexidate
theDate = theDate.replace("BCE", "BC")
for suffix in ("th century", "rd century", "st century", "nd century"):
    theDate = theDate.replace(suffix, "00")
```

but it still fails on items like "the fourteenth century".
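The `replace()` calls above only cover numeric ordinals, which is why spelled-out forms like "the fourteenth century" slip through. Note also that the 14th century runs from 1301 to 1400, so the "00" substitution lands on its last year. One possible fix, sketched here with hypothetical `ordinal_value` and `century_span` helpers (not flexidate functions), is to map ordinal words to numbers first and return the century's span rather than a single year. The word table is an assumption that only covers first through thirtieth plus compounds such as "thirty-second", and BCE years are represented as negative numbers with no year zero.

```python
import re
from typing import Optional, Tuple

# Hypothetical ordinal tables; extend as needed.
UNITS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
         "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9}
ORDINALS = dict(UNITS, tenth=10, eleventh=11, twelfth=12, thirteenth=13,
                fourteenth=14, fifteenth=15, sixteenth=16, seventeenth=17,
                eighteenth=18, nineteenth=19, twentieth=20, thirtieth=30)
TENS = {"twenty": 20, "thirty": 30}

CENTURY_RE = re.compile(
    r"\b([a-z]+(?:-[a-z]+)?|\d+(?:st|nd|rd|th))\s+century\b(?:\s+(BCE|BC|CE|AD)\b)?",
    re.IGNORECASE,
)


def ordinal_value(word: str) -> Optional[int]:
    """Turn '14th', 'fourteenth' or 'thirty-second' into an int, else None."""
    word = word.lower()
    numeric = re.fullmatch(r"(\d+)(?:st|nd|rd|th)", word)
    if numeric:
        return int(numeric.group(1))
    if word in ORDINALS:
        return ORDINALS[word]
    tens, _, unit = word.partition("-")
    if tens in TENS and unit in UNITS:
        return TENS[tens] + UNITS[unit]
    return None


def century_span(text: str) -> Optional[Tuple[int, int]]:
    """Return (first_year, last_year) of the named century; BCE years are negative."""
    match = CENTURY_RE.search(text)
    if match is None:
        return None
    n = ordinal_value(match.group(1))
    if n is None:
        return None  # e.g. "the century" with no ordinal
    era = (match.group(2) or "").upper()
    if era in ("BCE", "BC"):
        # The 6th century BCE runs from 600 BCE down to 501 BCE.
        return (-100 * n, -(100 * (n - 1) + 1))
    # The 14th century CE runs from 1301 to 1400.
    return (100 * (n - 1) + 1, 100 * n)
```

Returning a span keeps the choice of a single "close enough" year (start, end, or midpoint) up to the caller.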

Thanks for the work, though. I found a Python 2.7 module called dataparse that might be worth a peek, but I couldn't install it.

Tom

@rufuspollock
Member

@everythingability I am surprised that 2000 BCE does not work; could you confirm that?

As for the other items, I don't think flexidate supports many of those yet. They are not hard to add, so if you are interested in a PR, please go for it.

@everythingability
Author

@rufuspollock Yeah, not sure how that happened, but it did (or may have) :-) I can't reproduce it in the Terminal; all seems OK.

I'm not sure how to contribute, or whether I can; I'm no great coder and even worse with git :-)

Attached is a whole heap of "challenging dates". Many of them, like "2 days after", can't be parsed without a contextual date, and others, like "two thousand years", definitely shouldn't become <2000 AD>.
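Both problems can be caught before parsing: relative phrases only resolve against a reference date, and "N years" without "ago" names a length of time rather than a year. A rough sketch follows, restricted to "years" and using a deliberately tiny, hypothetical number-word table. `years_ago` and `is_duration` are illustrative names, and the reference year (for example, the book's publication year) is an assumption the caller must supply. The subtraction is naive and ignores the missing year zero when crossing into BCE.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny word-to-number table; a real version would
# need a proper number-word parser.
NUMBER_WORDS = {"twenty": 20, "seventy": 70, "a hundred": 100,
                "a thousand": 1000, "fifteen hundred": 1500, "two thousand": 2000}

# Longest phrases first, so a longer phrase is tried before any shorter one.
_words = sorted(NUMBER_WORDS, key=len, reverse=True)
AMOUNT = r"\b(\d+|" + "|".join(re.escape(w) for w in _words) + r")"
AGO_RE = re.compile(AMOUNT + r"\s+years?\s+ago\b", re.IGNORECASE)
DURATION_RE = re.compile(AMOUNT + r"\s+years?\b", re.IGNORECASE)


def _amount(token: str) -> int:
    return int(token) if token.isdigit() else NUMBER_WORDS[token.lower()]


def years_ago(text: str, reference_year: int) -> Optional[int]:
    """Resolve 'N years ago' against a caller-supplied reference year."""
    match = AGO_RE.search(text)
    # Naive subtraction: ignores the missing year zero between 1 BCE and 1 CE.
    return reference_year - _amount(match.group(1)) if match else None


def is_duration(text: str) -> bool:
    """True for spans like 'two thousand years' that name a length of time, not a year."""
    return DURATION_RE.search(text) is not None and AGO_RE.search(text) is None
```

Phrases flagged by `is_duration` could simply be skipped rather than passed to the date parser.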

But anyway, I've attached it hoping it may help you or someone.

Just a thought: years ago I used a chatbot tool called Alicebot that read AIML files, which were essentially just long lists of search-and-replace items aimed at reducing the input to something more manageable, or parseable. I think that approach might work well here (but I don't know).
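The AIML-style idea amounts to an ordered list of rewrite rules applied before parsing. A minimal sketch, with a hypothetical `normalize` function and an illustrative rule list (not anything flexidate ships), might look like this. Each rule rewrites the output of the previous one, so order matters: "the" has to be stripped before the leading "year" rule can match "the year 105 CE".

```python
import re

# Hypothetical, illustrative rewrite rules in the spirit of AIML
# search-and-replace lists; each rule is applied to the previous rule's output.
RULES = [
    (r"\bChristmas Eve\b,?", "24 December"),
    (r"\bBCE\b", "BC"),
    (r"\bCE\b", "AD"),
    (r"^\s*(?:the|around|about|circa)\s+", ""),  # strip leading filler words
    (r"^year\s+", ""),  # only matches once a leading "the" is gone
    (r"\s+", " "),  # collapse runs of whitespace
]
COMPILED = [(re.compile(p, re.IGNORECASE), r) for p, r in RULES]


def normalize(text: str) -> str:
    """Apply every rewrite rule in order and return the cleaned-up phrase."""
    for pattern, replacement in COMPILED:
        text = pattern.sub(replacement, text)
    return text.strip()


for phrase in ["the year 105 CE", "Christmas Eve, 1851", "2000 BCE"]:
    print(phrase, "->", normalize(phrase))
```

New cases can be handled by appending rules, though overlapping rules can interfere with each other, so it helps to rerun a test list like the one in this issue after each change.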

Keep up the good work.
