Hello, I'm parsing an academic book (with Flair NER) that contains "dates" such as:
- more than 500 years ago
- the fourteenth century
- the Middle Ages
- the sixth century
- the year 105 CE
- 1962
- 1967
- 1952
- the eighteenth century
- 1887
- 1993
- 1925
- the century
- today
- 1952
- 1979
- around 5000 years old
- the thirty-second century
- a thousand or more years
- the first century
- the sixth century BCE
- 1996
- 1998
- 1999
- 2017
- 1542
- 6th century
- the 2nd century
- the 5th century BCE
- the 7th century CE
- the last decade
- 1473
- 1500
- 1599
- the 14th century
- 1894
- 1989
- 1990
- 9 April 1945
- July
- March 1944
- May 1933
- the first months
- Today
- fifteen hundred years ago
- 1925
- at least 3000 years
- the 3rd century BCE in Qing Dynasty
- 5000 years ago
- seventy years
- 2015
- the 21st century
- 1814
- 2000 BCE
- 1815
- Christmas Eve, 1851
- today
- 1998
- 2014
- 2025
- those days
- 1983
- 1991
- 1996
- twenty years
- 1555
- the twelfth century
- today
- 1998
- 2017
- twenty years from 1998
- 1969
- 1974
- 1998
- 2001
- April 2002
- every year
- monthly
- 1991
- 2002
- 2003
- 2004
- 2014
- four years
- 1269
- a hundred years
- seventh century
- the 9th century
- the Yuan dynasty
- 2015
- today
Ideally I want a "close enough for jazz" year (or nothing at all) from each parse, but many dates like these fail, even ones that seem pretty easy to get, like "2000 BCE".
I even tried:
theDate = theDate.replace("BCE", "BC")
theDate = theDate.replace("th century", "00")
theDate = theDate.replace("rd century", "00")
theDate = theDate.replace("st century", "00")
theDate = theDate.replace("nd century", "00")
to help it out a bit, but it still fails on items like "the fourteenth century", where the ordinal is spelled out.
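For illustration, a rough year can also be pulled out of strings like these with a small regex pre-pass that does not depend on any date-parsing library. This is only a sketch under stated assumptions: `rough_year` and `ORDINALS` are hypothetical names, the ordinal table stops at "twenty-first", a century is mapped to its midpoint (an arbitrary choice), and durations such as "500 years ago" are deliberately left unresolved.

```python
import re

# Hypothetical lookup table: spelled-out ordinals -> century number.
# Only covers "first" through "twenty-first"; extend as needed.
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17, "eighteenth": 18,
    "nineteenth": 19, "twentieth": 20, "twenty-first": 21,
}


def rough_year(text):
    """Return an approximate year as an int (negative for BCE), or None."""
    s = text.lower()
    # Treat "BCE" or "BC" as a negative year; anything else is positive.
    sign = -1 if re.search(r"\b(bce|bc)\b", s) else 1

    # Century phrases with digits: "the 14th century", "6th century".
    m = re.search(r"\b(\d{1,2})(?:st|nd|rd|th)\s+century\b", s)
    if m:
        century = int(m.group(1))
    else:
        # Century phrases with words: "the fourteenth century".
        m = re.search(r"\b([a-z-]+)\s+century\b", s)
        century = ORDINALS.get(m.group(1)) if m else None
    if century:
        # Assumption: use the midpoint, e.g. 14th century -> 1350.
        return sign * ((century - 1) * 100 + 50)

    # Bare 3- or 4-digit years, skipping durations like "500 years ago".
    m = re.search(r"\b(\d{3,4})\b(?!\s+years?\b)", s)
    if m:
        return sign * int(m.group(1))
    return None


for sample in ["2000 BCE", "the fourteenth century", "9 April 1945", "today"]:
    print(sample, "->", rough_year(sample))
```

Returning `None` for anything unrecognised ("today", "the Middle Ages", "seventy years") keeps the "or not" case explicit, so those strings can still be handed to a real parser or skipped.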
Thanks for the work though. I found a Python 2.7 module called dataparse that might be worth a peek? I couldn't install it, though.
Tom