Kinda failures #16

Open
@everythingability

Hello, I'm parsing an academic book (with Flair NER) that contains "dates" such as:

  • more than 500 years ago
  • the fourteenth century
  • the Middle Ages
  • the sixth century
  • the year 105 CE
  • 1962
  • 1967
  • 1952
  • the eighteenth century
  • 1887
  • 1993
  • 1925
  • the century
  • today
  • 1952
  • 1979
  • around 5000 years old
  • the thirty-second century
  • a thousand or more years
  • the first century
  • the sixth century BCE
  • 1996
  • 1998
  • 1999
  • 2017
  • 1542
  • 6th century
  • the 2nd century
  • the 5th century BCE
  • the 7th century CE
  • the last decade
  • 1473
  • 1500
  • 1599
  • the 14th century
  • 1894
  • 1989
  • 1990
  • 9 April 1945
  • July
  • March 1944
  • May 1933
  • the first months
  • Today
  • fifteen hundred years ago
  • 1925
  • at least 3000 years
  • the 3rd century BCE in Qing Dynasty
  • 5000 years ago
  • seventy years
  • 2015
  • the 21st century
  • 1814
  • 2000 BCE
  • 1815
  • Christmas Eve, 1851
  • today
  • 1998
  • 2014
  • 2025
  • those days
  • 1983
  • 1991
  • 1996
  • twenty years
  • 1555
  • the twelfth century
  • today
  • 1998
  • 2017
  • twenty years from 1998
  • 1969
  • 1974
  • 1998
  • 2001
  • April 2002
  • every year
  • monthly
  • 1991
  • 2002
  • 2003
  • 2004
  • 2014
  • four years
  • 1269
  • a hundred years
  • seventh century
  • the 9th century
  • the Yuan dynasty
  • 2015
  • today

and ideally I want a "close enough for jazz" year (or nothing at all) from each parse, but many dates like these fail, even ones that seem easy to get, such as "2000 BCE".

I even tried...

theDate = theDate.replace("BCE", "BC")
theDate = theDate.replace("th century", "00")
theDate = theDate.replace("rd century", "00")
theDate = theDate.replace("st century", "00")
theDate = theDate.replace("nd century", "00")

...to help it out a bit, but it still fails on items like "the fourteenth century".
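
For illustration, here is a minimal sketch of the kind of fallback that would cover many of the strings listed above. It assumes a mid-century year is acceptable for century phrases, that a negative number stands for a BCE year, and that durations such as "5000 years ago" should return nothing. The names `approx_year` and `ORDINAL_WORDS` are hypothetical and are not part of this library or of Flair.

```python
import re

# Hypothetical lookup table: spelled-out ordinals up to the 21st century.
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17, "eighteenth": 18,
    "nineteenth": 19, "twentieth": 20, "twenty-first": 21,
}

# "14th century" or "fourteenth century"
CENTURY_RE = re.compile(r"\b(?:(\d{1,2})(?:st|nd|rd|th)|([a-z-]+))\s+century\b")
# "2000 BCE", "105 CE"
ERA_YEAR_RE = re.compile(r"\b(\d{1,4})\s*(bce|bc|ce|ad)\b")
# A bare four-digit year that is not a duration ("5000 years ago")
PLAIN_YEAR_RE = re.compile(r"\b(\d{4})\b(?!\s+years?)")
BCE_RE = re.compile(r"\b(?:bce|bc)\b")


def approx_year(text):
    """Return a rough year (negative for BCE), or None if nothing matches."""
    s = text.lower()

    m = CENTURY_RE.search(s)
    if m:
        n = int(m.group(1)) if m.group(1) else ORDINAL_WORDS.get(m.group(2))
        if n is None:
            return None  # e.g. "the century", "the thirty-second century"
        sign = -1 if BCE_RE.search(s) else 1
        # Use the midpoint of the century: 14th century -> 1350
        return sign * ((n - 1) * 100 + 50)

    m = ERA_YEAR_RE.search(s)
    if m:
        year = int(m.group(1))
        return -year if m.group(2) in ("bce", "bc") else year

    m = PLAIN_YEAR_RE.search(s)
    if m:
        return int(m.group(1))

    return None


for sample in ["the fourteenth century", "the 5th century BCE",
               "2000 BCE", "Christmas Eve, 1851", "the Middle Ages"]:
    print(sample, "->", approx_year(sample))
```

Named periods ("the Middle Ages", "the Yuan dynasty") and relative phrases ("today", "the last decade") deliberately fall through to `None` here; they would need a lookup table or a reference date, which this sketch does not attempt.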

Thanks for the work, though. I found a Python 2.7 module called dataparse that might be worth a peek? I couldn't install it, though.

Tom
