Expand the types of dates that can be extracted from texts with as_messydate() #52

Open
henriquesposito opened this issue May 11, 2022 · 7 comments
@henriquesposito
Collaborator

Although we already do a very good job of expanding and converting dates from text, we should also handle the more complex types of dates that {unstruwwel} extracts (e.g. negative and/or historical dates, approximate dates, date ranges, and sets of dates). The obvious choice would be to rely on the {unstruwwel} package, but it is not on CRAN...

https://github.com/stefanieschneider/unstruwwel

@henriquesposito henriquesposito self-assigned this May 11, 2022
@jhollway
Contributor

Yes, let’s try and keep this a pretty independent package, and of course we cannot rely on GitHub-only packages. Perhaps make each of these extensions an additional issue to better keep track of them? Or add them as checkboxes here:

  • add text extraction for historical dates
  • add text extraction for BC/BCE dates
  • add text extraction for approximate dates
  • add text extraction for date ranges
  • add text extraction for sets of dates
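A rough, language-agnostic sketch of what some of these extractions could look like, written in Python for illustration only. It is not how as_messydate() works; the function name, the patterns, and the output notation (`-` prefix for BC/BCE, `~` for approximate, `..` for ranges) are all assumptions of this sketch, and it only handles years.

```python
import re

# All names and patterns below are hypothetical, for illustration only.
BCE_YEAR = re.compile(r"\b(\d{1,4})\s*(?:BCE|BC)\b", re.IGNORECASE)
APPROX_YEAR = re.compile(
    r"\b(?:circa|ca\.|c\.|around|about|approximately)\s+(\d{3,4})\b",
    re.IGNORECASE,
)
YEAR_RANGE = re.compile(
    r"\b(?:from|between)\s+(\d{4})\s+(?:to|and|until)\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_complex_years(text):
    """Return annotated year strings found in `text`, grouped by pattern type."""
    found = []
    for m in BCE_YEAR.finditer(text):
        # Simplification: 44 BC becomes -0044. Astronomical year numbering
        # (1 BC = year 0000) would give -0043 instead.
        found.append("-" + m.group(1).zfill(4))
    for m in APPROX_YEAR.finditer(text):
        found.append(m.group(1) + "~")
    for m in YEAR_RANGE.finditer(text):
        found.append(m.group(1) + ".." + m.group(2))
    return found


print(extract_complex_years("Caesar died in 44 BC"))
print(extract_complex_years("built circa 1450"))
print(extract_complex_years("in force from 1990 to 1995"))
```

Sets of dates and full day-month-year forms are left out here; each would need its own pattern and probably its own issue.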

@henriquesposito
Collaborator Author

  • add date inferences for text extraction (e.g. "signed on the last day of February 2004")
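To make the inference case concrete, here is a minimal Python sketch that resolves a phrase like the one above to a calendar date. The function name and the pattern are hypothetical, it only covers the "last day of <month> <year>" wording in English, and it is not taken from the package.

```python
import calendar
import re

# Hypothetical lookup table; English month names only.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
LAST_DAY = re.compile(
    r"\blast day of (" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def infer_last_day(text):
    """Return 'YYYY-MM-DD' for 'last day of <month> <year>', else None."""
    m = LAST_DAY.search(text)
    if m is None:
        return None
    month = MONTHS[m.group(1).lower()]
    year = int(m.group(2))
    # monthrange returns (weekday of first day, number of days in month),
    # so leap years are handled by the standard library.
    day = calendar.monthrange(year, month)[1]
    return f"{year:04d}-{month:02d}-{day:02d}"


print(infer_last_day("signed on the last day of February 2004"))  # 2004-02-29
```

Since 2004 is a leap year, the example resolves to 29 February rather than 28.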

@henriquesposito
Collaborator Author

henriquesposito commented May 16, 2022

  • extract multiple dates from text (currently only extracts first one per row)
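The difference between first-match and all-match extraction can be sketched as follows, in Python and under stated assumptions: the pattern only covers "day Month year" in English, and the function name is hypothetical rather than part of the package.

```python
import re

# Hypothetical lookup table; English month names only.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
SPELLED_DATE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_all_dates(text):
    """Return every 'day Month year' date in `text` as 'YYYY-MM-DD', in order."""
    dates = []
    # finditer walks all non-overlapping matches instead of stopping at the first.
    for m in SPELLED_DATE.finditer(text):
        day = int(m.group(1))
        month = MONTHS[m.group(2).lower()]
        year = int(m.group(3))
        dates.append(f"{year:04d}-{month:02d}-{day:02d}")
    return dates


print(extract_all_dates("Signed 3 May 1999 and ratified 12 June 2001"))
```

Returning a list per row raises a design question for the package: whether multiple dates in one text should become a set of dates in a single value or separate rows.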

@jhollway
Contributor

I think this is a watershed feature for this package. Do we want to offer this as part of the package when we write up the paper, or is it a non-core addition?

@henriquesposito
Collaborator Author

henriquesposito commented May 19, 2022

I tend to agree, but I am not sure this is a core addition, or that it would make it into the paper. Since we already extract spelled-out dates from text very well, we can decide at a later stage whether to add these features... In the future, maybe we should contact the developer of {unstruwwel} and see if there are any plans to get the package on CRAN before we start extending these functions.

@henriquesposito
Collaborator Author

henriquesposito commented May 20, 2022

  • extract dates in Roman numerals
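As a starting point, converting a Roman-numeral year to an integer could look like the Python sketch below. The function name is hypothetical, the validity check accepts only standard subtractive notation up to 3999, and locating numerals inside running text (without matching the pronoun "I", for instance) is left out.

```python
import re

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Standard-form Roman numerals from 1 to 3999.
VALID = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def roman_to_int(numeral):
    """Return the integer value of a standard Roman numeral, or None if invalid."""
    numeral = numeral.upper()
    if not numeral or VALID.fullmatch(numeral) is None:
        return None
    total = 0
    prev = 0
    # Read right to left: a symbol smaller than its right-hand neighbour
    # is subtracted (the I in IX), otherwise it is added.
    for ch in reversed(numeral):
        value = VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
        prev = value
    return total


print(roman_to_int("MCMXCIX"))  # 1999
print(roman_to_int("MMIV"))     # 2004
```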

@henriquesposito
Collaborator Author

  • test for false positives
  • consider other languages
