Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept for a loosely defined date field #2691

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chdorner
Copy link
Member

@chdorner chdorner commented Feb 22, 2023

This could be a potential solution for #743.

Essentially this adds two new classes:

  • LooseDate value object with year (required), month (optional), and day (optional)
  • LooseDateField for storing LooseDate value objects in the database

Things to note:

  • LooseDate is validating that it's a valid date by replacing optionally missing values for month and day with 1, and then validating the date
  • The database representation is a string in the YYYY-MM-DD format, missing values for month and day are represented as 00, so the different permutations are "2023-02-22", "2023-02-00", "2023-00-00"

Things that I still need to try:

  • Rendering of the value, in its current state I only changed the book editing form to work with the new field
  • Queries that reference this field
    • Need to make ORDER BY published_date work with this new field. Could potentially work out of the box with string ordering?
    • Need to double-check that we don't do any fancy less-than, more-than, or in-between type queries. This could get tricky

Things that are outstanding for proper implementation of this:

  • Replacing the existing book model fields with this, including a fairly scary database migration.
    • Unclear if we could change the type of the existing column to a string without needing to create a new column
    • This could potentially be a multi-step migration, add a new temporary column as a string, copy data over, delete the old column, rename the new column
  • How does this affect federation between two BookWyrm instances running different versions?
    • i.e. an older instance only able to store precise dates importing a book from a newer instance with a less precise date

What do you all think about this? Anything I missed that I should check?

The screenshot isn't really proof 😅 but this actually does work 🎉
Screenshot 2023-02-22 at 20 20 27

Alternative solution as proposed by @marcusyoung

Instead of using a string field, we could keep using a date (it's actually a datetime) field, and fill in the missing parts with 1. In addition to this, we have a second field which defines the precision.
For a date like Feb 2023 we would store:

  • published_date = "2023-02-01" - really "2023-02-01 00:00:00Z" because it's a datetime
  • published_date_precision = "month"

I like this approach because it allows us to continue using the datefield in queries in WHERE clauses or ordering.
It would even allow us to decide if a less precise date should be ordered after a more precise one, or the other way around.

For example:

  1. order ascending, less precise dates first:
  • first "Feb 2023", then "13 Feb 2023"
  • first "2023", then "March 2023", then "13 March 2023"
  • we can achieve this by filling in the missing day or month with 1
  1. order ascending, more precise dates firs:
  • first "13 Feb 2023", then "Feb 2023"
  • first "13 March 2023", then "March 2023", then "2023"
  • we can achieve this by filling in the missing day with the last day of the month, and the month with the last month of the year

@marcusyoung
Copy link

marcusyoung commented Feb 22, 2023

You can change type and contents at the same time with USING:

ALTER TABLE table
ALTER COLUMN column TYPE data_type USING expression;

I'm a bit wary of ditching the date type field when in many cases it is a date being entered in the field.

I would look to allow the input of the published date without day or month. When written to the database replace anything missing with 01 and set a flag in another column. Then when displaying the date extract the required elements of the date (e.g. just year) based on this flag. That way you lose none of the benefits of storing as a date (like extracting the parts you want to render with existing SQL functions; use of sort, comparison etc). Just my initial thoughts. May be flawed.

@chdorner
Copy link
Member Author

Yeah I need to understand more how we query those date fields, the main benefits for keeping them as date fields mostly revolve around querying, either with WHERE clauses or for ordering.
Date validity could be handled in the Django field implementation fairly nicely.

@chdorner
Copy link
Member Author

The ideal requirements on ordering could also help arguing one way or another.

For example, should the ascending order be:

  1. First: Feb 2023
  2. Second: 13 Feb 2023

Or the other way around, ordering less precise dates at the end:

  1. First: 13 Feb 2023
  2. Second: Feb 2023

If we think the second case is more intuitive then your proposed solution @marcusyoung would make more sense, with a small twist:
For dates missing the day part we could fill in the last day of the month, or when month is missing the last month and last day of the month. Then we'd have that sort order figured out.

@marcusyoung
Copy link

You'll also need to consider what happens on import too, with the date coming from OpenLibrary etc.

@marcusyoung
Copy link

May also be a problem with instances being on different versions if this field is inconsistent between databases?

@chdorner
Copy link
Member Author

chdorner commented Feb 22, 2023

You'll also need to consider what happens on import too, with the date coming from OpenLibrary etc.

OpenLibrary is even more flexible, either it's just a year (like mentioned here), or if a book has a precise publishing date (this specific audiobook edition) it transmits it as "Mar 01, 2021".

May also be a problem with instances being on different versions if this field is inconsistent between databases?

Yes! That is a point that we'll have to consider. I'll add it above to not forget.

@chdorner
Copy link
Member Author

I added a proof of concept with @marcusyoung's proposed solution which stores an extra field for each date, i.e. first_published_date and first_published_date_precision.

Besides the extra fields in the model, we can get away with just overriding the form's __init__ function and handling everything in there. Depending on which arguments are passed to EditionForm.__init__ we know if it's a form render or a form submit.

On a form render we:

  • check the precision field and override the corresponding date field value with a string (precision month will results in f"{year}-{month}-0", precision year in f"{year}-0-0").
  • The date selector widget can already handle these types of strings since those are the ones that are also sent on form submit

On a form submit we:

  • detect based on the individual date part fields (i.e. published_date_year, published_date_month, published_date_day) what the precision is, set the precision value in the data, default the missing field values to "1"
  • translating those individual fields into a date is handled automatically by the form

The (ugly) implementation is in this commit: 5c9e318

Generally, I like this approach better. We can still use the date fields for ordering, WHERE clauses, etc.
We could even use this same approach for federating these dates out: the date field with its month and day parts defaulting to "1", plus an extra "_precision" field. This would be a simple way for backwards compatibility, older BookWyrm versions would just import the date field and ignore the precision field, and newer BookWyrm versions would also import the precision field and would be able to support less precise dates.

Curious to know what you think of these two approaches @mouse-reeve? Feel free to compare the two commits in this PR.

@dato
Copy link
Contributor

dato commented Oct 16, 2023

We could perhaps avoid the additional AP fields because partial serialization in formats like YYYY and YYYY-MM are backwards-compatible with the current use of dateutil.parser (which parses 2022 and 2022-10 just fine).

(To be clear, I'm only talking about network-transmitted data here.)

Say such code is released in v0.7. Then, upon an incoming date like '2023':


P.S.: While I'm commenting, I do like the idea of a custom PubDateField but—given the effort and care it takes—it would be great if the existing column could be upgraded, rather than introducing an extra one (if I'm reading the code correctly). Anybody mind if I take this code and take a stab at that approach?¹

(¹) As an instance maintainer, I would love if the migration either defaulted, or allowed me to pick, "set precision = year if day = month = 1".

@dato
Copy link
Contributor

dato commented Oct 23, 2023

While I'm commenting, I do like the idea of a custom PubDateField but—given the effort and care it takes—it would be great if the existing column could be upgraded, rather than introducing an extra one (if I'm reading the code correctly). Anybody mind if I take this code and take a stab at that approach?

I submitted #3059 for review, which reuses every possible field and form, in order to keep backwards compatibility.

Turns out Django forms already support serializing partial (though possibly invalid) dates: they do it replacing 0 for the missing parts. I leveraged this in the forms, which makes for few modifications needed there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants