Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_date implementation seems out of date #462

Open
caramdache opened this issue Sep 3, 2024 · 3 comments
Open

is_date implementation seems out of date #462

caramdache opened this issue Sep 3, 2024 · 3 comments

Comments

@caramdache
Copy link

I've been having issues with certain cell values being recognised as dates because of the cell format was set as a custom date format, but the cell content wasn't really a date and was not supposed to be consummed as a date.

For example, the string '1-23, stored as the sring 133, was being considered as the date 23/09/2024, with created a number of issues for downstream processing, whereas it should have been considered as the range [1, 23], expressed as a string.

While investigating this issue, I've come across this nice writeup, which indicates that this is what used to happen in "compatibility" mode, during the transition from .xlx to .xlsx, but that in modern, newly created .xlsx files, the behaviour should be different. And that the cell should only be considered a date cell, if the cell is of type date ("d"), irrespectively of whether the format is a date format.

http://www.ericwhite.com/blog/dates-in-strict-spreadsheetml-files/

By looking at the current implementation of the is_date method, it looks like it is implementing the "compatibility" mode and not the "strict", modern mode. It would be great if the code could be updated to support the "strict" mode.

@caramdache
Copy link
Author

Here is my suggested change:

    def is_date?
      return true if datatype == RubyXL::DataType::DATE # From Office 2010

      return false unless # Only fully numeric values can be dates
        case raw_value
        when Numeric then true
        when String  then raw_value =~ NUMBER_REGEXP
        else false
        end

      self.number_format&.is_date_format?
    end

@weshatheleopard
Copy link
Owner

Can you please provide an example file which doesn't work the way it should? It looks like the issue here is more complex than you think, and I need an actual file to test it on.

@caramdache
Copy link
Author

Here are 2 simple test files. I discovered on this occasion that, to set strict mode, you have to go to File/Options/Save. This is different from Word, which displays a dialog box on save. So it may be that most .xlsx files are actually saved in compatibility mode.

test-strict.xlsx
test_compatibility.xlsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants