-
Notifications
You must be signed in to change notification settings - Fork 803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csv import date: Add dateformat "Locale" to pick current locale #2011
csv import date: Add dateformat "Locale" to pick current locale #2011
Conversation
I think this approach will actually get you closest to what you want in a universal way, provided the locale is set properly. While icu in Australian locale doesn't do what you want it to do, you could set LC_TIME=en_US for gnucash en_US does parse it properly. I don't know if we have to something extra in the code to have icu pick up this environment variable or whether it understands its own set of variables (I know for example postgres allows for icu specific parameters, but I don't know whether this is icu or postgres specific). The added advantage would be that each user could override LC_TIME as they see fit. So far only few requests for a date format outside of what we offer have been made. So a workaround that requires setting an environment file may be sufficient so far. As to your remark that icu is not properly parsing "Sep" in the Australian locale, it looks like this was an intentional change. Apparently it's not considered as set in stone but it will need someone (or a few people) to offer enough "evidence" to warrant the change back from "Sept" to "Sep". Likewise for "June/Jun" and "July/Jul". |
Ok. I've just tried with all ICU english locales; all "dmy" outputs and expects "Sept" and all "mdy" outputs and expects "Sep" |
I have reworked your #2010 experiment a little to test for locales that can handle your Australian dates. On my system three remained after testing a date in each month:
So what happens if you try en_AE naq or naq_NA? You may need to test both the short and the medium format. |
47e10e5
to
9afa05d
Compare
Looks good however in my setup the en_AE doesn't exist. I don't know why. How about combining this ICU approach with #2015 ? |
e7671b9
to
ead1fe8
Compare
I think this is ready. Maybe "Locale" should be the first choice to match the Currency format (instead of the current y-m-d, or this branch's boost's UK parser. The icu formatter and calendar objects are generated only once. Here's small issue with ICU locale: the csv save settings will store "Locale" but won't specify which locale. |
ead1fe8
to
aef7dfd
Compare
You could argue that the exact locale is not set in the import preview itself and so we shouldn't save it. It's probably not the most user friendly view, but a pragmatic one in this case. Saving the actual current locale even though it's not explicitly set can equally cause unexpected behaviour. What is a problem IMO is that your added dropdown options alter the meanings of currently saved presets. That should be avoided. To solve this, you could add the new options below the ones that where already there. That aside, I'm not too much in favour of adding the boost options. If you're not living in the US or the UK these options tempt at a simple solution for other countries that we can't provide. Boost only provides these two methods and won't for say the Netherlands or Vietnam. I can see the usefulness of the ISO date option. I don't know what others think of this. |
How about hiding boost's parser behind the existing options? Without boost my "30 Sep 2024" remains unparsable! Unfortunately dd MMM yyyy is becoming a defacto standard. |
Maybe this is what you mean: if the locale is en_XX try the ICU parser and if that fails try the boost::gregorian one. If both fail raise an error. |
I feel this is hacky. This current branch accepts @gjanssens feedback and will augment the existing dmy mdy ymd to use boost parser. See tests. |
I don't think it's any more hacky than overloading the dmy/mdy options to accept month names for English only. The non-hacky fix is to get the Unicode Consortium to fix the CLDR (good luck with that) or ICU to get their parser to recognize the 3-letter versions of those months (slightly more likely but much patience required). Another more general hack would be to introduce a correction table of some sort that might offer an alternative month abbreviation when there's something goofy in the CLDR. But this is workable enough for a first release. There's no point in expending further effort until users tell us that it falls short. |
I wonder, how much overlap is there between the regexes we have for d-m-y/m-d-y and the date_from_uk_string/date_from_us_string boost functions ? If the boost functions are a superset, we could just replace them instead of our regex based options. On the other hand, if there are date formats that our regex properly parses and boost doesn't, I would propose to first try the regex and then the boost function. |
They're complementary. The boost functions don't accept 2 digit years but parse wordy months. The current implementation uses heuristics for 2 digit years but fail wordy months. |
Ok. For me your implementation is good enough then. |
Thank you! Your "good enough" means a "good start" because I haven't completely tested enough combinations of invalid dates. There are some slight differences in the behaviours that will need tweaking before merging in. |
6b4d9dd
to
4342b24
Compare
Ok now I'm happy that the tests are complete. The exception for invalid dates eg 31-feb isn't "std::invalid_argument"; therefore the tests more modified to capture all exceptions. |
0dcb877
to
b28408d
Compare
80e260f
to
db7cf7c
Compare
1. Add dateformat "Locale" with ICU; uses current locale for date parsing. ICU's locale date parser may parse "3 May 2023" or "2024年9月13日" (LC_TIME=zh_TW.utf8) and maybe others. 2. Augment d-m-y m-d-y and y-m-d with boost UK/US/ISO parsers. This allows CSV import of dates with months as words as "30 Sep 2023" or "May 4, 1978" or "2023-Dec-25". Note boost parser cannot recognise 2-digit years, therefore "30 Sep 24" is invalid.
db7cf7c
to
ab641b3
Compare
another approach to #2010 -- add "Locale" which uses current icu parser with current locale. still has same fault as #2010.