csv import date: Add dateformat "Locale" to pick current locale #2011

christopherlam · 2024-09-01T16:29:18Z

another approach to #2010 -- add "Locale" which uses current icu parser with current locale. still has same fault as #2010.

gjanssens · 2024-09-11T11:11:31Z

I think this approach will actually get you closest to what you want in a universal way, provided the locale is set properly. While icu in the Australian locale doesn't do what you want it to do, you could set LC_TIME=en_US for gnucash; en_US does parse it properly. I don't know if we have to do something extra in the code to have icu pick up this environment variable or whether it understands its own set of variables (I know for example postgres allows for icu specific parameters, but I don't know whether this is icu or postgres specific). The added advantage would be that each user could override LC_TIME as they see fit. So far only a few requests for a date format outside of what we offer have been made. So a workaround that requires setting an environment file may be sufficient for now.
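For reference, the usual POSIX precedence for the time locale is LC_ALL, then LC_TIME, then LANG. The sketch below only resolves that precedence from the environment; `effective_time_locale` is a hypothetical helper, and whether ICU consults these variables on its own is exactly the open question above.

```cpp
#include <cstdlib>
#include <string>

// Returns the locale name a POSIX program would use for the LC_TIME
// category: LC_ALL overrides LC_TIME, which overrides LANG. Empty
// variables count as unset; "C" is the fallback.
// Hypothetical helper for illustration, not GnuCash or ICU code.
std::string effective_time_locale()
{
    for (const char* name : {"LC_ALL", "LC_TIME", "LANG"})
    {
        const char* value = std::getenv(name);
        if (value && *value)
            return value;
    }
    return "C";
}
```

With LC_TIME=en_US.UTF-8 and LANG=en_AU.UTF-8 set and LC_ALL unset, this returns en_US.UTF-8, which is the per-user override described above.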

As to your remark that icu is not properly parsing "Sep" in the Australian locale, it looks like this was an intentional change. Apparently it's not considered as set in stone but it will need someone (or a few people) to offer enough "evidence" to warrant the change back from "Sept" to "Sep". Likewise for "June/Jun" and "July/Jul".

christopherlam · 2024-09-11T15:31:49Z

Ok. I've just tried with all ICU English locales; every "dmy" locale outputs and expects "Sept" and every "mdy" locale outputs and expects "Sep" ☹️

gjanssens · 2024-09-11T18:39:26Z

I have reworked your #2010 experiment a little to test for locales that can handle your Australian dates. On my system three remained after testing a date in each month:

76: 805 locales available. Testing 12 dates.
76: 08 Jan 2021 - available locales:  af af_NA af_ZA asa asa_TZ bem bem_ZM en_001 en_150 en_AE en_AG en_AI en_AT en_AU en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM fr_MA fy fy_NL ia ia_001 id id_ID jmc jmc_TZ jv jv_ID kde kde_TZ kea kea_CV ksb ksb_TZ lg lg_UG luy luy_KE mer mer_KE ms ms_BN ms_ID ms_MY ms_SG mt mt_MT naq naq_NA nl nl_AW nl_BE nl_BQ nl_CW nl_NL nl_SR nl_SX rwk rwk_TZ sq sq_AL sq_MK sq_XK su su_Latn su_Latn_ID sv sv_AX sv_FI sv_SE sw sw_CD sw_KE sw_TZ sw_UG vun vun_TZ xog xog_UG
76: 08 Feb 2021 - available locales:  af af_NA af_ZA asa asa_TZ bem bem_ZM en_001 en_150 en_AE en_AG en_AI en_AT en_AU en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM fy fy_NL ia ia_001 id id_ID jmc jmc_TZ jv jv_ID kde kde_TZ kea kea_CV ksb ksb_TZ lg lg_UG luy luy_KE mer mer_KE ms ms_BN ms_ID ms_MY ms_SG naq naq_NA nl nl_AW nl_BE nl_BQ nl_CW nl_NL nl_SR nl_SX rwk rwk_TZ sv sv_AX sv_FI sv_SE sw sw_CD sw_KE sw_TZ sw_UG vun vun_TZ xog xog_UG
76: 08 Mar 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_AU en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM ia ia_001 id id_ID jv jv_ID kea kea_CV lg lg_UG luy luy_KE naq naq_NA xog xog_UG
76: 08 Apr 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_AU en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM ia ia_001 id id_ID jv jv_ID luy luy_KE naq naq_NA
76: 08 May 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_AU en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM naq naq_NA
76: 08 Jun 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM naq naq_NA
76: 08 Jul 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM naq naq_NA
76: 08 Aug 2021 - available locales:  en_001 en_150 en_AE en_AG en_AI en_AT en_BB en_BE en_BM en_BS en_BW en_CC en_CH en_CK en_CM en_CX en_CY en_DE en_DG en_DK en_DM en_ER en_FI en_FJ en_FK en_FM en_GB en_GD en_GG en_GH en_GI en_GM en_GY en_HK en_IE en_IL en_IM en_IO en_JE en_JM en_KE en_KI en_KN en_KY en_LC en_LR en_LS en_MG en_MO en_MS en_MT en_MU en_MW en_MY en_NA en_NF en_NG en_NL en_NR en_NU en_PG en_PN en_PW en_RW en_SB en_SC en_SD en_SE en_SG en_SH en_SI en_SL en_SS en_SX en_SZ en_TC en_TK en_TO en_TT en_TV en_TZ en_UG en_VC en_VG en_VU en_WS en_ZA en_ZM naq naq_NA
76: 08 Sep 2021 - available locales:  en_AE naq naq_NA
76: 08 Oct 2021 - available locales:  en_AE naq naq_NA
76: 08 Nov 2021 - available locales:  en_AE naq naq_NA
76: 08 Dec 2021 - available locales:  en_AE naq naq_NA
76: 3 locales left, checked in 0.217325 seconds:
76:  en_AE naq naq_NA

So what happens if you try en_AE naq or naq_NA? You may need to test both the short and the medium format.
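The elimination shown in the log can be sketched without ICU: keep only the locales that accept every sample month. In this sketch a per-locale abbreviation table stands in for the real parser, and `surviving_locales`, `example_tables` and the table contents are illustrative assumptions, not data read from ICU or the CLDR.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Month abbreviations one locale's parser is assumed to accept, January first.
using MonthNames = std::array<std::string, 12>;

// Returns the locales that accept every sample abbreviation. A locale is
// dropped at the first month whose abbreviation differs from its table.
std::set<std::string>
surviving_locales(const std::map<std::string, MonthNames>& tables,
                  const MonthNames& samples)
{
    std::set<std::string> survivors;
    for (const auto& [locale, names] : tables)
    {
        bool accepted = true;
        for (std::size_t m = 0; m < samples.size() && accepted; ++m)
            accepted = (names[m] == samples[m]);
        if (accepted)
            survivors.insert(locale);
    }
    return survivors;
}

// Illustrative stand-in data only, not read from ICU or the CLDR.
std::map<std::string, MonthNames> example_tables()
{
    const MonthNames three_letter = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    MonthNames sept = three_letter;
    sept[8] = "Sept";
    MonthNames long_summer = sept;
    long_summer[5] = "June";
    long_summer[6] = "July";
    return {{"en_AE", three_letter}, {"en_GB", sept}, {"en_AU", long_summer}};
}
```

Feeding the three-letter samples through `surviving_locales(example_tables(), ...)` leaves only the stand-in en_AE entry, mirroring how en_AU drops out at June and en_GB at September in the log.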

christopherlam · 2024-09-12T00:28:54Z

> So what happens if you try en_AE naq or naq_NA? You may need to test both the short and the medium format.

Looks good; however, in my setup en_AE doesn't exist. I don't know why. How about combining this ICU approach with #2015?

christopherlam · 2024-09-12T12:33:23Z

I think this is ready. Maybe "Locale" should be the first choice to match the Currency format (instead of the current y-m-d, or this branch's boost UK parser).

The icu formatter and calendar objects are generated only once.

Here's a small issue with the ICU locale: the csv save settings will store "Locale" but won't specify which locale.

gjanssens · 2024-09-12T15:43:52Z

> Here's a small issue with the ICU locale: the csv save settings will store "Locale" but won't specify which locale.

You could argue that the exact locale is not set in the import preview itself and so we shouldn't save it. It's probably not the most user friendly view, but a pragmatic one in this case.

Saving the actual current locale even though it's not explicitly set can equally cause unexpected behaviour.
To really solve this, we will have to offer a better way to select actual locales in the importer. As we established, that's a larger effort than we currently can or want to spend.

What is a problem IMO is that your added dropdown options alter the meanings of currently saved presets. That should be avoided. To solve this, you could add the new options below the ones that were already there.
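A minimal sketch of why appending is the safe choice, assuming a saved preset stores only the position of the chosen dropdown entry. `resolve_preset`, `append_options` and the labels used with them are hypothetical and not the importer's actual code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A saved preset is assumed to store only the position of the chosen
// entry in the dropdown, not its label.
std::string resolve_preset(const std::vector<std::string>& options,
                           std::size_t saved_index)
{
    return saved_index < options.size() ? options[saved_index] : std::string{};
}

// New entries go at the end so every existing index keeps its meaning.
std::vector<std::string> append_options(std::vector<std::string> options,
                                        const std::vector<std::string>& added)
{
    options.insert(options.end(), added.begin(), added.end());
    return options;
}
```

If the old list were {"y-m-d", "d-m-y", "m-d-y"}, appending "Locale" leaves indices 0 to 2 resolving to the same labels, whereas inserting it at the front would silently turn a saved "d-m-y" preset into "y-m-d".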

That aside, I'm not too much in favour of adding the boost options. If you're not living in the US or the UK, these options hint at a simple solution for other countries that we can't provide. Boost only provides these two methods and won't for, say, the Netherlands or Vietnam. I can see the usefulness of the ISO date option. I don't know what others think of this.

christopherlam · 2024-09-12T16:16:06Z

How about hiding boost's parser behind the existing options? Without boost my "30 Sep 2024" remains unparsable! Unfortunately dd MMM yyyy is becoming a de facto standard.

jralls · 2024-09-13T00:00:31Z

> How about hiding boost's parser behind the existing options? Without boost my "30 Sep 2024" remains unparsable!

Maybe this is what you mean: if the locale is en_XX try the ICU parser and if that fails try the boost::gregorian one. If both fail raise an error.
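The proposal can be sketched as a locale-gated fallback. Both parsers are passed in as callables here because the real ICU and boost::gregorian calls are not shown in this thread; `Parser`, `is_english` and `parse_with_fallback` are hypothetical names, and returning an ISO string is an assumption made to keep the sketch small.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// A parser returns a normalised date string, or nothing on failure.
using Parser = std::function<std::optional<std::string>(const std::string&)>;

// True for locale names such as "en", "en_AU" or "en.UTF-8".
bool is_english(const std::string& locale)
{
    return locale.compare(0, 2, "en") == 0 &&
           (locale.size() == 2 || locale[2] == '_' || locale[2] == '.');
}

// Tries the primary parser first; for English locales only, tries the
// fallback next. Throws std::invalid_argument if nothing succeeds.
std::string parse_with_fallback(const std::string& text,
                                const std::string& locale,
                                const Parser& primary,
                                const Parser& fallback)
{
    if (auto result = primary(text))
        return *result;
    if (is_english(locale))
        if (auto result = fallback(text))
            return *result;
    throw std::invalid_argument("cannot parse date: " + text);
}
```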

christopherlam · 2024-09-13T00:30:51Z

> > How about hiding boost's parser behind the existing options? Without boost my "30 Sep 2024" remains unparsable!

> Maybe this is what you mean: if the locale is en_XX try the ICU parser and if that fails try the boost::gregorian one. If both fail raise an error.

I feel this is hacky. This current branch accepts @gjanssens's feedback and will augment the existing dmy, mdy and ymd options to use the boost parser. See tests.

jralls · 2024-09-13T02:59:46Z

I don't think it's any more hacky than overloading the dmy/mdy options to accept month names for English only. The non-hacky fix is to get the Unicode Consortium to fix the CLDR (good luck with that) or ICU to get their parser to recognize the 3-letter versions of those months (slightly more likely but much patience required).

Another more general hack would be to introduce a correction table of some sort that might offer an alternative month abbreviation when there's something goofy in the CLDR.
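Such a correction table could look roughly like the sketch below: before handing the string to the locale parser, each alphabetic word is looked up and swapped for the spelling the locale expects. `Corrections` and `apply_corrections` are hypothetical, and the entries used with them are just the abbreviations mentioned in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical correction table: maps a month token found in the CSV to
// the spelling a given locale's parser is assumed to expect.
using Corrections = std::map<std::string, std::string>;

// Replaces every alphabetic word that appears as a key in the table;
// digits, separators and unknown words are copied through unchanged.
std::string apply_corrections(const std::string& date, const Corrections& table)
{
    std::string out;
    std::size_t i = 0;
    while (i < date.size())
    {
        if (!std::isalpha(static_cast<unsigned char>(date[i])))
        {
            out += date[i++];
            continue;
        }
        const std::size_t start = i;
        while (i < date.size() &&
               std::isalpha(static_cast<unsigned char>(date[i])))
            ++i;
        const std::string word = date.substr(start, i - start);
        const auto hit = table.find(word);
        out += (hit != table.end()) ? hit->second : word;
    }
    return out;
}
```

With a table of {"Sep" to "Sept", "Jun" to "June", "Jul" to "July"}, the input "30 Sep 2024" becomes "30 Sept 2024", while "30 Sept 2024" and "08 May 2021" pass through untouched because matching is on whole words.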

But this is workable enough for a first release. There's no point in expending further effort until users tell us that it falls short.

gjanssens · 2024-09-13T06:48:43Z

I wonder, how much overlap is there between the regexes we have for d-m-y/m-d-y and the date_from_uk_string/date_from_us_string boost functions ?

If the boost functions are a superset, we could just replace them instead of our regex based options. On the other hand, if there are date formats that our regex properly parses and boost doesn't, I would propose to first try the regex and then the boost function.

christopherlam · 2024-09-13T06:55:36Z

They're complementary. The boost functions don't accept 2-digit years but parse wordy months. The current implementation uses heuristics for 2-digit years but fails on wordy months.
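A minimal sketch of the "regex first, then month names" order for d-m-y, using only the standard library. `parse_wordy_dmy` is a stand-in for the boost UK parser, the 2-digit-year pivot is an assumption rather than GnuCash's actual heuristic, and neither function validates day or month ranges.

```cpp
#include <array>
#include <cctype>
#include <optional>
#include <regex>
#include <string>

struct Ymd { int year, month, day; };

// Numeric d-m-y. A 2-digit year is expanded with a fixed pivot, which is
// an assumption for this sketch: 00-68 becomes 20xx, 69-99 becomes 19xx.
std::optional<Ymd> parse_numeric_dmy(const std::string& s)
{
    static const std::regex re(R"((\d{1,2})[-/.](\d{1,2})[-/.](\d{4}|\d{2}))");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return std::nullopt;
    int year = std::stoi(m[3].str());
    if (m[3].length() == 2)
        year += (year < 69) ? 2000 : 1900;
    return Ymd{year, std::stoi(m[2].str()), std::stoi(m[1].str())};
}

// Stand-in for a month-name parser: three-letter English month and a
// mandatory 4-digit year, so "30 Sep 24" is rejected.
std::optional<Ymd> parse_wordy_dmy(const std::string& s)
{
    static const std::array<const char*, 12> names = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"};
    static const std::regex re(R"((\d{1,2})[-/ ]([A-Za-z]{3})[-/ ](\d{4}))");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return std::nullopt;
    std::string mon = m[2].str();
    for (char& c : mon)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    for (int i = 0; i < 12; ++i)
        if (mon == names[i])
            return Ymd{std::stoi(m[3].str()), i + 1, std::stoi(m[1].str())};
    return std::nullopt;
}

// Numeric regex first, month-name parser second.
std::optional<Ymd> parse_dmy(const std::string& s)
{
    if (auto date = parse_numeric_dmy(s))
        return date;
    return parse_wordy_dmy(s);
}
```

Here "30-09-24" is handled by the first parser and "30 Sep 2024" by the second, while "30 Sep 24" is rejected by both, which is the complementary behaviour described above.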

gjanssens · 2024-09-13T12:47:10Z

Ok. For me your implementation is good enough then.

christopherlam · 2024-09-13T13:47:40Z

Thank you! Your "good enough" means a "good start" because I haven't completely tested enough combinations of invalid dates. There are some slight differences in the behaviours that will need tweaking before merging in.

christopherlam · 2024-09-13T14:36:56Z

Ok, now I'm happy that the tests are complete. The exception for invalid dates, e.g. 31-feb, isn't "std::invalid_argument"; therefore the tests were modified to capture all exceptions.
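The difference between expecting one exception type and expecting any exception can be sketched with the standard library alone. `check_date` is a stand-in validator that throws std::out_of_range purely for illustration; `throws` and `throws_any` are hypothetical helpers, not the macros used in gtest-gnc-datetime.cpp.

```cpp
#include <stdexcept>

// Days in a month of the Gregorian calendar; month must be 1..12.
int days_in_month(int year, int month)
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : days[month - 1];
}

// Stand-in for a date constructor: throws std::out_of_range (not
// std::invalid_argument) when the month or day is impossible.
void check_date(int year, int month, int day)
{
    if (month < 1 || month > 12)
        throw std::out_of_range("month out of range");
    if (day < 1 || day > days_in_month(year, month))
        throw std::out_of_range("day out of range for month");
}

// True only if f() throws an exception of type E or derived from it.
template <typename E, typename F>
bool throws(F&& f)
{
    try { f(); }
    catch (const E&) { return true; }
    catch (...) { return false; }
    return false;
}

// True if f() throws anything at all.
template <typename F>
bool throws_any(F&& f)
{
    try { f(); }
    catch (...) { return true; }
    return false;
}
```

A check written as `throws<std::invalid_argument>` misses the 31 February case here, while `throws_any` catches it, which is why a test that cares only about rejection should not pin the exception type.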

libgnucash/engine/gnc-datetime.cpp

libgnucash/engine/test/gtest-gnc-datetime.cpp

1. Add dateformat "Locale" with ICU; uses current locale for date parsing. ICU's locale date parser may parse "3 May 2023" or "2024年9月13日" (LC_TIME=zh_TW.utf8) and maybe others.

2. Augment d-m-y, m-d-y and y-m-d with boost UK/US/ISO parsers. This allows CSV import of dates with months as words, such as "30 Sep 2023" or "May 4, 1978" or "2023-Dec-25". Note that the boost parser cannot recognise 2-digit years, therefore "30 Sep 24" is invalid.

christopherlam closed this Sep 8, 2024

christopherlam deleted the csv-date-locale-2 branch September 8, 2024 14:16

christopherlam restored the csv-date-locale-2 branch September 11, 2024 11:44

christopherlam reopened this Sep 11, 2024

This was referenced Sep 11, 2024

Csv import: GncDateFormat ->date_locale #2010

Closed

[gnc-datetime] add boost UK/US/ISO date parsers #2015

Closed

Bug 797724 - Add support for custom date formats when importing CSV transactions #1710

Closed

christopherlam force-pushed the csv-date-locale-2 branch from 47e10e5 to 9afa05d Compare September 11, 2024 23:46

christopherlam force-pushed the csv-date-locale-2 branch 3 times, most recently from e7671b9 to ead1fe8 Compare September 12, 2024 11:50

christopherlam marked this pull request as ready for review September 12, 2024 12:25

christopherlam force-pushed the csv-date-locale-2 branch from ead1fe8 to aef7dfd Compare September 12, 2024 15:33

christopherlam closed this Sep 13, 2024

christopherlam reopened this Sep 13, 2024

christopherlam force-pushed the csv-date-locale-2 branch from 6b4d9dd to 4342b24 Compare September 13, 2024 14:35

christopherlam force-pushed the csv-date-locale-2 branch 4 times, most recently from 0dcb877 to b28408d Compare September 13, 2024 14:56

gjanssens reviewed Sep 13, 2024

libgnucash/engine/gnc-datetime.cpp (outdated, resolved)

gjanssens requested changes Sep 13, 2024

libgnucash/engine/test/gtest-gnc-datetime.cpp (outdated, resolved)

libgnucash/engine/test/gtest-gnc-datetime.cpp (resolved)

jralls reviewed Sep 13, 2024

libgnucash/engine/test/gtest-gnc-datetime.cpp (outdated, resolved)

christopherlam force-pushed the csv-date-locale-2 branch from 80e260f to db7cf7c Compare September 14, 2024 02:07

christopherlam force-pushed the csv-date-locale-2 branch from db7cf7c to ab641b3 Compare September 14, 2024 02:14

code-gnucash-org merged commit ab641b3 into Gnucash:stable Sep 14, 2024
4 checks passed

christopherlam deleted the csv-date-locale-2 branch September 14, 2024 03:20

code-gnucash-org temporarily deployed to github-pages September 14, 2024 03:30 — with GitHub Actions Inactive
