Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Csv import: GncDateFormat ->date_locale #2010

Closed
wants to merge 6 commits into from

Conversation

christopherlam
Copy link
Contributor

offshoot #1710 -- here's attempt.

all GncDateFormat changed to use locales. The locale combobox is currently 799 items long!

but it doesn't seem to parse all dates. see example with debugging code. looks like ICU type issue which cannot understand "Sep" yet can understand "Apr".

image

@christopherlam christopherlam force-pushed the csv-date-locale branch 3 times, most recently from 2c8ca3d to 4cfa4a8 Compare September 1, 2024 15:05
@jralls
Copy link
Member

jralls commented Sep 2, 2024

It didn't like Jun either. Does changing the locale change anything?

@christopherlam
Copy link
Contributor Author

It didn't like Jun either. Does changing the locale change anything?

not really! "Jun" "Jul" are also off. "September" is accepted.

@christopherlam
Copy link
Contributor Author

oops

with en_US.utf8 "Sep":fail "Jun":fail "Jul":fail -- I guess it expects m/d/y
with en_AU.utf8 "Sep":fail "Jun":fail "Jul":fail
with en_GB.utf8 "Sep":fail "Jun":pass "Jul":pass
with en_CA.utf8 similar to en_GB

@jralls
Copy link
Member

jralls commented Sep 3, 2024

It's partly explained here: en_US: Jun, Jul, Sep; en_AU: June, July, Sept; en_GB: Jun, Jul, Sept; en_CA: Jun, Jul, Sep.

That says it should work for US and CA. A decent parser should be able to deal with dd MMM YYYY regardless of the locale's normal output format.

if (U_FAILURE(status))
throw std::invalid_argument ("Cannot parse string");

tuple = std::make_tuple<DateFormatPtr, CalendarPtr>(std::move(formatter), std::move(calendar));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to add the tuple to the cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't: see the magic & in auto& ☺️
I did test via printfs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::move is slightly beyond my current understanding but it's a requirement for compilation to succeed. Chatgpt did suggest it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the magic & in auto&

Very clever. I did miss that indeed.

The std::move is slightly beyond my current understanding

[std::tuple](http://en.cppreference.com/w/cpp/utility/tuple)<unwrap_decay_t<Types>...> make_tuple(Types&&... args)

In template context Types&& means forwarding reference. It needs either a reference or an lvalue. A named variable is an rvalue so it needs to be cast into an lvalue. std::move does that. std::move is apparently confusing. Maybe the committee should have called it lvalue_cast instead.

@flywire
Copy link
Contributor

flywire commented Sep 6, 2024

It's partly explained here: en_AU: June, July, Sept

Seems it is based on poor information. I've never seen that month format in Australia, normally numeric or 3 character alpha.

…parses

test 500 dates in 797 locales filtered down to 796 locales in 3.37s
@christopherlam
Copy link
Contributor Author

Ok let's kill this stinky hack in favour of #2011

@christopherlam christopherlam deleted the csv-date-locale branch September 11, 2024 12:45
@gjanssens
Copy link
Member

Ok.

For posterity, there was a bug in the last attempt, where you try two parsers. If the short parser failed, the medium parser would always fail as well because the parsePos was not reset. Due to this the medium parser always had only a partial string left to parse. By resetting parsePos before attempting the medium parser, I got better results.

Still all these parse attempts are costly. In the ideal case - where only a few parsers are valid - this is ok. However if I test on a string that's valid in many parsers the time to run all the tests add up quickly.

For example with 500 date strings like '06/09/2021', over 600 parsers return a valid date, and the time it takes varies from 2,5 seconds to 3,5 seconds on my system. This can likely be optimized some more by eliminating failing short parsers from the cache. However I don't know if this "intelligent" guessing is really worth it in the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants