-
-
Notifications
You must be signed in to change notification settings - Fork 768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-23031 Reinstate special case for "-u-va-posix" lost by ICU-22520 #3379
base: main
Are you sure you want to change the base?
Conversation
for (; index < localeID.size() && !_isTerminator(localeID[index]); index++) { | ||
if (index >= MAX_VARIANTS_LENGTH) { // same as length > MAX_VARIANTS_LENGTH | ||
for (std::string_view sub = localeID;;) { | ||
size_t next = sub.find_first_of(".@_-"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By searching for literal @
instead of calling a function, we break EBCDIC platforms.
For ICU 76, we got a contribution that supposedly made ICU work again on z/OS.
I don't think this has to be a blocker now, but it would be good to at least add a TODO about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By searching for literal
@
instead of calling a function, we break EBCDIC platforms.
Sure, but that's not a regression, this loop has been checking for a literal ASCII @
ever since that check was first introduced in 2001 by commit 8e5d162.
icu4c/source/common/uloc.cpp
Outdated
size_t next = sub.find_first_of(".@_-"); | ||
// For historical reasons, a trailing separator is included in the variant. | ||
bool finished = next == std::string_view::npos || next + 1 == sub.length(); | ||
size_t end = finished ? sub.length() : next; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ICU standard naming is "limit" for an exclusive end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
icu4c/source/common/uloc.cpp
Outdated
constexpr char tail[] = "-u-va-posix"; | ||
constexpr size_t length = sizeof tail - 1; | ||
if (localeID.length() == length && uprv_strnicmp(localeID.data(), tail, length) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works if va-posix is the only keyword, right?
It doesn't work for something like en-US-u-co-search-va-posix-kc
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a really hard time reading through the code and figuring out what it did. I'm mostly relying on the fact that Markus didn't squawk to convince myself that the code is okay.
I second Markus's concerns. Like him, I think a "TODO" would be sufficient to deal with the EBCDIC thing and that you should use "limit" instead of "end" for that one variable. I agree it'd be good if this were a little more robust against other locale IDs that have "va-posix" somewhere in their contents, but I think we're fairly unlikely to see that kind of thing and am okay just supporting "-u-va-posix` for now. Again, you might want to add a TODO.
Inside of locimp_forLanguageTag() in _appendKeywords() in uloc_tag.cpp there's a hardcoded special case for "-u-va-posix" which appends the "_POSIX" variant but this was missed during the refactoring made for ICU-22520 (there isn't any test case that covers this). So the call to locimp_forLanguageTag() did more than previously understood, but we still don't want to have to call that for every language tag that has BCP-47 extensions just in order to get to this special case. Instead, add a special case also to ulocimp_getSubtags(). For this to work nicely, the loop in _getVariant() that copies variants needs to be refactored so that it easily can break when encountering the start of any BCP-47 extension (which also has the welcome side-effect of making it more efficient, being able to append an entire variant at once to the output sink). This was broken by commit 678d5c1.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOKTM
Inside of
locimp_forLanguageTag()
in_appendKeywords()
inuloc_tag.cpp
there's a hardcoded special case for-u-va-posix
which appends the_POSIX
variant but this was missed during the refactoring made for ICU-22520 (there isn't any test case that covers this).So the call to
locimp_forLanguageTag()
did more than previously understood, but we still don't want to have to call that for every language tag that has BCP-47 extensions just in order to get to this special case. Instead, add a special case also toulocimp_getSubtags().
For this to work nicely, the loop in
_getVariant()
that copies variants needs to be refactored so that it easily can break when encountering the start of any BCP-47 extension (which also has the welcome side-effect of making it more efficient, being able to append an entire variant at once to the output sink).This was broken by commit 678d5c1.
Checklist