-
-
Notifications
You must be signed in to change notification settings - Fork 766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22834 MF2: make tests compliant with schema and update spec tests #3063
Conversation
2e5394d
to
c00e4b5
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
c00e4b5
to
af96c7a
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
d8a016d
to
825cdf9
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
96e081f
to
c3ea7a7
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
c3ea7a7
to
ca1b786
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
ca1b786
to
cf91e74
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
cf91e74
to
612387f
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
I can't request reviews, but: @markusicu @srl295 @echeran - and @mihnita might want to take a look at the Java changes. |
I added another commit to this PR to update the spec tests again, which meant I had to pull in some more commits that were initially in #3092 (draft PR) in order to make the tests pass. |
Also, I'll be on vacation from now until September 9, so I won't see any review comments until I return. |
default: { | ||
// Should be unreachable | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default: { | |
// Should be unreachable | |
return 0; | |
default: { | |
// Should be unreachable | |
U_ASSERT(isDigit(c)); | |
return 0; |
should this handle other numbering systems though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and not to bikeshed on this part but there is ufmt_digitvalue()
and you could ignore values <0 or ≥10 (it parses hex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this handle other numbering systems though?
No, because it's operating on a Number Operand as defined in the function registry spec; it's not a localized number.
and not to bikeshed on this part but there is ufmt_digitvalue() and you could ignore values <0 or ≥10 (it parses hex)
I tried just now to use that, but I realized that ufmt_digitvalue()
is part of the io
library and io
depends on i18n
, so that would be a circular dependency. I guess it could be moved into i18n
, but I'm hesitant to do a refactor like that just to avoid duplicating a relatively small function. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize it was in io, good call. fine as is then.
default: { | ||
// Should be unreachable | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and not to bikeshed on this part but there is ufmt_digitvalue()
and you could ignore values <0 or ≥10 (it parses hex)
as a followon i'd move the number parsing into ufmt_cmn.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be more cautious about the defines in the public header, otherwise looking good.
There is an email thread on how to organize the test data. Because some of the json files here are from cldr "as is" (or at least they should be). The cldr test files are copied to icu by an ant script. If you don't know what thread I'm talking about (with vacation and all it is understandable if it was lost), tell me and I'll ping you by email with a link. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
Am I wrong, of there was another PR that was kind of similar, but not really?
That I reviewed.
I can't find it now.
icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/Sources.java
Outdated
Show resolved
Hide resolved
Do you think it would be OK to handle that in a separate PR? |
Rebasing now, since I think all the line comments have been addressed. |
9bd9556
to
d2edf89
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update LGTM
d2edf89
to
1f99262
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
@echeran It looks like the "enforce-all-checks" check failed, but that was the only check that failed. I think this is the same issue described in the icu-team email thread from yesterday? |
retriggered, clean now. The ironic thing is that I was wondering if CLDR needs one of the same enforce-all-checks but now i'm not so sure! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except I have a couple of questions / suggestions. Otherwise, I'm happy to rubber stamp since @mihnita is on board.
return result; | ||
} | ||
|
||
static double parseNumberLiteral(const FormattedPlaceholder& input, UErrorCode& errorCode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions from @sffc
- Use function from
double-conversion.h
- Use function from
number_decimalquantity.h
- Use function from
util.h
calledparseNumber
to replace the helper methods that parse the integer and fractional portions.
Either option 1 or 2 should be sufficient to do what we need here. Option 1 gives a double
, option 2 gives a DecimalQuantity
. At least option 1 should be able to handle sign, decimal point, and exponent, and probably option 2. Option 3 is helpful but doesn't replace as much code as the other 2 options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried both (1) and (2), and the problem is that there are some strings that both functions accept, which are not valid number literals according to the MF2 grammar: for example, 01
(leading zero with no decimal point), +1
(leading '+'), .1
(decimal point with no leading 0), and 1.
(trailing decimal point).
Since these cases are relatively simple to check for, I just added a check before calling into StringToDouble
(option 1) - see 176a232.
I didn't like (3) since calling parseNumber
to parse the decimal part still means having to do math to add the two parts together, so it doesn't seem to simplify the code that much.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Option 1 seemed the most likely & directly applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
After you squash, I'm happy to reapprove, and also to click the merge button on your behalf if need be.
return result; | ||
} | ||
|
||
static double parseNumberLiteral(const FormattedPlaceholder& input, UErrorCode& errorCode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Option 1 seemed the most likely & directly applicable.
This also updates the spec tests from the current version of the MFWG repository and removes some duplicate tests. Spec tests now reflect the message-format-wg repo as of unicode-org/message-format-wg@5612f3b It also updates both the ICU4C and ICU4J parsers to follow the current test schema in the conformance repository. This includes adding code to both parsers to allow `src` to be either a single string or an array of strings (per unicode-org/conformance#255 ), and eliminating `srcs` in tests. It also includes other changes to make updated spec tests pass: ICU4C: Allow trailing whitespace for complex messages, due to spec change ICU4C: Parse number literals correctly in Number::format ICU4J: Allow trailing whitespace after complex body, per spec change ICU4C: Fix bug that was assuming an .input variable can't have a reserved annotation ICU4C: Fix bug where unsupported '.i' was parsed as an '.input' ICU4C/ICU4J: Handle markup with space after the initial left curly brace ICU4C: Check for duplicate variant errors ICU4C/ICU4J: Handle leading whitespace in complex messages ICU4J: Treat whitespace after .input keyword as optional ICU4J: Don't format unannotated number literals as numbers
cdc7238
to
6441f29
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
@echeran Squashed -- I think you don't need to reapprove since nothing changed, but if you could click the merge button, that would be great! |
Thank you! |
This PR changes the code in the ICU4C and ICU4J tests that reads data-driven tests in JSON format to follow the test schema in the conformance repo, as well as updating the spec tests to the version in the spec repo.
Due to the changed spec tests, some non-test code changes were necessary to get the tests to pass:
Number::format
(this wasn't being done according to spec before, and previously there were no tests to expose that).input
declarations, which had assumed the annotation can't be a reserved annotation.i
was a parse error rather than an unsupported keyword.input
as optionalNote that some manual changes to the spec tests are necessary; ICU4J currently returns best-effort output rather than throwing exceptions for resolution errors, so tests where a resolution error (e.g.
unknown-function
orunsupported-statement
) is expected need to be annotated with anignoreJava
property. This can be removed once the error handling story has been resolved (see MFWG issue 782).The more interesting changes needed to parse the test files according to the schema include:
name
andvalue
properties, rather than as an object)srcs
property to represent multi-line messages, and instead, allowingsrc
to be either a single string or array of strings (consistent with this schema change)This also made it possible to move all the test filenames into one file in the ICU4J tests (
CoreTest.java
) and delete the others, since tests are handled uniformly now; analogously, removing extra methods in the ICU4C test reader.Finally, the non-spec tests needed to be changed themselves; the main changes were to change the format of the
params
property (mentioned above) and changingerrors
toexpErrors
. In addition, one file (icu-test-selectors.json
) needed to be "expanded out" because the schema doesn't support theshared
andvariations
properties.Checklist