-
Notifications
You must be signed in to change notification settings - Fork 15.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[7392] [cpp] Fix IgnoreUnknownEnumStringValueInMap conformance tests #16479
base: main
Are you sure you want to change the base?
Conversation
42ab466
to
20fb149
Compare
4dff04a
to
cfccbfb
Compare
@haberman friendly ping to take a look :) |
@sbenzaquen can you help me ping @haberman through internal channels :) I bet this is a GitHub notifications recall miss. I am reaching out to you since you recently reviewed #16567 |
I've pinged internally. |
Previous impementation took mutable 'lex' which was misleading since this function doesn't mutate the lex state in any way.
The only diff is in how we return lex error in unsupported map key type: ``` // before return lex.Invalid("unsupported map key type"); // now return key.loc.Invalid("unsupported map key type"); ``` The motivation was to simplify the function call (1 parameter less), and I believe that the two are equivalent: https://github.com/protocolbuffers/protobuf/blob/0cda26d48db0241f6aca0e931ff0a31acea1a6c2/src/google/protobuf/json/internal/lexer.h#L113
cfccbfb
to
724acf1
Compare
@esrauchg Can you take a look at this? |
Sorry @antongrbin for the delays in getting someone to look at this! I can help review/push this forward here from the protobuf team. [Edit: removed questions that I realized weren't relevant after looking closer] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! This is a great start, just one structuring comment and a few nits.
(void)Traits::WithFieldType( | ||
map_field, [&lex, &enum_value](const Desc<Traits>& map_entry_desc) { | ||
enum_value = ParseEnum<Traits>(lex, Traits::ValueField(map_entry_desc)); | ||
return absl::OkStatus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think slightly cleaner to just use the WithFieldType status like this:
absl::optional<int32_t>> enum_value;
RETURN_IF_ERROR(Traits::WithFieldType(
map_field, [&lex, &enum_value](const Desc<Traits>& map_entry_desc) {
ASSIGN_OR_RETURN(enum_value, ParseEnum<Traits>(lex, Traits::ValueField(map_entry_desc));
return absl::OkStatus();
}
// since we already parsed this value (we must not attempt to advance | ||
// the lexer to parse the value here again). | ||
Traits::SetEnum(Traits::ValueField(type), entry, | ||
enum_value->value_or(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should just be *enum_value
(or enum_value->value()
), you already checked that has_value is true above so it will be guaranteed to succeed.
It's academic in this case because the 0 value would never be used, but under proto2 semantics (where enums are closed, which is the case you are handling here) the default value for an enum is whatever is listed first in the definition, which isn't necessarily 0 value for the enum; there could either be a 0 value that is not listed first (then 0 is a legal value but not the default) or no listed 0 value at all (then its not a legal value at all).
Only under proto3 / open enums is it mandatory that the enum has a 0 value and that it is listed first (exactly to allow every enum to have a 0 default in proto3).
MapEntryNeedsEnumStringValueSpecialCase<Traits>(lex, map_field); | ||
RETURN_IF_ERROR(needs_special_case.status()); | ||
if (*needs_special_case) { | ||
return ParseMapEntryWithEnumStringValueWhenIgnoringUnknownFields<Traits>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we structure this to be less drastically special cased? I'd rather it 'infect' the happy case if that avoids such a large diversion to handle the edge case.
Is it possible to structure this to always parse map key, parse map value, decide whether to err or ok if the value can't be parsed due to it being unknown enum, then construct the message?
Motivation
This PR fixes failing JSON conformance tests for cpp with name
IgnoreUnknownEnumStringValueInMap
. Specifically, for the following input of type TestMapOfEnums:The JSON parser with
ignore_unknown_fields=true
should produce a map with 2 entries, skipping the unknown enum value underkey2
. The implementation before this PR outputskey2
with 0-th enum value (PROTOCOL
) instead.The JSON parsing spec was discussed in #7392.
Recent similar changes in other languages:
Changes
The main difficulty in implementing the fix is the fact that parser will first allocate a new map entry message and then consume the value. In this special case (unknown enum string value), we need to do it the other way around: first consume the value, then allocate the map entry if needed.
The PR initially contains 5 commits, with first 4 being noop unit tests and refactors: