Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[7392] [cpp] Fix IgnoreUnknownEnumStringValueInMap conformance tests #16479

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

antongrbin
Copy link
Contributor

@antongrbin antongrbin commented Apr 11, 2024

Motivation

This PR fixes failing JSON conformance tests for cpp with name IgnoreUnknownEnumStringValueInMap. Specifically, for the following input of type TestMapOfEnums:

{
  "enum_map": {
    "key1": "PROTOCOL",
    "key2": "UNKNOWN_ENUM_STRING_VALUE",
    "key3": "BUFFER"
  }
}

The JSON parser with ignore_unknown_fields=true should produce a map with 2 entries, skipping the unknown enum value under key2. The implementation before this PR outputs key2 with 0-th enum value (PROTOCOL) instead.

The JSON parsing spec was discussed in #7392.

Recent similar changes in other languages:

Changes

The main difficulty in implementing the fix is the fact that parser will first allocate a new map entry message and then consume the value. In this special case (unknown enum string value), we need to do it the other way around: first consume the value, then allocate the map entry if needed.

The PR initially contains 5 commits, with first 4 being noop unit tests and refactors:

  1. unit test that documents the mainline behavior (this test will be changed in the 5th commit)
  2. noop refactoring that simplifies the signature of ParseEnumFromStr
  3. noop refactoring that extract map key parsing into a separate function (we'll call it twice in 5th commit)
  4. noop refactoring that extracts parse map entry into a separate function (since it will get more logic)
  5. the commit that actually implements the fix and adjusts tests

@antongrbin antongrbin force-pushed the anton--7392--fix-cpp-test branch 9 times, most recently from 42ab466 to 20fb149 Compare April 19, 2024 20:08
@antongrbin antongrbin force-pushed the anton--7392--fix-cpp-test branch 2 times, most recently from 4dff04a to cfccbfb Compare May 17, 2024 09:00
@antongrbin antongrbin marked this pull request as ready for review May 17, 2024 10:31
@antongrbin antongrbin requested review from a team as code owners May 17, 2024 10:31
@antongrbin antongrbin requested review from haberman and removed request for a team May 17, 2024 10:31
@antongrbin
Copy link
Contributor Author

@haberman friendly ping to take a look :)

@antongrbin
Copy link
Contributor Author

antongrbin commented Jun 13, 2024

@sbenzaquen can you help me ping @haberman through internal channels :) I bet this is a GitHub notifications recall miss.

I am reaching out to you since you recently reviewed #16567

@antongrbin
Copy link
Contributor Author

@jskeet, since you worked with me on the linked issue #7392, can you help me get reviewer's attention here through internal channels? I suspect that they are not seeing GitHub notifications.

I am not blocked on this, just wanted to eventually remove the failing tests that I added.

@jskeet
Copy link
Contributor

jskeet commented Jun 25, 2024

I've pinged internally.

Previous impementation took mutable 'lex' which was misleading since
this function doesn't mutate the lex state in any way.
The only diff is in how we return lex error in unsupported map key type:

```
// before
return lex.Invalid("unsupported map key type");

// now
return key.loc.Invalid("unsupported map key type");
```

The motivation was to simplify the function call (1 parameter less), and
I believe that the two are equivalent:

https://github.com/protocolbuffers/protobuf/blob/0cda26d48db0241f6aca0e931ff0a31acea1a6c2/src/google/protobuf/json/internal/lexer.h#L113
@googleberg googleberg requested review from esrauchg and removed request for a team and haberman September 14, 2024 05:00
@googleberg
Copy link
Member

@esrauchg Can you take a look at this?

@esrauchg
Copy link

esrauchg commented Sep 16, 2024

Sorry @antongrbin for the delays in getting someone to look at this! I can help review/push this forward here from the protobuf team.

[Edit: removed questions that I realized weren't relevant after looking closer]

Copy link

@esrauchg esrauchg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! This is a great start, just one structuring comment and a few nits.

(void)Traits::WithFieldType(
map_field, [&lex, &enum_value](const Desc<Traits>& map_entry_desc) {
enum_value = ParseEnum<Traits>(lex, Traits::ValueField(map_entry_desc));
return absl::OkStatus();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think slightly cleaner to just use the WithFieldType status like this:

absl::optional<int32_t>> enum_value;

RETURN_IF_ERROR(Traits::WithFieldType(
    map_field, [&lex, &enum_value](const Desc<Traits>& map_entry_desc) {
  ASSIGN_OR_RETURN(enum_value, ParseEnum<Traits>(lex, Traits::ValueField(map_entry_desc));
  return absl::OkStatus();
}

// since we already parsed this value (we must not attempt to advance
// the lexer to parse the value here again).
Traits::SetEnum(Traits::ValueField(type), entry,
enum_value->value_or(0));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should just be *enum_value (or enum_value->value()), you already checked that has_value is true above so it will be guaranteed to succeed.

It's academic in this case because the 0 value would never be used, but under proto2 semantics (where enums are closed, which is the case you are handling here) the default value for an enum is whatever is listed first in the definition, which isn't necessarily 0 value for the enum; there could either be a 0 value that is not listed first (then 0 is a legal value but not the default) or no listed 0 value at all (then its not a legal value at all).

Only under proto3 / open enums is it mandatory that the enum has a 0 value and that it is listed first (exactly to allow every enum to have a 0 default in proto3).

MapEntryNeedsEnumStringValueSpecialCase<Traits>(lex, map_field);
RETURN_IF_ERROR(needs_special_case.status());
if (*needs_special_case) {
return ParseMapEntryWithEnumStringValueWhenIgnoringUnknownFields<Traits>(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we structure this to be less drastically special cased? I'd rather it 'infect' the happy case if that avoids such a large diversion to handle the edge case.

Is it possible to structure this to always parse map key, parse map value, decide whether to err or ok if the value can't be parsed due to it being unknown enum, then construct the message?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants