Javascript string decode leaves parser in a bad state if there are invalid or truncated UTF-8 characters #23

afarnsworth-valve · 2020-09-11T18:40:00Z

What version of protobuf and what language are you using?
Version: v3.7.1.
Language: Javascript

What operating system (Linux, Windows, ...) and version?
Chrome

What runtime / compiler are you using (e.g., python version or gcc version)?
Webpack/Typescript/Chrome

What did you do?
When a protobuf message with a string field is parsed and that string field ends with an incomplete UTF-8 character, readString in the binary decoder advances the read cursor past the end of the string field, causing the rest of the message to fail to parse.
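To make the failure concrete, here is a simplified, hypothetical model of the unchecked decoding loop (readStringUnchecked is an illustrative name, not a function in jspb, and four-byte sequences are omitted). The message bytes are hand-built: field 1 declares a 4-byte string holding "hi" plus only two of the three bytes of a three-byte UTF-8 sequence, and field 2 follows immediately.

```javascript
// Hypothetical, simplified model of the unchecked loop described in this issue
// (not the actual jspb.BinaryDecoder source). Four-byte sequences are omitted.
function readStringUnchecked(bytes, start, length) {
  const end = start + length;
  let cursor = start;
  const codeUnits = [];
  while (cursor < end) {
    const c = bytes[cursor++];
    if (c < 128) {
      codeUnits.push(c); // one byte (ASCII)
    } else if (c < 192) {
      // stray continuation byte: ignored in this model
    } else if (c < 224) {
      const c2 = bytes[cursor++]; // no check against `end`
      codeUnits.push(((c & 31) << 6) | (c2 & 63));
    } else if (c < 240) {
      const c2 = bytes[cursor++]; // no check against `end`
      const c3 = bytes[cursor++]; // no check against `end`
      codeUnits.push(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
    }
  }
  // Like `this.cursor_ = cursor;`, the returned cursor is wherever decoding stopped.
  return { value: String.fromCharCode(...codeUnits), cursor };
}

// Field 1 (tag 0x0a, length-delimited) declares 4 bytes: "h", "i", and only two of
// the three bytes of a three-byte UTF-8 sequence. Field 2 (tag 0x10, varint 1) follows.
const truncated = Uint8Array.of(0x0a, 0x04, 0x68, 0x69, 0xe2, 0x82, 0x10, 0x01);
const decoded = readStringUnchecked(truncated, 2, 4);
console.log(decoded.cursor); // 7, but the string field ends at offset 6
```

The third byte of the "character" is taken from field 2's tag, so the cursor lands one byte inside the next field.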

What did you expect to see?
Possibly an assertion in the string reader, but the rest of the message should still parse correctly despite the invalid content in the string field.
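As a rough sketch of that expected behavior, assuming a runtime with the standard TextDecoder (which Chrome provides), a string reader could decode exactly the declared byte range, flag invalid content, and still return the field's end offset. readStringField and its return shape are hypothetical, not part of google-protobuf.

```javascript
// Hypothetical helper (not part of google-protobuf): decode exactly the declared
// byte range with the standard TextDecoder, and always resume at the field end.
function readStringField(bytes, start, length) {
  const end = start + length;
  if (end > bytes.length) {
    throw new RangeError(`string field of length ${length} runs past the buffer`);
  }
  const slice = bytes.subarray(start, end);
  // ignoreBOM: true keeps a leading U+FEFF as part of the string's data.
  try {
    const value = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(slice);
    return { value, valid: true, cursor: end };
  } catch (err) {
    if (!(err instanceof TypeError)) throw err;
    // fatal: true throws a TypeError on invalid or truncated UTF-8. Fall back to
    // U+FFFD replacement so the caller can warn but keep parsing.
    const value = new TextDecoder("utf-8", { ignoreBOM: true }).decode(slice);
    return { value, valid: false, cursor: end };
  }
}

// Field 1 declares 4 bytes ending in a truncated sequence; field 2 starts at offset 6.
const payload = Uint8Array.of(0x0a, 0x04, 0x68, 0x69, 0xe2, 0x82, 0x10, 0x01);
const field1 = readStringField(payload, 2, 4);
console.log(field1.valid, field1.cursor); // false 6
console.log(payload[field1.cursor] >>> 3); // 2 (the next tag belongs to field 2)
```

Because valid is reported per field, a caller can name the offending string field in its warning instead of failing later on an unrelated tag.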

What did you see instead?
Usually an assertion is thrown, but the error is misleading: the binary reader has advanced into the following field's data and is trying to interpret it as a field number/metadata. The error does not identify the string field that actually caused the problem.
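The misleading error follows from how tags are encoded: each field starts with a varint tag equal to (field_number << 3) | wire_type. The sketch below decodes the byte at the correct offset and the byte one position later, using a hand-built message in which field 1 is a 4-byte string ending in a truncated three-byte UTF-8 sequence and field 2 is a varint starting at offset 6. decodeTag is a hypothetical helper that only handles single-byte tags.

```javascript
// Decode a single-byte protobuf tag: (fieldNumber << 3) | wireType.
// Sketch only: multi-byte (varint) tags are not handled.
function decodeTag(byte) {
  if (byte >= 0x80) throw new RangeError("multi-byte tag not handled in this sketch");
  return { fieldNumber: byte >>> 3, wireType: byte & 0x07 };
}

// Field 1: tag 0x0a, length 4, "h" "i" 0xe2 0x82 (truncated sequence).
// Field 2: tag 0x10 (field 2, varint) at offset 6, value 1 at offset 7.
const wire = Uint8Array.of(0x0a, 0x04, 0x68, 0x69, 0xe2, 0x82, 0x10, 0x01);

console.log(decodeTag(wire[6])); // { fieldNumber: 2, wireType: 0 }  correct offset
console.log(decodeTag(wire[7])); // { fieldNumber: 0, wireType: 1 }  one byte too far
```

Field number 0 is never valid in protobuf, so an error raised at that point describes a field that does not exist rather than the string field 1 that caused it.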

The function is located here, in jspb.BinaryDecoder.prototype.readString:
https://github.com/protocolbuffers/protobuf/blob/master/js/binary/decoder.js#L830

The problem arises from a combination of advancing the cursor without bounds checking:

    } else if (c < 240) { // UTF-8 with three bytes.
      var c2 = bytes[cursor++];
      var c3 = bytes[cursor++];

which can move cursor past the end of the string field. Then, at the exit of the function, the reader's internal cursor is set to that value:

    this.cursor_ = cursor;

At this point, cursor can be one or two bytes into the next field, so parsing of the rest of the message fails.
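One possible way to remove the overrun, sketched here as an assumption rather than the actual upstream fix, is to check every continuation byte against the end of the field, replace truncated sequences with U+FFFD, and always set the cursor from the field length. readStringBounded is a hypothetical name, and this model is not a complete UTF-8 validator.

```javascript
// Hypothetical bounds-checked variant of the decoding loop (not the upstream fix).
// Continuation bytes are read only if they lie inside [start, end); truncated or
// malformed sequences become U+FFFD; the returned cursor is always `end`.
// Not a full validator: overlong 2/3-byte forms and encoded surrogates pass through.
function readStringBounded(bytes, start, length) {
  const end = start + length;
  let cursor = start;
  const codeUnits = [];

  // Consume one continuation byte (10xxxxxx) inside the field, or return -1.
  const next = () => {
    if (cursor >= end || (bytes[cursor] & 0xc0) !== 0x80) return -1;
    return bytes[cursor++] & 0x3f;
  };

  while (cursor < end) {
    const c = bytes[cursor++];
    if (c < 0x80) {
      codeUnits.push(c); // one byte (ASCII)
    } else if (c >= 0xc0 && c < 0xe0) {
      const c2 = next(); // two bytes
      codeUnits.push(c2 < 0 ? 0xfffd : ((c & 0x1f) << 6) | c2);
    } else if (c >= 0xe0 && c < 0xf0) {
      const c2 = next(); // three bytes
      const c3 = c2 < 0 ? -1 : next();
      codeUnits.push(c3 < 0 ? 0xfffd : ((c & 0x0f) << 12) | (c2 << 6) | c3);
    } else if (c >= 0xf0 && c < 0xf8) {
      const c2 = next(); // four bytes, emitted as a surrogate pair
      const c3 = c2 < 0 ? -1 : next();
      const c4 = c3 < 0 ? -1 : next();
      if (c4 < 0) {
        codeUnits.push(0xfffd);
      } else {
        const cp = ((c & 0x07) << 18) | (c2 << 12) | (c3 << 6) | c4;
        if (cp < 0x10000 || cp > 0x10ffff) {
          codeUnits.push(0xfffd); // overlong or out of range
        } else {
          codeUnits.push(0xd800 + ((cp - 0x10000) >> 10), 0xdc00 + ((cp - 0x10000) & 0x3ff));
        }
      }
    } else {
      codeUnits.push(0xfffd); // stray continuation byte or invalid lead byte
    }
  }
  // Spreading is fine for a sketch; very long strings would need chunking.
  // The cursor comes from the length prefix, never from where decoding stopped.
  return { value: String.fromCharCode(...codeUnits), cursor: end };
}

// Field 1 declares 4 bytes ending in a truncated sequence; field 2 starts at offset 6.
const sample = Uint8Array.of(0x0a, 0x04, 0x68, 0x69, 0xe2, 0x82, 0x10, 0x01);
const fixed = readStringBounded(sample, 2, 4);
console.log(fixed.cursor); // 6
```

Setting the cursor from the length prefix is what confines the damage to the string field itself; whether to replace, drop, or reject the bad bytes is a separate policy choice.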

Anything else we should know about your project / environment
The other end of this communication channel uses the C++ protobuf library, which can optionally warn but ultimately allows potentially invalid UTF-8 data to be serialized.


dibenede · 2022-09-02T22:55:50Z

This is a known issue and we need to upstream the fix.

deannagarcia assigned perezd Oct 22, 2020

deannagarcia added the javascript label Oct 22, 2020

acozzette transferred this issue from protocolbuffers/protobuf May 16, 2022

dibenede unassigned perezd Sep 2, 2022

dibenede added the bug and triaged labels Sep 2, 2022

dibenede added the port-fix label Sep 2, 2022

lukesandberg self-assigned this Oct 7, 2022
