Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Javascript string decode leaves parser in a bad state if there are invalid or truncated UTF-8 characters #23

Open
afarnsworth-valve opened this issue Sep 11, 2020 · 1 comment
Assignees
Labels
bug Something isn't working javascript port-fix triaged Issue has been triaged

Comments

@afarnsworth-valve
Copy link

What version of protobuf and what language are you using?
Version: v3.7.1.
Language: Javascript

What operating system (Linux, Windows, ...) and version?
Chrome

What runtime / compiler are you using (e.g., python version or gcc version)
Webpack/Typescript/Chrome

What did you do?
When a protobuf with a string field is parsed and that string field ends with an incomplete UTF8 character, readString in the binary decoder will advance the read cursor past the end of the string field, causing the rest of the message to fail to parse.

What did you expect to see
Possibly an assert in the string reader, but allow rest of message to parse correctly despite invalid content in the string field.

What did you see instead?
Usually an assert is thrown, but the error is bogus as the binary reader has advanced into field data and is trying to interpret it as field number/metadata. The error does not identify the string field that actually caused the problem.

The function is located here, in jspb.BinaryDecoder.prototype.readString:
https://github.com/protocolbuffers/protobuf/blob/master/js/binary/decoder.js#L830

The problem arises due to a combination of advancing the cursor without bounds checking:

    } else if (c < 240) { // UTF-8 with three bytes.
      var c2 = bytes[cursor++];
      var c3 = bytes[cursor++];

which can allow for cursor to advance past end. At the exit of the function, the reader's internal cursor is set:

  this.cursor_ = cursor;

At this point, cursor is a byte or two into the next field, and parsing fails.

Anything else we should know about your project / environment
The other end of this communication channel is using the C++ protobuf library, which does optionally warn but ultimately allows for potentially invalid utf-8 data to be serialized.

@acozzette acozzette transferred this issue from protocolbuffers/protobuf May 16, 2022
@dibenede dibenede added bug Something isn't working triaged Issue has been triaged labels Sep 2, 2022
@dibenede
Copy link
Contributor

dibenede commented Sep 2, 2022

This is a known issue and we need to upstream the fix.

@lukesandberg lukesandberg self-assigned this Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working javascript port-fix triaged Issue has been triaged
Projects
None yet
Development

No branches or pull requests

5 participants