
Recursive depth on google.protobuf.Structs #143

Open
asraa opened this issue Nov 12, 2019 · 2 comments


asraa commented Nov 12, 2019

Hi,

It seems like one of our fuzzers is failing because of a stack overflow in protobuf's TextFormat::Parser. The input that libprotobuf-mutator created was a Struct with a large recursive depth, deep enough to overflow the stack but not deep enough to trigger the parser's max recursion limit (2^15 - 1 by default):
https://github.com/protocolbuffers/protobuf/blob/6fc04c3f0e07857dff1f55b884f957b4c65aea8e/src/google/protobuf/text_format.cc#L1383
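
For illustration, the offending input looks roughly like a Struct nested inside itself over and over in text format (a hand-written approximation, not the actual fuzzer-generated input):

    fields {
      key: "a"
      value {
        struct_value {
          fields {
            key: "a"
            value {
              struct_value {
                # ... and so on for thousands of levels ...
              }
            }
          }
        }
      }
    }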

Is there some environment variable we could set to limit the recursive depth that LPM creates, or another way to handle such an input?

Thanks!

@vitalybuka
Collaborator

I assumed I had solved this issue a few months ago:

parser.SetRecursionLimit(100);

Is it possible that a limit of just 100 breaks your binary?
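
For reference, here is a minimal standalone sketch of what that call looks like against the public protobuf API (the exact wiring inside libprotobuf-mutator may differ; ParseWithLimit is just an illustrative name):

    #include <string>

    #include "google/protobuf/message.h"
    #include "google/protobuf/text_format.h"

    // Parse text-format input, but fail cleanly (instead of overflowing the
    // stack) once nesting exceeds 100 levels.
    bool ParseWithLimit(const std::string& text, google::protobuf::Message* msg) {
      google::protobuf::TextFormat::Parser parser;
      parser.SetRecursionLimit(100);
      return parser.ParseFromString(text, msg);
    }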


dgg5503 commented May 22, 2024

Hi @vitalybuka,

Apologies for bumping a 5-year-old thread; however, I believe I've encountered an issue like this one. When using LPM integrated with libFuzzer, it is possible for LPM to generate a message nested more deeply than the text or binary parsers allow. This is problematic in the following situation:

  • A .proto defines a data structure whose messages can be recursively nested. For example, this stripped-down .proto for a subset of JSON:
    syntax = "proto2";
    package data_structure;
    
    message Input {
      required Element element = 1;
    }
    
    message Value {
      oneof value {
        Array array = 1;
        /* other value types supported by JSON would live here */
      }
    }
    
    message Element {
      required Value value = 1;
    }
    
    message Array {
      repeated Element elements = 1;
    }
    
  • A fuzzer is created using the above structure with LPM.
  • After enough iterations, LPM begins to generate deeply nested messages (more than 100 levels deep).
  • Suppose one of these deeply nested messages signals new coverage to the fuzzing engine.
  • The fuzzer serializes the deeply nested message and saves it to the corpus.
  • The user restarts the fuzzer.
  • Corpus inputs are executed; however, the following error appears since one of those messages is beyond the parser's limit:
    Error parsing text-format data_structure.Input: 131:207: Message is too deep, the parser exceeded the configured recursion limit of 100.
    
  • Coverage is no longer the same after reading all inputs from the corpus, since the input above failed to parse and therefore never executed, as per:
    #define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto)                 \
      extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
        using protobuf_mutator::libfuzzer::LoadProtoInput;                      \
        Proto input;                                                            \
        if (LoadProtoInput(use_binary, data, size, &input))                     \
          TestOneProtoInput(input);                                             \
        return 0;                                                               \
      }

In this situation, one may increase the recursion limit via SetRecursionLimit; however, even deeper messages can still be generated and saved to disk, especially via crossover. Additionally, the fuzzer, and possibly the function under test, may end up spending more time deserializing, processing, and mutating deeply nested messages, time that could be better spent mutating fields at shallower levels.

With that said, is there any way to enforce a depth limit at the mutation level that I may have overlooked? In other words, if LPM detects during mutation that an add/clone/copy would push the mutated message past some user-provided maximum depth, could that mutation be skipped in favor of mutating a shallower field? I believe this would:

  • Prevent any chance of creating messages that are too deep to deserialize, thus preserving coverage between fuzzer restarts
  • Prevent the need to periodically increase the parser's recursion limit
  • Possibly improve overall exec/s by limiting recursion depth during deserialization, fuzz target processing, and mutation

If there isn't already a way to enforce this, I'm curious what you think would be required to implement such a feature. I would be interested in providing a PR to add such functionality.
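
To make the idea concrete, here is a minimal sketch of how nesting depth could be measured with the standard protobuf reflection API; MessageDepth is a hypothetical helper, not an existing LPM function, and the mutation-side hook it would plug into is exactly what I'm asking about:

    #include <algorithm>

    #include "google/protobuf/descriptor.h"
    #include "google/protobuf/message.h"

    // Returns the maximum nesting depth of `msg`, counting the message itself
    // as depth 1. Map entries are visited as repeated message fields.
    int MessageDepth(const google::protobuf::Message& msg) {
      using google::protobuf::FieldDescriptor;
      const auto* reflection = msg.GetReflection();
      const auto* descriptor = msg.GetDescriptor();
      int depth = 1;
      for (int i = 0; i < descriptor->field_count(); ++i) {
        const FieldDescriptor* field = descriptor->field(i);
        if (field->cpp_type() != FieldDescriptor::CPPTYPE_MESSAGE) continue;
        if (field->is_repeated()) {
          for (int j = 0; j < reflection->FieldSize(msg, field); ++j) {
            depth = std::max(depth, 1 + MessageDepth(reflection->GetRepeatedMessage(
                                            msg, field, j)));
          }
        } else if (reflection->HasField(msg, field)) {
          depth = std::max(
              depth, 1 + MessageDepth(reflection->GetMessage(msg, field)));
        }
      }
      return depth;
    }

A check along these lines could run after each mutation/crossover so that results deeper than a user-provided limit are rejected or re-mutated before they are ever serialized to the corpus.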

I've attached a simple example which demonstrates the scenario above.
issue-143-supplement.tar.gz

Thank you.
