
Recursive depth on google.protobuf.Structs #143

Open
asraa opened this issue Nov 12, 2019 · 2 comments


asraa commented Nov 12, 2019

Hi,

It seems like one of our fuzzers is failing because of a stack overflow in protobuf's TextFormat::Parser. The input that libprotobuf-mutator created was a Struct with a large recursive depth, deep enough to overflow the stack but not deep enough to trigger the parser's max recursion limit (2^15 - 1 by default):
https://github.com/protocolbuffers/protobuf/blob/6fc04c3f0e07857dff1f55b884f957b4c65aea8e/src/google/protobuf/text_format.cc#L1383
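
For illustration, the offending input looks roughly like a Struct nested inside itself over and over in text format (a hand-written approximation, not the actual fuzzer-generated input):

    fields {
      key: "a"
      value {
        struct_value {
          fields {
            key: "a"
            value {
              struct_value {
                # ... and so on for thousands of levels ...
              }
            }
          }
        }
      }
    }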

Is there some environment variable we could set to limit the recursive depth that LPM creates, or another way to handle such an input?

Thanks!

@vitalybuka
Collaborator

I assumed I had solved this issue a few months ago:

parser.SetRecursionLimit(100);

Is it possible that a limit of just 100 breaks your binary?
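
For reference, here is a minimal standalone sketch of what that call looks like against the public protobuf API (the exact wiring inside libprotobuf-mutator may differ; ParseWithLimit is just an illustrative name):

    #include <string>

    #include "google/protobuf/message.h"
    #include "google/protobuf/text_format.h"

    // Parse text-format input, but fail cleanly (instead of overflowing the
    // stack) once nesting exceeds 100 levels.
    bool ParseWithLimit(const std::string& text, google::protobuf::Message* msg) {
      google::protobuf::TextFormat::Parser parser;
      parser.SetRecursionLimit(100);
      return parser.ParseFromString(text, msg);
    }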


dgg5503 commented May 22, 2024

Hi @vitalybuka,

Apologies for bumping a 5-year-old thread; however, I believe I've encountered an issue like this one. When using LPM integrated with libFuzzer, it is possible for LPM to generate a message nested more deeply than the text or binary parsers allow. This is problematic in the following situation:

  • A .proto defines a data structure whose messages can be recursively nested. For example, this stripped-down .proto for a subset of JSON:
    syntax = "proto2";
    package data_structure;
    
    message Input {
      required Element element = 1;
    }
    
    message Value {
      oneof value {
        Array array = 1;
        /* other value types supported by JSON would live here */
      }
    }
    
    message Element {
      required Value value = 1;
    }
    
    message Array {
      repeated Element elements = 1;
    }
    
  • A fuzzer is created using the above structure with LPM.
  • After enough iterations, LPM begins to generate deeply nested messages (more than 100 levels deep).
  • Suppose one of these deeply nested messages signals new coverage to the fuzzing engine.
  • The fuzzer serializes the deeply nested message and saves it to the corpus.
  • The user restarts the fuzzer.
  • Corpus inputs are executed; however, the following error appears since one of those messages is beyond the parser's limit:
    Error parsing text-format data_structure.Input: 131:207: Message is too deep, the parser exceeded the configured recursion limit of 100.
    
  • Coverage is no longer the same after reading all inputs from the corpus, since the input above failed to parse and therefore never executed, as per:
    #define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto)                 \
      extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
        using protobuf_mutator::libfuzzer::LoadProtoInput;                      \
        Proto input;                                                            \
        if (LoadProtoInput(use_binary, data, size, &input))                     \
          TestOneProtoInput(input);                                             \
        return 0;                                                               \
      }

In this situation, one may increase the recursion limit via SetRecursionLimit; however, even deeper messages can still be generated and saved to disk, especially via crossover. Additionally, the fuzzer, and possibly the function under test, may end up spending more time deserializing, processing, and mutating deeply nested messages, time that could be better spent mutating fields at shallower levels.

With that said, is there any way to enforce a depth limit at the mutation level that I may have overlooked? In other words, if LPM detects during mutation that an add/clone/copy would push the mutated message past some user-provided maximum depth, could that mutation be skipped in favor of mutating a shallower field? I believe this would:

  • Prevent any chance of creating messages that are too deep to deserialize, thus preserving coverage between fuzzer restarts
  • Prevent the need to periodically increase the parser's recursion limit
  • Possibly improve overall exec/s by limiting recursion depth during deserialization, fuzz target processing, and mutation

If there isn't already a way to enforce this, I'm curious what you think would be required to implement such a feature. I would be interested in providing a PR to add such functionality.
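
To make the idea concrete, here is a minimal sketch of how nesting depth could be measured with the standard protobuf reflection API; MessageDepth is a hypothetical helper, not an existing LPM function, and the mutation-side hook it would plug into is exactly what I'm asking about:

    #include <algorithm>

    #include "google/protobuf/descriptor.h"
    #include "google/protobuf/message.h"

    // Returns the maximum nesting depth of `msg`, counting the message itself
    // as depth 1. Map entries are visited as repeated message fields.
    int MessageDepth(const google::protobuf::Message& msg) {
      using google::protobuf::FieldDescriptor;
      const auto* reflection = msg.GetReflection();
      const auto* descriptor = msg.GetDescriptor();
      int depth = 1;
      for (int i = 0; i < descriptor->field_count(); ++i) {
        const FieldDescriptor* field = descriptor->field(i);
        if (field->cpp_type() != FieldDescriptor::CPPTYPE_MESSAGE) continue;
        if (field->is_repeated()) {
          for (int j = 0; j < reflection->FieldSize(msg, field); ++j) {
            depth = std::max(depth, 1 + MessageDepth(reflection->GetRepeatedMessage(
                                            msg, field, j)));
          }
        } else if (reflection->HasField(msg, field)) {
          depth = std::max(
              depth, 1 + MessageDepth(reflection->GetMessage(msg, field)));
        }
      }
      return depth;
    }

A check along these lines could run after each mutation/crossover so that results deeper than a user-provided limit are rejected or re-mutated before they are ever serialized to the corpus.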

I've attached a simple example which demonstrates the scenario above.
issue-143-supplement.tar.gz

Thank you.
