Background
Fuzz testers, such as American Fuzzy Lop (AFL), automatically generate test inputs that cause a program to behave differently or crash. In AFL, the program under test is instrumented to record control flow, which helps identify inputs that produce different behavior. A genetic algorithm then iteratively mutates seed inputs to explore further variations, aiming to produce novel behaviors. While this algorithm can operate solely based on program outputs, such an approach may yield weaker results.
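To make the idea concrete, here is a minimal sketch of a black-box mutation loop in Lua. The `run_program`, `mutate`, and `fuzz` names are hypothetical and do not come from AFL or the Markdown package; novelty is judged here by unseen outputs rather than AFL's control-flow instrumentation.

```lua
-- Replace one random byte of the input with a random byte.
local function mutate(input)
  if #input == 0 then return string.char(math.random(0, 255)) end
  local position = math.random(#input)
  local byte = string.char(math.random(0, 255))
  return input:sub(1, position - 1) .. byte .. input:sub(position + 1)
end

-- Black-box fuzzing loop: mutate seeds, keep inputs that crash the
-- program or that produce an output we have not seen before.
local function fuzz(run_program, seeds, iterations)
  local corpus = {}
  for _, seed in ipairs(seeds) do table.insert(corpus, seed) end
  local seen_outputs, crashing_inputs = {}, {}
  for _ = 1, iterations do
    local parent = corpus[math.random(#corpus)]
    local child = mutate(parent)
    local ok, output = pcall(run_program, child)
    if not ok then
      table.insert(crashing_inputs, child)  -- input caused an error
    elseif not seen_outputs[output] then
      seen_outputs[output] = true           -- novel behavior: keep as a new seed
      table.insert(corpus, child)
    end
  end
  return crashing_inputs, corpus
end
```

A real implementation would use a richer set of mutations (insertions, deletions, splices of two seeds) and a finer-grained notion of novelty, but the overall shape of the loop stays the same.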
While fuzz testing is often used to identify security vulnerabilities, the generated inputs can also be valuable for regression testing. Additionally, these inputs can be compared against a reference implementation to discover inconsistencies either during the genetic search or afterward.
Rationale
Our CommonMark parser has already revealed numerous edge cases that our current unit and regression tests do not cover. Currently, our approach is reactive: when an unexpected input is found, we file an issue, add a regression test, and fix the code until the test passes (see, for example, issues #447, #483, #495, #502, and #508). Fuzz testing would allow for a more proactive bug detection approach, enabling us to identify and address issues before they affect users.
Proposal
Implement a fuzz tester for the Markdown package. This tool would utilize existing unit and regression tests as seed inputs and iteratively modify them until a set number of iterations is reached, or until it finds an input that causes a crash or produces results inconsistent with a reference implementation. The continuous integration (CI) workflow should be updated to include regular fuzz testing as part of the automated testing process.
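The stopping conditions described above could be sketched as follows. The `next_input`, `crashes`, and `matches_reference` helpers are hypothetical placeholders for the mutation engine, the parser invocation, and the reference comparison, respectively.

```lua
-- Run generated inputs until one fails (crash or mismatch with the
-- reference implementation) or the iteration budget is exhausted.
local function run_until_failure(next_input, crashes, matches_reference,
                                 max_iterations)
  for iteration = 1, max_iterations do
    local input = next_input()
    if crashes(input) or not matches_reference(input) then
      return input, iteration  -- failing input found
    end
  end
  return nil  -- budget exhausted without finding a failure
end
```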
Notes
The research aspects of this project involve:
Identifying or designing a genetic algorithm that can discover new test inputs based solely on seed inputs and program outputs, effectively treating the Markdown package as a black box.
Developing a method to reliably compare the Markdown package's unique output format against a reference implementation of CommonMark.
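The second research task amounts to differential testing. A hedged sketch, assuming hypothetical `our_convert` and `reference_convert` wrappers that each map a CommonMark input to some comparable canonical form (neither name comes from the Markdown package's actual API):

```lua
-- Differential testing: run every input through both implementations
-- and collect the cases where their canonicalized outputs disagree.
local function find_inconsistencies(our_convert, reference_convert, inputs)
  local inconsistent = {}
  for _, input in ipairs(inputs) do
    local ours = our_convert(input)
    local reference = reference_convert(input)
    if ours ~= reference then
      table.insert(inconsistent,
                   {input = input, ours = ours, reference = reference})
    end
  end
  return inconsistent
end
```

The hard part is entirely inside the two wrappers: the Markdown package emits TeX macros while reference CommonMark implementations emit HTML, so both outputs must first be normalized into a shared representation before the comparison above is meaningful.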
The development tasks for this project include:
Implementing the genetic algorithm in Lua.
Integrating the fuzz tester into the Markdown package’s CI workflow to ensure it runs regularly as part of the automated testing process.
While tools like AFL-Lua allow existing fuzz testers such as AFL to work with Lua directly, they seem too general for our needs. AFL-Lua fuzzes by repeatedly initializing the program, which would be inefficient given that the Markdown package takes approximately 300 ms to initialize. A specialized solution would better serve our needs: one that processes multiple inputs with a single initialized parser to minimize overhead. Additionally, since the Markdown package produces numerous auxiliary outputs, testing should utilize a RAM-disk to avoid disk I/O bottlenecks. Ideally, the fuzz tester should process hundreds of test cases per second.
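The single-initialization design could look like the following sketch, where `new_converter` is a stand-in for however the Markdown package exposes its parser; the real entry point may differ.

```lua
-- Pay the (roughly 300 ms) initialization cost once, then reuse the
-- same converter for every test case in the batch.
local function make_runner(new_converter)
  local convert = new_converter()  -- initialize exactly once
  return function(input)
    return convert(input)          -- each test case reuses the same parser
  end
end
```

With a runner like this, the per-case cost is only the conversion itself, which is what makes a throughput of hundreds of test cases per second plausible.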