Background
Fuzz testers, such as American Fuzzy Lop (AFL), automatically generate test inputs that cause a program to behave differently or crash. In AFL, the program under test is instrumented to record control flow, which helps identify inputs that produce different behavior. A genetic algorithm then iteratively mutates seed inputs to explore further variations, aiming to produce novel behaviors. While this algorithm can operate solely based on program outputs, such an approach may yield weaker results.
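To make the idea concrete, here is a minimal sketch of a black-box mutation loop in Lua. The `run_program`, `mutate`, and `fuzz` names are hypothetical and do not come from AFL or the Markdown package; novelty is judged here by unseen outputs rather than AFL's control-flow instrumentation.

```lua
-- Replace one random byte of the input with a random byte.
local function mutate(input)
  if #input == 0 then return string.char(math.random(0, 255)) end
  local position = math.random(#input)
  local byte = string.char(math.random(0, 255))
  return input:sub(1, position - 1) .. byte .. input:sub(position + 1)
end

-- Black-box fuzzing loop: mutate seeds, keep inputs that crash the
-- program or that produce an output we have not seen before.
local function fuzz(run_program, seeds, iterations)
  local corpus = {}
  for _, seed in ipairs(seeds) do table.insert(corpus, seed) end
  local seen_outputs, crashing_inputs = {}, {}
  for _ = 1, iterations do
    local parent = corpus[math.random(#corpus)]
    local child = mutate(parent)
    local ok, output = pcall(run_program, child)
    if not ok then
      table.insert(crashing_inputs, child)  -- input caused an error
    elseif not seen_outputs[output] then
      seen_outputs[output] = true           -- novel behavior: keep as a new seed
      table.insert(corpus, child)
    end
  end
  return crashing_inputs, corpus
end
```

A real implementation would use a richer set of mutations (insertions, deletions, splices of two seeds) and a finer-grained notion of novelty, but the overall shape of the loop stays the same.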
While fuzz testing is often used to identify security vulnerabilities, the generated inputs can also be valuable for regression testing. Additionally, these inputs can be compared against a reference implementation to discover inconsistencies either during the genetic search or afterward.
Rationale
Our CommonMark parser has already revealed numerous edge cases that our current unit and regression tests do not cover. Currently, our approach is reactive: when an unexpected input is found, we file an issue, add a regression test, and fix the code until the test passes (see, for example, issues #447, #483, #495, #502, and #508). Fuzz testing would allow for a more proactive bug detection approach, enabling us to identify and address issues before they affect users.
Proposal
Implement a fuzz tester for the Markdown package. This tool would utilize existing unit and regression tests as seed inputs and iteratively modify them until a set number of iterations is reached, or until it finds an input that causes a crash or produces results inconsistent with a reference implementation. The continuous integration (CI) workflow should be updated to include regular fuzz testing as part of the automated testing process.
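The stopping conditions described above could be sketched as follows. The `next_input`, `crashes`, and `matches_reference` helpers are hypothetical placeholders for the mutation engine, the parser invocation, and the reference comparison, respectively.

```lua
-- Run generated inputs until one fails (crash or mismatch with the
-- reference implementation) or the iteration budget is exhausted.
local function run_until_failure(next_input, crashes, matches_reference,
                                 max_iterations)
  for iteration = 1, max_iterations do
    local input = next_input()
    if crashes(input) or not matches_reference(input) then
      return input, iteration  -- failing input found
    end
  end
  return nil  -- budget exhausted without finding a failure
end
```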
Notes
The research aspects of this project involve:
Identifying or designing a genetic algorithm that can discover new test inputs based solely on seed inputs and program outputs, effectively treating the Markdown package as a black box.
Developing a method to reliably compare the Markdown package's unique output format against a reference implementation of CommonMark.
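The second research task amounts to differential testing. A hedged sketch, assuming hypothetical `our_convert` and `reference_convert` wrappers that each map a CommonMark input to some comparable canonical form (neither name comes from the Markdown package's actual API):

```lua
-- Differential testing: run every input through both implementations
-- and collect the cases where their canonicalized outputs disagree.
local function find_inconsistencies(our_convert, reference_convert, inputs)
  local inconsistent = {}
  for _, input in ipairs(inputs) do
    local ours = our_convert(input)
    local reference = reference_convert(input)
    if ours ~= reference then
      table.insert(inconsistent,
                   {input = input, ours = ours, reference = reference})
    end
  end
  return inconsistent
end
```

The hard part is entirely inside the two wrappers: the Markdown package emits TeX macros while reference CommonMark implementations emit HTML, so both outputs must first be normalized into a shared representation before the comparison above is meaningful.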
The development tasks for this project include:
Implementing the genetic algorithm in Lua.
Integrating the fuzz tester into the Markdown package’s CI workflow to ensure it runs regularly as part of the automated testing process.
While tools like AFL-Lua allow existing fuzz testers such as AFL to work with Lua directly, they seem too general for our needs. AFL-Lua fuzzes by repeatedly initializing the program, which would be inefficient given that the Markdown package takes approximately 300 ms to initialize. A specialized solution would better serve our needs: one that processes multiple inputs with a single initialized parser to minimize overhead. Additionally, since the Markdown package produces numerous auxiliary outputs, testing should utilize a RAM-disk to avoid disk I/O bottlenecks. Ideally, the fuzz tester should process hundreds of test cases per second.
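The single-initialization design could look like the following sketch, where `new_converter` is a stand-in for however the Markdown package exposes its parser; the real entry point may differ.

```lua
-- Pay the (roughly 300 ms) initialization cost once, then reuse the
-- same converter for every test case in the batch.
local function make_runner(new_converter)
  local convert = new_converter()  -- initialize exactly once
  return function(input)
    return convert(input)          -- each test case reuses the same parser
  end
end
```

With a runner like this, the per-case cost is only the conversion itself, which is what makes a throughput of hundreds of test cases per second plausible.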