Toward Automated Grammar Extraction via Semantic Labeling of Parser Implementations

This is the conference talk associated with an academic paper that introduces a new approach for labeling the semantic purpose of the functions in a parser. An input file with a known syntax tree is passed to a copy of the target parser that has been instrumented for universal taint tracking. The paper introduces a novel algorithm for merging that ground-truth syntax tree with the taint and control-flow information observed during the parser's execution. The result is a mapping from each type in the file format to the set of functions most specialized in operating on that type. This mapping has applications in mutational fuzzing, reverse engineering, differential analysis, and automated grammar extraction. We demonstrate that even a single execution of an instrumented parser on a single input file can yield a mapping that a human would identify as intuitively correct. We hope that this approach will lead to both safer subsets of file formats and safer parsers.
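The paper's merge algorithm is not reproduced in this README. As a rough illustration of the kind of type-to-function mapping it produces, here is a minimal Python sketch under stated assumptions. The syntax tree is flattened into hypothetical `Node` records (a type name plus a half-open byte range). The taint trace is a plain dictionary from function name to the set of input byte offsets that reached it. "Most specialized" is approximated by the Jaccard similarity between a function's tainted bytes and the bytes covered by a type. This heuristic is a simple stand-in, not the PolyMerge algorithm, and it ignores control-flow information. The names `Node`, `bytes_by_type`, and `label_types`, as well as the toy file layout and function names in the example, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical data model for illustration only; not PolyTracker's output format.
@dataclass(frozen=True)
class Node:
    """A syntax-tree node: a type name and the byte range [start, end) it spans."""
    type_name: str
    start: int
    end: int


def bytes_by_type(nodes):
    """Union the input byte offsets covered by every node of each type."""
    covered = defaultdict(set)
    for node in nodes:
        covered[node.type_name].update(range(node.start, node.end))
    return covered


def label_types(nodes, taint):
    """Map each type to the function(s) whose tainted bytes best match it.

    taint: dict of function name -> set of input byte offsets that reached it.
    Jaccard similarity is a stand-in heuristic for "most specialized".
    """
    mapping = {}
    for type_name, type_bytes in bytes_by_type(nodes).items():
        scores = {}
        for func, func_bytes in taint.items():
            union = type_bytes | func_bytes
            if union:
                scores[func] = len(type_bytes & func_bytes) / len(union)
        best = max(scores.values(), default=0.0)
        # Keep every function tied for the best nonzero score.
        mapping[type_name] = {f for f, s in scores.items() if s == best and s > 0}
    return mapping


if __name__ == "__main__":
    # Toy 16-byte file: an 8-byte header (4-byte magic + 4-byte length), then a body.
    tree = [Node("header", 0, 8), Node("magic", 0, 4),
            Node("length", 4, 8), Node("body", 8, 16)]
    taint = {
        "parse_file": set(range(0, 16)),
        "parse_header": set(range(0, 8)),
        "check_magic": set(range(0, 4)),
        "read_length": set(range(4, 8)),
        "parse_body": set(range(8, 16)),
    }
    for type_name, funcs in label_types(tree, taint).items():
        print(type_name, "->", sorted(funcs))
```

Jaccard similarity penalizes both a function that touches bytes outside a type and one that touches only part of it. In the toy example, the outer `parse_file` therefore loses to `parse_header` for the `header` type, while the field-level functions win for the nested `magic` and `length` fields.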

Resources

Recording of the talk.
Blog post describing the project and tooling.
https://github.com/trailofbits/polyfile
https://github.com/trailofbits/polytracker

Presented at

LangSec 2020

Authored by

Carson Harmon
Brad Larsen
Evan Sultanik
