Toward Automated Grammar Extraction via Semantic Labeling of Parser Implementations

This is the conference talk associated with an academic paper that introduces a new approach to labeling the semantic purpose of the functions in a parser. An input file with a known syntax tree is passed to a copy of the target parser that has been instrumented for universal taint tracking. The paper introduces a novel algorithm for merging this ground-truth syntax tree with the taint and control-flow information observed during the parser's execution. The result is a mapping from each type in the file format to the set of functions most specialized in operating on that type. This mapping has applications in mutational fuzzing, reverse engineering, differential analysis, and automated grammar extraction. We demonstrate that even a single execution of an instrumented parser on a single input file can produce a mapping that a human would identify as intuitively correct. We hope that this approach will lead to both safer subsets of file formats and safer parsers.
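To make the idea of a type-to-function mapping concrete, here is a minimal Python sketch. It is not the algorithm from the paper, which also uses control-flow information. It assumes a deliberately simplified model in which the syntax tree is a list of `(type, start, end)` byte ranges and the taint trace is a map from each function to the set of input byte offsets it touched. All names (`innermost_type`, `label_functions`, the sample functions, and the `threshold` parameter) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def innermost_type(offset, nodes):
    """Return the type of the smallest node covering `offset`, or None."""
    covering = [
        (end - start, name)
        for name, start, end in nodes
        if start <= offset < end
    ]
    return min(covering)[1] if covering else None


def label_functions(nodes, taint, threshold=0.75):
    """Map each type to the functions whose tainted bytes fall mostly in it.

    Assumption: a function counts as "specialized" in a type when more than
    `threshold` of the input bytes it touched belong to that type.
    """
    mapping = defaultdict(set)
    for func, offsets in taint.items():
        counts = defaultdict(int)
        for off in offsets:
            type_name = innermost_type(off, nodes)
            if type_name is not None:
                counts[type_name] += 1
        for type_name, n in counts.items():
            if n / len(offsets) > threshold:
                mapping[type_name].add(func)
    return dict(mapping)


# Hypothetical ground truth: a 12-byte file with a header and a body.
nodes = [("file", 0, 12), ("header", 0, 4), ("body", 4, 12)]

# Hypothetical taint trace: input byte offsets each function operated on.
taint = {
    "parse_file": set(range(0, 12)),
    "read_magic": {0, 1, 2, 3},
    "decode_body": {3, 4, 5, 6, 7, 8, 9, 10, 11},
}

result = label_functions(nodes, taint)
```

In this toy trace, `read_magic` is attributed to `header` and `decode_body` to `body`, while `parse_file` touches every byte and so is not specialized in either type under the assumed threshold.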

Resources

Presented at

Authored by

  • Carson Harmon
  • Brad Larsen
  • Evan Sultanik