parser-bench

A dataset and benchmark for file-to-file parser construction with LLMs that write code.

Structure

For each filetype there is one directory with multiple subdirectories:

myfiletype
    meta.json
    implementations
        1
            meta.json
            parser.py
    inputs
        1.your_extension
        2.your_extension
    outputs
        1.json
        2.json

See data/zeopp-sa for an example.
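Before opening a PR, it can help to check that every input file has a matching expected output. The sketch below assumes that `inputs/<n>.<ext>` pairs with `outputs/<n>.json` by file stem, as the layout above suggests. The helper name `find_unpaired_inputs` is hypothetical and is not part of the parserbench package.

```python
from pathlib import Path


def find_unpaired_inputs(filetype_dir):
    """Return names of input files that lack a matching outputs/<stem>.json.

    Assumption: inputs and outputs are paired by file stem (1.ext <-> 1.json).
    """
    root = Path(filetype_dir)
    # Stems of all expected-output files, e.g. {"1", "2"}
    expected = {p.stem for p in (root / "outputs").glob("*.json")}
    return sorted(
        p.name
        for p in (root / "inputs").iterdir()
        if p.is_file() and p.stem not in expected
    )
```

Calling `find_unpaired_inputs("data/zeopp-sa")` should return an empty list if every input has an expected output.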

FAQ

What do I get from contributing?

Besides helping to advance science, meaningful contributions (i.e., a merged PR that adds an entry) qualify you for co-authorship on any paper that might come out of this work.

What languages/packages/frameworks can I use for the example implementation?

Please focus on implementation examples in

  • Python (preferred)
  • JavaScript
  • TypeScript

as our current infrastructure can only test code in these languages.

In the example implementations, please only use the standard libraries and the following additional dependencies:

Python:

JavaScript/TypeScript:

How do I structure the code example?

Please provide the implementation as a function that accepts the file contents as a string and returns the parsed data as a JSON string.
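As an illustration, here is a minimal sketch of such a function in Python. The function name `parse` and the `key = value` input format are assumptions made for this example only; the required function name is not specified here, and real entries will parse their own file type.

```python
import json


def parse(file_content: str) -> str:
    """Parse 'key = value' lines (a made-up format) into a JSON string."""
    result = {}
    for line in file_content.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return json.dumps(result)
```

The function takes a string rather than a path and does no file I/O of its own, so the caller is responsible for reading the input file.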

Validating the data

Install the package in editable mode:

pip install -e .

Then run the validation:

parserbench.validate_dirs data/

Related projects