parser-bench

A dataset and benchmark for file-to-file parser construction with code-writing LLMs.

structure

For each filetype there is one directory with multiple subdirectories:

myfiletype
    meta.json
    implementations
        1
            meta.json
            parser.py
    inputs
        1.your_extension
        2.your_extension
    outputs
        1.json
        2.json

See data/zeopp-sa for an example.
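The layout above implies that each file in inputs has a matching JSON file in outputs with the same number. The following is a minimal sketch of how that pairing could be checked locally. It is not the validator shipped with this package; the function name `pair_inputs_with_outputs` is hypothetical, and matching by file stem is an assumption based on the tree shown above.

```python
from pathlib import Path


def pair_inputs_with_outputs(filetype_dir):
    """Pair each input file with the output JSON that shares its stem.

    Assumption: inputs/1.<ext> corresponds to outputs/1.json, and so on.
    """
    root = Path(filetype_dir)
    # Map stem ("1", "2", ...) to path for both subdirectories.
    inputs = {p.stem: p for p in (root / "inputs").iterdir() if p.is_file()}
    outputs = {p.stem: p for p in (root / "outputs").glob("*.json")}

    # Every input needs an expected output to be usable as a test case.
    missing = sorted(set(inputs) - set(outputs))
    if missing:
        raise ValueError(f"inputs without a matching output: {missing}")

    return [(inputs[stem], outputs[stem]) for stem in sorted(inputs)]
```

Calling `pair_inputs_with_outputs("data/zeopp-sa")` would return a list of `(input_path, output_path)` tuples, or raise `ValueError` if an input has no expected output.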

FAQ

What do I get from contributing?

Besides helping to advance science, meaningful contributions (i.e., a merged PR that adds an entry) qualify for co-authorship on a paper that may come out of this work.

What languages/packages/frameworks can I use for the example implementation?

Please focus on implementation examples in

Python (preferred)
JavaScript
TypeScript

as our current infrastructure can only test code in these languages.

In the example implementations, please only use the standard libraries and the following additional dependencies:

Python:

JavaScript/TypeScript:

How do I structure the code example?

Please provide the implementation as a function that accepts the file contents as a string and returns the parsed result as a JSON string.
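As an illustration of that string-in, JSON-string-out shape, here is a minimal sketch of what a parser.py could look like. The function name `parse` and the `key: value` input format are assumptions made for this example only; the repository does not prescribe them here, and a real entry would parse its own filetype.

```python
import json


def parse(file_content: str) -> str:
    """Parse a hypothetical 'key: value' text format into a JSON string."""
    result = {}
    for line in file_content.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (hypothetical '#' comment syntax).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            # Lines without a separator carry no key/value pair; ignore them.
            continue
        result[key.strip()] = value.strip()
    # Only the standard library is used, in line with the dependency rule above.
    return json.dumps(result, sort_keys=True)
```

The function takes the file contents rather than a path, so it can be tested without touching the filesystem: read an input file into a string, pass it to `parse`, and compare `json.loads` of the result against the corresponding file in outputs.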

Validating the data

Install the package

pip install -e .

Then run the validation

parserbench.validate_dirs data/

Related projects

chemical-files-registry: started as a registry for filetypes that are commonly used in chemistry
metadata_extractors_registry: started as part of the MARDA extractors working group
