
LLM Format Parsing Benchmark (JSON vs. XML)

This repository contains all the data, prompts, and evaluation scripts for the blog post "JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency."

This study benchmarks 12 leading LLMs from 8 major AI companies to determine whether JSON or XML is a more effective format for structuring data within complex prompts.


Key Findings

  • Format preference is architecture-dependent. There is no single "best" format.
  • Anthropic (Claude) models show a strong and consistent preference for XML (+30% accuracy boost).
  • Kimi and Google (Gemini) models show a clear preference for JSON (+40% and +10%, respectively).
  • xAI (Grok) is format-agnostic, achieving 100% accuracy with both formats.
  • Model capability is the most important factor. Several models failed the complex task regardless of format.

Repository Structure

llm-format-benchmark/
├── README.md
├── LICENSE
├── evaluate.py                     # Automated evaluation script
├── evaluation_results_detailed.csv # Individual run results
├── evaluation_results_summary.csv  # Aggregated results table
├── prompts/                        # Exact JSON and XML prompts used
└── results/                        # All 120 raw LLM outputs
    ├── claude-4-sonnet/
    ├── gpt-4o/
    ├── grok-3/
    └── ... (12 models total)
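
The aggregated table in evaluation_results_summary.csv can be inspected directly before (or after) rerunning the evaluation. The snippet below is a minimal sketch, assuming pandas is installed; the CSV's exact column layout is not documented here, so the code prints the header rather than assuming column names.

    # Minimal sketch for inspecting the aggregated results table.
    # Assumes pandas is installed; the exact column layout of the CSV
    # is not specified here, so print the header before doing anything else.
    import pandas as pd

    summary = pd.read_csv("evaluation_results_summary.csv")
    print(summary.columns.tolist())  # discover the actual column names
    print(summary.head())            # first few aggregated rows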

How to Replicate the Results

This repository is designed for full transparency and replicability.

  1. Clone the repository:

    git clone https://github.com/royphilip/llm-format-benchmark.git
    cd llm-format-benchmark
  2. Inspect the data (Optional):

    • The exact prompts used are in the prompts/ directory.
    • All 120 raw text outputs from the LLMs are in the results/ directory, organized by model and format.
  3. Run the evaluation script:

    python evaluate.py

    This script will process all files in the results/ directory and generate two new files:

    • evaluation_results_detailed.csv: A per-run log of success or failure for each of the 120 outputs.
    • evaluation_results_summary.csv: The aggregated data used in the blog post's tables.

    Note: Running the script will overwrite the existing .csv files. You can use git diff to compare your generated output with the original results committed to the repository.
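
For a sense of what this processing pass involves, the sketch below walks results/, scores each raw output, and records one row per run. It is an illustration only, not the repository's evaluate.py: the scoring rule, the file-name convention used to infer the format, and the helper is_correct are assumptions made for this example.

    # Illustrative sketch only: not the repository's evaluate.py.
    # Walks results/<model>/ and records a pass/fail row per raw output.
    # The format detection ("json" in the file name) and the scoring rule
    # are assumptions made for this example.
    import csv
    from pathlib import Path

    def is_correct(text: str) -> bool:
        # Hypothetical check: the real script would compare the model's
        # output against the task's expected structured answer.
        return "EXPECTED_ANSWER" in text

    rows = []
    for output_file in sorted(Path("results").rglob("*.txt")):
        rows.append({
            "model": output_file.parent.name,
            "format": "json" if "json" in output_file.name.lower() else "xml",
            "file": output_file.name,
            "passed": is_correct(output_file.read_text()),
        })

    with open("evaluation_results_detailed_sketch.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model", "format", "file", "passed"])
        writer.writeheader()
        writer.writerows(rows)

Aggregating these per-run rows by model and format would produce the same kind of table as evaluation_results_summary.csv.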


Citation

If you use this data or methodology in your research, please cite:

Roy Philip (2025). JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency. 
Available at: https://royphilip.xyz/blog/json-vs-xml-llm-showdown

License

The code and data in this repository are released under the MIT License. You are free to use, modify, and distribute this work, provided you give appropriate credit.
