
LLM Format Parsing Benchmark (JSON vs. XML)

This repository contains all the data, prompts, and evaluation scripts for the blog post "JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency."

This study benchmarks 12 leading LLMs from 8 major AI companies to determine whether JSON or XML is a more effective format for structuring data within complex prompts.


Key Findings

  • Format preference is architecture-dependent. There is no single "best" format.
  • Anthropic (Claude) models show a strong and consistent preference for XML (+30% accuracy boost).
  • Kimi and Google (Gemini) models show a clear preference for JSON (+40% and +10%, respectively).
  • xAI (Grok) is format-agnostic, achieving 100% accuracy with both formats.
  • Model capability is the most important factor. Several models failed the complex task regardless of format.

Repository Structure

llm-format-benchmark/
├── README.md
├── LICENSE
├── evaluate.py                     # Automated evaluation script
├── evaluation_results_detailed.csv # Individual run results
├── evaluation_results_summary.csv  # Aggregated results table
├── prompts/                        # Exact JSON and XML prompts used
└── results/                        # All 120 raw LLM outputs
    ├── claude-4-sonnet/
    ├── gpt-4o/
    ├── grok-3/
    └── ... (12 models total)
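
The aggregated table in evaluation_results_summary.csv can be inspected directly before (or after) rerunning the evaluation. The snippet below is a minimal sketch, assuming pandas is installed; the CSV's exact column layout is not documented here, so the code prints the header rather than assuming column names.

    # Minimal sketch for inspecting the aggregated results table.
    # Assumes pandas is installed; the exact column layout of the CSV
    # is not specified here, so print the header before doing anything else.
    import pandas as pd

    summary = pd.read_csv("evaluation_results_summary.csv")
    print(summary.columns.tolist())  # discover the actual column names
    print(summary.head())            # first few aggregated rows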

How to Replicate the Results

This repository is designed for full transparency and replicability.

  1. Clone the repository:

    git clone https://github.com/royphilip/llm-format-benchmark.git
    cd llm-format-benchmark
  2. Inspect the data (Optional):

    • The exact prompts used are in the prompts/ directory.
    • All 120 raw text outputs from the LLMs are in the results/ directory, organized by model and format.
  3. Run the evaluation script:

    python evaluate.py

    This script will process all files in the results/ directory and generate two new files:

    • evaluation_results_detailed.csv: A per-run log of success or failure for each of the 120 outputs.
    • evaluation_results_summary.csv: The aggregated data used in the blog post's tables.

    Note: Running the script will overwrite the existing .csv files. You can use git diff to compare your generated output with the original results committed to the repository.
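
For a sense of what this processing pass involves, the sketch below walks results/, scores each raw output, and records one row per run. It is an illustration only, not the repository's evaluate.py: the scoring rule, the file-name convention used to infer the format, and the helper is_correct are assumptions made for this example.

    # Illustrative sketch only: not the repository's evaluate.py.
    # Walks results/<model>/ and records a pass/fail row per raw output.
    # The format detection ("json" in the file name) and the scoring rule
    # are assumptions made for this example.
    import csv
    from pathlib import Path

    def is_correct(text: str) -> bool:
        # Hypothetical check: the real script would compare the model's
        # output against the task's expected structured answer.
        return "EXPECTED_ANSWER" in text

    rows = []
    for output_file in sorted(Path("results").rglob("*.txt")):
        rows.append({
            "model": output_file.parent.name,
            "format": "json" if "json" in output_file.name.lower() else "xml",
            "file": output_file.name,
            "passed": is_correct(output_file.read_text()),
        })

    with open("evaluation_results_detailed_sketch.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model", "format", "file", "passed"])
        writer.writeheader()
        writer.writerows(rows)

Aggregating these per-run rows by model and format would produce the same kind of table as evaluation_results_summary.csv.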


Citation

If you use this data or methodology in your research, please cite:

Roy Philip (2025). JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency. 
Available at: https://royphilip.xyz/blog/json-vs-xml-llm-showdown

License

The code and data in this repository are released under the MIT License. You are free to use, modify, and distribute this work, provided you give appropriate credit.
