This repository contains all the data, prompts, and evaluation scripts for the blog post: JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency.
This study benchmarks 12 leading LLMs from 8 major AI companies to determine whether JSON or XML is the more effective format for structuring data within complex prompts.
- Format preference is architecture-dependent. There is no single "best" format.
- Anthropic (Claude) models show a strong and consistent preference for XML (+30% accuracy boost).
- Kimi and Google (Gemini) models show a clear preference for JSON (+40% and +10% respectively).
- xAI (Grok) is format-agnostic, achieving 100% accuracy with both formats.
- Model capability is the most important factor. Several models failed the complex task regardless of format.
llm-format-benchmark/
├── README.md
├── LICENSE
├── evaluate.py # Automated evaluation script
├── evaluation_results_detailed.csv # Individual run results
├── evaluation_results_summary.csv # Aggregated results table
├── prompts/ # Exact JSON and XML prompts used
└── results/ # All 120 raw LLM outputs
    ├── claude-4-sonnet/
    ├── gpt-4o/
    ├── grok-3/
    └── ... (12 models total)
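As a quick sanity check on the layout above, the following minimal sketch counts the files under each model directory in results/. The function name `count_outputs` is hypothetical and is not part of evaluate.py. It assumes only that each model has its own subdirectory, and it counts files at any depth because the exact nesting by format is not shown in the tree.

```python
from pathlib import Path


def count_outputs(results_dir="results"):
    """Count the files stored under each model directory (hypothetical helper)."""
    root = Path(results_dir)
    if not root.is_dir():
        # Nothing to count if the script is run outside the repository root.
        return {}
    counts = {}
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob walks nested folders, in case outputs are grouped by format.
        counts[model_dir.name] = sum(1 for f in model_dir.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    counts = count_outputs()
    for model, n in counts.items():
        print(f"{model}: {n} files")
    print(f"total: {sum(counts.values())} files in {len(counts)} model directories")
```

If the checkout matches the tree, the total should come to 120 files across 12 model directories. Stray files such as editor or OS metadata would inflate the count.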
This repository is designed for full transparency and replicability.
- Clone the repository:
    git clone https://github.com/royphilip/llm-format-benchmark.git
    cd llm-format-benchmark
- Inspect the data (optional):
  - The exact prompts used are in the prompts/ directory.
  - All 120 raw text outputs from the LLMs are in the results/ directory, organized by model and format.
- Run the evaluation script:
    python evaluate.py
  This script will process all files in the results/ directory and generate two new files:
  - evaluation_results_detailed.csv: A log of the success/failure of every single run.
  - evaluation_results_summary.csv: The aggregated data used in the blog post's tables.
  Note: Running the script will overwrite the existing .csv files. You can use git diff to compare your generated output with the original results committed to the repository.
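If git is not available, the same comparison can be approximated in Python. This is a minimal sketch, not part of the repository: the helper names `read_rows` and `diff_rows` and the backup file name are hypothetical. It assumes you copied the original CSV aside before running the script, and it makes no assumptions about the column names.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of row tuples, header included."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return [tuple(row) for row in csv.reader(f)]


def diff_rows(original, regenerated):
    """Return (rows only in original, rows only in regenerated), ignoring row order."""
    orig_set, regen_set = set(original), set(regenerated)
    removed = [r for r in original if r not in regen_set]
    added = [r for r in regenerated if r not in orig_set]
    return removed, added


if __name__ == "__main__":
    # Hypothetical backup name: a copy made before running evaluate.py.
    backup = Path("evaluation_results_summary.original.csv")
    current = Path("evaluation_results_summary.csv")
    if backup.exists() and current.exists():
        removed, added = diff_rows(read_rows(backup), read_rows(current))
        print(f"{len(removed)} rows only in the original, {len(added)} rows only in the new file")
    else:
        print("Backup or regenerated CSV not found; nothing to compare.")
```

Two empty lists mean the regenerated file contains the same rows as the committed one. Because the comparison is set-based, it ignores row order and repeated identical rows.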
If you use this data or methodology in your research, please cite:
Roy Philip (2025). JSON vs. XML: A Data-Driven Analysis of LLM Parsing Efficiency.
Available at: https://royphilip.xyz/blog/json-vs-xml-llm-showdown
The code and data in this repository are released under the MIT License (see LICENSE). You are free to use, modify, and distribute this work, provided the copyright and license notice is retained in copies.