Add comprehensive documentation and examples for `cagent eval` command #466

Copilot · 2025-10-09T17:27:02Z

Fixes the missing documentation for the cagent eval command by adding comprehensive usage guides and working examples.

What was missing

The cagent eval command had minimal documentation:

- Only a brief one-line mention in `docs/USAGE.md`: `$ cagent eval config.yaml # Run evaluations`
- A basic example in `examples/eval/README.md` with incorrect command syntax
- No explanation of evaluation metrics, workflow, or use cases

What this PR adds

Enhanced Documentation in `docs/USAGE.md`

- Complete "Agent Evaluation" section with detailed explanations of the evaluation system
- Correct command syntax: `cagent eval <agent-config> <eval-dir>`
- Two methods for creating evaluation data:
  - Interactive session saving using the `/eval` command
  - Manual creation of session JSON files
- Detailed explanation of evaluation metrics:
  - Tool Trajectory Score (0.0-1.0): Measures consistency of tool usage between runs
  - ROUGE-1 Score (0.0-1.0): Measures text similarity via word overlap
- Complete workflow guide from development to automated evaluation
- Use cases and best practices for regression testing, A/B testing, and performance monitoring
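
To make the two metrics above concrete, here is a minimal Python sketch. It is not taken from the cagent source: `rouge1_f1` uses the standard unigram-overlap F1 with simple lowercase whitespace tokenization, and `tool_trajectory_score` assumes a position-by-position comparison of tool names. Both function names and the exact scoring rules are illustrative assumptions, so the numbers may differ from what `cagent eval` reports.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two texts (lowercased, whitespace tokens)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    if not ref or not cand:
        return 0.0
    # Counter intersection keeps the minimum count per word (clipped matches).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def tool_trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Fraction of positions where both runs called the same tool.

    Assumed definition for illustration only; cagent may score differently.
    """
    if not expected and not actual:
        return 1.0  # neither run used tools, so the trajectories agree
    longest = max(len(expected), len(actual))
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return matches / longest
```

Under these assumptions, identical responses score 1.0 on both metrics, and a run that calls one different tool out of two scores 0.5 on the trajectory metric.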

Enhanced Examples in `examples/eval/`

- Comprehensive README with step-by-step instructions and expected outputs
- Working agent configuration (`agent.yaml`) for demonstrations
- Sample evaluation sessions in proper JSON format:
  - `sample-calculation.json`: Math problem evaluation session
  - `sample-question.json`: General knowledge evaluation session
- Prerequisites documentation (API key requirements)
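
Since session files can also be created by hand, a quick well-formedness check before running evaluations can save a confusing failure. The sketch below only verifies that each `*.json` file in a directory parses as JSON. It makes no assumption about the session schema, which this description does not spell out, and the helper name `find_invalid_sessions` is hypothetical.

```python
import json
from pathlib import Path


def find_invalid_sessions(eval_dir: str) -> dict[str, str]:
    """Return {file name: error message} for every *.json file that fails to parse."""
    errors: dict[str, str] = {}
    for path in sorted(Path(eval_dir).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            errors[path.name] = str(exc)
    return errors
```

An empty result means every file parsed; it does not mean the files are valid evaluation sessions.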

Example usage after this PR

# Create evaluation data during development
$ cagent run config.yaml
User: What is 15 + 23?
Agent: I'll solve this step by step: 15 + 23 = 38
/eval    # Saves session to ./evals/

# Run evaluations to test consistency
$ cagent eval config.yaml ./evals
Eval file: sample-calculation-eval
Tool trajectory score: 1.000000
Rouge-1 score: 0.785430
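
For the regression-testing use case, the output above could be checked automatically. The following Python sketch parses text in the three-line format shown in this example and lists evaluations that fall below chosen thresholds. The parser assumes the output always looks like the sample above, and the function names and default thresholds (`min_tool=1.0`, `min_rouge=0.7`) are hypothetical values picked for illustration, not cagent defaults.

```python
import re

# Patterns for the two score lines shown in the example output above.
LINE_PATTERNS = {
    "tool_trajectory": re.compile(r"^Tool trajectory score:\s*([0-9.]+)\s*$"),
    "rouge1": re.compile(r"^Rouge-1 score:\s*([0-9.]+)\s*$"),
}


def parse_eval_output(text: str) -> dict[str, dict[str, float]]:
    """Group score lines under the most recent 'Eval file:' line."""
    results: dict[str, dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Eval file:"):
            current = line.split(":", 1)[1].strip()
            results[current] = {}
            continue
        if current is None:
            continue  # ignore anything before the first eval file
        for key, pattern in LINE_PATTERNS.items():
            match = pattern.match(line)
            if match:
                results[current][key] = float(match.group(1))
    return results


def failing_evals(results, min_tool: float = 1.0, min_rouge: float = 0.7) -> list[str]:
    """Names of evals whose scores are missing or below the thresholds."""
    return [
        name
        for name, scores in results.items()
        if scores.get("tool_trajectory", 0.0) < min_tool
        or scores.get("rouge1", 0.0) < min_rouge
    ]
```

A CI job could feed the captured output of `cagent eval config.yaml ./evals` into `parse_eval_output` and fail the build when `failing_evals` returns a non-empty list.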

The documentation now provides users with everything needed to understand, implement, and effectively use agent evaluation for quality assurance and regression testing.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

registry.example.com
- Triggering command: /tmp/go-build3875693253/b814/remote.test -test.testlogfile=/tmp/go-build3875693253/b814/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

- Configure Actions setup steps to set up my environment, which run before the firewall is enabled
- Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This section details the original issue you should resolve

<issue_title>The example on how to use cagent eval is missing</issue_title>
<issue_description></issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #413


Co-authored-by: dgageot <[email protected]>

dgageot · 2025-10-09T18:32:51Z

@rumpl does it make sense?

dgageot · 2025-10-16T07:18:11Z

Let's close it for now

Initial plan

66fc76f

Copilot AI assigned Copilot and dgageot Oct 9, 2025

Copilot started work on behalf of dgageot October 9, 2025 17:28

Copilot AI and others added 2 commits October 9, 2025 17:36

Add comprehensive cagent eval documentation and examples

eba9f09

Co-authored-by: dgageot <[email protected]>

Add sample agent configuration and evaluation data files

7cebfa5

Co-authored-by: dgageot <[email protected]>

Copilot AI changed the title ~~[WIP] Add example usage for cagent eval~~ Add comprehensive documentation and examples for cagent eval command Oct 9, 2025

Copilot AI requested a review from dgageot October 9, 2025 17:41

Copilot finished work on behalf of dgageot October 9, 2025 17:41

dgageot marked this pull request as ready for review October 9, 2025 20:05

dgageot requested a review from a team as a code owner October 9, 2025 20:05

dgageot closed this Oct 16, 2025
