Copilot
Contributor

@Copilot Copilot AI commented Oct 9, 2025

Fixes the missing documentation for the cagent eval command by adding comprehensive usage guides and working examples.

What was missing

The cagent eval command had minimal documentation:

  • Only a brief one-line mention in docs/USAGE.md: $ cagent eval config.yaml # Run evaluations
  • Basic example in examples/eval/README.md with incorrect command syntax
  • No explanation of evaluation metrics, workflow, or use cases

What this PR adds

Enhanced Documentation in docs/USAGE.md

  • Complete "Agent Evaluation" section with detailed explanations of the evaluation system
  • Correct command syntax: cagent eval <agent-config> <eval-dir>
  • Two methods for creating evaluation data:
    • Interactive session saving using /eval command
    • Manual creation of session JSON files
  • Detailed explanation of evaluation metrics:
    • Tool Trajectory Score (0.0-1.0): Measures consistency of tool usage between runs
    • ROUGE-1 Score (0.0-1.0): Measures text similarity via word overlap
  • Complete workflow guide from development to automated evaluation
  • Use cases and best practices for regression testing, A/B testing, and performance monitoring
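As a rough illustration of the ROUGE-1 metric described above, the sketch below computes a unigram-overlap score between a reference answer and a new answer. It assumes the F1 variant of ROUGE-1 with lowercased, whitespace-split tokens; the PR does not say which variant or tokenizer cagent uses, and the function name `rouge1F1` is hypothetical rather than part of cagent.

```go
package main

import (
	"fmt"
	"strings"
)

// rouge1F1 is a hypothetical helper (not cagent's API). It returns the
// unigram-overlap F1 score between a reference text and a candidate text.
// Assumption: tokens are lowercased and split on whitespace.
func rouge1F1(reference, candidate string) float64 {
	ref := strings.Fields(strings.ToLower(reference))
	cand := strings.Fields(strings.ToLower(candidate))
	if len(ref) == 0 || len(cand) == 0 {
		return 0
	}

	// Count how often each word appears in the reference.
	counts := make(map[string]int, len(ref))
	for _, w := range ref {
		counts[w]++
	}

	// Each candidate word can match at most as many times as it
	// occurs in the reference (clipped overlap).
	overlap := 0
	for _, w := range cand {
		if counts[w] > 0 {
			counts[w]--
			overlap++
		}
	}
	if overlap == 0 {
		return 0
	}

	precision := float64(overlap) / float64(len(cand))
	recall := float64(overlap) / float64(len(ref))
	return 2 * precision * recall / (precision + recall)
}

func main() {
	// Identical texts score 1.0; texts sharing one word out of
	// five reference words and four candidate words score lower.
	fmt.Printf("%.6f\n", rouge1F1("the answer is 38", "the answer is 38"))
	fmt.Printf("%.6f\n", rouge1F1("15 + 23 = 38", "the sum is 38"))
}
```

A score near 1.0 means the new response reuses almost all the words of the saved response, which is why small rephrasings lower the score even when the answer is still correct.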

Enhanced Examples in examples/eval/

  • Comprehensive README with step-by-step instructions and expected outputs
  • Working agent configuration (agent.yaml) for demonstrations
  • Sample evaluation sessions in proper JSON format:
    • sample-calculation.json: Math problem evaluation session
    • sample-question.json: General knowledge evaluation session
  • Prerequisites documentation (API key requirements)

Example usage after this PR

# Create evaluation data during development
$ cagent run config.yaml
User: What is 15 + 23?
Agent: I'll solve this step by step: 15 + 23 = 38
/eval    # Saves session to ./evals/

# Run evaluations to test consistency
$ cagent eval config.yaml ./evals
Eval file: sample-calculation-eval
Tool trajectory score: 1.000000
Rouge-1 score: 0.785430
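To make the tool trajectory score in the output above more concrete, here is one plausible way such a score could be computed: compare the sequence of tool names recorded in the saved session with the sequence used in the new run, position by position. This is only a sketch under that assumption. The PR does not describe cagent's actual algorithm, and the function name `toolTrajectoryScore` and the tool names in `main` are hypothetical.

```go
package main

import "fmt"

// toolTrajectoryScore is a hypothetical helper (not cagent's API).
// It returns the fraction of positions at which the expected and
// actual tool-call sequences name the same tool, divided by the
// length of the longer sequence so extra or missing calls count
// as mismatches.
func toolTrajectoryScore(expected, actual []string) float64 {
	n := len(expected)
	if len(actual) > n {
		n = len(actual)
	}
	// Assumption: no tools expected and none used is fully consistent.
	if n == 0 {
		return 1
	}

	matches := 0
	for i := 0; i < len(expected) && i < len(actual); i++ {
		if expected[i] == actual[i] {
			matches++
		}
	}
	return float64(matches) / float64(n)
}

func main() {
	// Hypothetical tool names, for illustration only.
	saved := []string{"read_file", "shell"}
	rerun := []string{"read_file", "write_file"}

	fmt.Printf("%.6f\n", toolTrajectoryScore(saved, rerun))
	fmt.Printf("%.6f\n", toolTrajectoryScore(nil, nil))
}
```

Under this assumption, a session with no tool calls on either side, such as a simple arithmetic question, would score 1.0.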

The documentation now provides users with everything needed to understand, implement, and effectively use agent evaluation for quality assurance and regression testing.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • registry.example.com
    • Triggering command: /tmp/go-build3875693253/b814/remote.test -test.testlogfile=/tmp/go-build3875693253/b814/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve

<issue_title>The example on how to use cagent eval is missing</issue_title>
<issue_description></issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #413



@Copilot Copilot AI changed the title from "[WIP] Add example usage for cagent eval" to "Add comprehensive documentation and examples for cagent eval command" Oct 9, 2025
@Copilot Copilot AI requested a review from dgageot October 9, 2025 17:41
Copilot finished work on behalf of dgageot October 9, 2025 17:41
@dgageot
Member

dgageot commented Oct 9, 2025

@rumpl does it make sense?

@dgageot dgageot marked this pull request as ready for review October 9, 2025 20:05
@dgageot dgageot requested a review from a team as a code owner October 9, 2025 20:05
@dgageot
Member

dgageot commented Oct 16, 2025

Let's close it for now

@dgageot dgageot closed this Oct 16, 2025

Development

Successfully merging this pull request may close these issues.

The example on how to use cagent eval is missing

2 participants