Replace RelevanceTruthAndCompletenessEvaluator #46075

Open · wants to merge 1 commit into main
44 changes: 22 additions & 22 deletions docs/ai/tutorials/evaluate-with-reporting.md
@@ -1,14 +1,14 @@
---
title: Tutorial - Evaluate a model's response
description: Create an MSTest app and add a custom evaluator to evaluate the AI chat response of a language model, and learn how to use the caching and reporting features of Microsoft.Extensions.AI.Evaluation.
ms.date: 03/14/2025
ms.date: 05/09/2025
ms.topic: tutorial
ms.custom: devx-track-dotnet-ai
---

# Tutorial: Evaluate a model's response with response caching and reporting

In this tutorial, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries to perform the evaluations, cache the model responses, and create reports. The tutorial uses both a [built-in evaluator](xref:Microsoft.Extensions.AI.Evaluation.Quality.RelevanceTruthAndCompletenessEvaluator) and a custom evaluator.
In this tutorial, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries to perform the evaluations, cache the model responses, and create reports. The tutorial uses both built-in and custom evaluators.

## Prerequisites

@@ -25,32 +25,32 @@ Complete the following steps to create an MSTest project that connects to the `g

1. In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the `dotnet new` command:

   ```dotnetcli
   dotnet new mstest -o TestAIWithReporting
   ```
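
   As a quick sanity check (not part of this change), you can confirm that the scaffolded project builds and its placeholder test runs before you wire up the evaluation code:

   ```dotnetcli
   cd TestAIWithReporting
   dotnet test
   ```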

1. Navigate to the `TestAIWithReporting` directory, and add the necessary packages to your app:

   ```dotnetcli
   dotnet add package Azure.AI.OpenAI
   dotnet add package Azure.Identity
   dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease
   dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
   dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease
   dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease
   dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
   dotnet add package Microsoft.Extensions.Configuration
   dotnet add package Microsoft.Extensions.Configuration.UserSecrets
   ```

1. Run the following commands to add [app secrets](/aspnet/core/security/app-secrets) for your Azure OpenAI endpoint, model name, and tenant ID:

   ```bash
   dotnet user-secrets init
   dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
   dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
   dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
   ```

(Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the <xref:Azure.Identity.DefaultAzureCredential>.)
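
For illustration only, here's a minimal sketch of how these secrets might feed the tutorial's `GetAzureOpenAIChatConfiguration` method. It isn't the tutorial's exact code; it assumes the `MyTests` class shown later and the secret names set above, and it shows where to drop the tenant ID if your environment doesn't need one:

```csharp
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.Configuration;

// Read the values stored with `dotnet user-secrets set`.
IConfigurationRoot config = new ConfigurationBuilder()
    .AddUserSecrets<MyTests>()
    .Build();

string endpoint = config["AZURE_OPENAI_ENDPOINT"]!;
string? tenantId = config["AZURE_TENANT_ID"];

// If no tenant ID is needed, the parameterless constructor is enough.
DefaultAzureCredential credential = tenantId is null
    ? new DefaultAzureCredential()
    : new DefaultAzureCredential(new DefaultAzureCredentialOptions { TenantId = tenantId });

// The tutorial then wraps this client in a ChatConfiguration for the evaluators.
var azureClient = new AzureOpenAIClient(new Uri(endpoint), credential);
```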

24 changes: 10 additions & 14 deletions docs/ai/tutorials/snippets/evaluate-with-reporting/MyTests.cs
@@ -59,10 +59,11 @@ private static ChatConfiguration GetAzureOpenAIChatConfiguration()
// <SnippetGetEvaluators>
private static IEnumerable<IEvaluator> GetEvaluators()
{
IEvaluator rtcEvaluator = new RelevanceTruthAndCompletenessEvaluator();
IEvaluator relevanceEvaluator = new RelevanceEvaluator();
IEvaluator coherenceEvaluator = new CoherenceEvaluator();
IEvaluator wordCountEvaluator = new WordCountEvaluator();

return [rtcEvaluator, wordCountEvaluator];
return [relevanceEvaluator, coherenceEvaluator, wordCountEvaluator];
}
// </SnippetGetEvaluators>

@@ -104,20 +105,15 @@ private static void Validate(EvaluationResult result)
{
// Retrieve the score for relevance from the <see cref="EvaluationResult"/>.
NumericMetric relevance =
result.Get<NumericMetric>(RelevanceTruthAndCompletenessEvaluator.RelevanceMetricName);
result.Get<NumericMetric>(RelevanceEvaluator.RelevanceMetricName);
Assert.IsFalse(relevance.Interpretation!.Failed, relevance.Reason);
Assert.IsTrue(relevance.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

// Retrieve the score for truth from the <see cref="EvaluationResult"/>.
NumericMetric truth = result.Get<NumericMetric>(RelevanceTruthAndCompletenessEvaluator.TruthMetricName);
Assert.IsFalse(truth.Interpretation!.Failed, truth.Reason);
Assert.IsTrue(truth.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

// Retrieve the score for completeness from the <see cref="EvaluationResult"/>.
NumericMetric completeness =
result.Get<NumericMetric>(RelevanceTruthAndCompletenessEvaluator.CompletenessMetricName);
Assert.IsFalse(completeness.Interpretation!.Failed, completeness.Reason);
Assert.IsTrue(completeness.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);
// Retrieve the score for coherence from the <see cref="EvaluationResult"/>.
NumericMetric coherence =
result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
Assert.IsFalse(coherence.Interpretation!.Failed, coherence.Reason);
Assert.IsTrue(coherence.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

// Retrieve the word count from the <see cref="EvaluationResult"/>.
NumericMetric wordCount = result.Get<NumericMetric>(WordCountEvaluator.WordCountMetricName);
@@ -135,7 +131,7 @@ public async Task SampleAndEvaluateResponse()
// Create a <see cref="ScenarioRun"/> with the scenario name
// set to the fully qualified name of the current test method.
await using ScenarioRun scenarioRun =
await s_defaultReportingConfiguration.CreateScenarioRunAsync(this.ScenarioName);
await s_defaultReportingConfiguration.CreateScenarioRunAsync(ScenarioName);

// Use the <see cref="IChatClient"/> that's included in the
// <see cref="ScenarioRun.ChatConfiguration"/> to get the LLM response.
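
The visible fragment of `SampleAndEvaluateResponse` ends here. Broadly, the rest of the method sends a prompt through the scenario run's chat client, evaluates the response, and validates the result. The sketch below is an approximation based on the types referenced above (`ScenarioRun.ChatConfiguration`, `EvaluationResult`, and the `Validate` helper), not the file's actual code, and the exact `EvaluateAsync` overload is an assumption:

```csharp
// Continuation sketch (assumed API shape), inside SampleAndEvaluateResponse.
var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Your prompt here")
};

// Get the LLM response through the scenario run's preconfigured chat client.
ChatResponse response =
    await scenarioRun.ChatConfiguration!.ChatClient.GetResponseAsync(messages);

// Run all configured evaluators against the response.
EvaluationResult result = await scenarioRun.EvaluateAsync(messages, response);

// Assert on the metrics, as shown in the Validate method above.
Validate(result);
```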
@@ -11,11 +11,11 @@
<ItemGroup>
<PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />
<PackageReference Include="Azure.Identity" Version="1.13.2" />
<PackageReference Include="Microsoft.Extensions.AI.Abstractions" Version="9.4.0-preview.1.25207.5" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation" Version="9.4.0-preview.1.25207.5" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation.Quality" Version="9.4.0-preview.1.25207.5" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation.Reporting" Version="9.4.0-preview.1.25207.5" />
<PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="9.4.0-preview.1.25207.5" />
<PackageReference Include="Microsoft.Extensions.AI.Abstractions" Version="9.4.3-preview.1.25230.7" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation" Version="9.4.3-preview.1.25230.7" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation.Quality" Version="9.4.3-preview.1.25230.7" />
<PackageReference Include="Microsoft.Extensions.AI.Evaluation.Reporting" Version="9.4.3-preview.1.25230.7" />
<PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="9.4.3-preview.1.25230.7" />
<PackageReference Include="microsoft.extensions.configuration" Version="9.0.4" />
<PackageReference Include="Microsoft.Extensions.Configuration.UserSecrets" Version="9.0.4" />
<PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.13.0" />
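
Not part of this diff: after the tests run and results are stored, the reporting workflow in this package family typically generates an HTML report with the `Microsoft.Extensions.AI.Evaluation.Console` dotnet tool. The command and flag names below are assumptions for illustration; check the package documentation for the exact syntax:

```dotnetcli
dotnet new tool-manifest
dotnet tool install Microsoft.Extensions.AI.Evaluation.Console --prerelease
dotnet aieval report --path <result-storage-path> --output report.html
```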