Genai user feedback evaluation #1322
@@ -0,0 +1,50 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Generative AI evaluation events
--->

# Semantic Conventions for GenAI evaluation events

**Status**: [Experimental][DocumentStatus]

Each evaluation event defines a common way to report an evaluation score along with the context specific to that evaluation method.

## Naming pattern

Evaluation events follow the `gen_ai.evaluation.{evaluation method}` naming pattern.
For example, evaluations that are common across different GenAI models and framework tooling, such as user feedback, should be reported as `gen_ai.evaluation.user_feedback`.

GenAI vendor-specific evaluation events SHOULD follow the `gen_ai.{gen_ai.system}.evaluation.{evaluation method}` pattern.
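For illustration only, a minimal sketch of how an instrumentation might build event names following this pattern; the `evaluation_event_name` helper, the `content_filter` method name, and the use of `openai` as a `gen_ai.system` value are assumptions made for the example, not part of this convention:

```python
def evaluation_event_name(evaluation_method: str, system: str | None = None) -> str:
    """Build an evaluation event name following the naming pattern above.

    `system` is the `gen_ai.system` attribute value for vendor-specific
    evaluations, or None for evaluations common across GenAI systems.
    """
    if system is None:
        return f"gen_ai.evaluation.{evaluation_method}"
    return f"gen_ai.{system}.evaluation.{evaluation_method}"


# Evaluation common across GenAI systems:
assert evaluation_event_name("user_feedback") == "gen_ai.evaluation.user_feedback"
# Hypothetical vendor-specific evaluation (illustrative name only):
assert evaluation_event_name("content_filter", system="openai") == "gen_ai.openai.evaluation.content_filter"
```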
## User feedback evaluation

The user feedback evaluation event SHOULD be captured if and only if the user provided a reaction to the GenAI model response.
When possible, it SHOULD be parented to the GenAI span describing that response.
<!-- semconv gen_ai.evaluation.user_feedback -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

The event name MUST be `gen_ai.evaluation.user_feedback`.
| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`gen_ai.evaluation.score`](/docs/attributes-registry/gen-ai.md) | double | Quantified score calculated based on the user reaction, in the [-1.0, 1.0] range, with 0 representing a neutral reaction. | `0.42` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

Review comment: This probably should be in the common section and we should talk about user_feedback as an example.

Review comment (on `gen_ai.response.id`): From my point of view, user feedback often relates to the overall output of an LLM application (which used multiple LLM completions to produce a final response to the user).
<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
The user feedback event body has the following structure:

| Body Field | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `comment` | string | Additional details about the user feedback | `"I did not like it"` | `Opt-in` |
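For illustration only, a minimal sketch of how an instrumentation might emit this event, assuming the experimental OpenTelemetry Python Events API (`opentelemetry._events`) is available; the `record_user_feedback` helper and the thumbs-up/thumbs-down score mapping are assumptions for the example, not requirements of this convention:

```python
# Illustrative sketch only: uses the experimental OpenTelemetry Python Events API;
# module paths and signatures may differ across SDK versions.
from opentelemetry._events import Event, get_event_logger

event_logger = get_event_logger("example.genai.instrumentation")


def record_user_feedback(response_id: str, thumbs_up: bool, comment: str | None = None) -> None:
    """Emit a gen_ai.evaluation.user_feedback event for a GenAI response.

    Mapping thumbs up/down to 1.0/-1.0 is an assumed application-level choice;
    the convention only requires a score in the [-1.0, 1.0] range.
    """
    event_logger.emit(
        Event(
            name="gen_ai.evaluation.user_feedback",
            attributes={
                "gen_ai.response.id": response_id,                      # Required
                "gen_ai.evaluation.score": 1.0 if thumbs_up else -1.0,  # Recommended
            },
            # The `comment` body field is Opt-in.
            body={"comment": comment} if comment else None,
        )
    )


# Example: the user disliked the response identified by `chatcmpl-123`.
# If emitted while the GenAI span is still active, the SDK can typically
# associate the event with that span's context.
record_user_feedback("chatcmpl-123", thumbs_up=False, comment="I did not like it")
```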
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
@@ -0,0 +1,43 @@
groups:
  - id: gen_ai.content.prompt
    name: gen_ai.content.prompt
    stability: experimental
    type: event
    brief: >
      In the lifetime of a GenAI span, events for prompts sent and completions received
      may be created, depending on the configuration of the instrumentation.
    attributes:
      - ref: gen_ai.prompt
        requirement_level:
          conditionally_required: if and only if the corresponding event is enabled
        note: >
          It's RECOMMENDED to format prompts as a JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

  - id: gen_ai.content.completion
    name: gen_ai.content.completion
    type: event
    stability: experimental
    brief: >
      In the lifetime of a GenAI span, events for prompts sent and completions received
      may be created, depending on the configuration of the instrumentation.
    attributes:
      - ref: gen_ai.completion
        requirement_level:
          conditionally_required: if and only if the corresponding event is enabled
        note: >
          It's RECOMMENDED to format completions as a JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

  - id: gen_ai.evaluation.user_feedback
    name: gen_ai.evaluation.user_feedback
    type: event
    stability: experimental
    brief: >
      This event describes the evaluation of a GenAI response based on user feedback.
    attributes:
      - ref: gen_ai.response.id
        requirement_level: required
      - ref: gen_ai.evaluation.score
        brief: >
          Quantified score calculated based on the user reaction, in the [-1.0, 1.0] range, with 0 representing a neutral reaction.
        note: ""
        requirement_level: recommended
Review comment: Discussing at the GenAI call:

Review comment: gen_ai.evaluation.relevance
dimensions: