Genai user feedback evaluation #1322

Open: wants to merge 7 commits into base: main
15 changes: 9 additions & 6 deletions docs/attributes-registry/gen-ai.md
@@ -17,8 +17,9 @@ This document defines the attributes used to describe telemetry in the context o
| Attribute | Type | Description | Examples | Stability |
| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- |
| `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.operation.name` | string | The name of the operation being performed. [2] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [3] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.evaluation.score` | double | The score calculated by the evaluator for the GenAI response. [2] | `0.42` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.operation.name` | string | The name of the operation being performed. [3] | `chat`; `text_completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [4] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.request.frequency_penalty` | double | The frequency penalty setting for the GenAI request. | `0.1` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.request.max_tokens` | int | The maximum number of tokens the model generates for a request. | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.request.model` | string | The name of the GenAI model a request is being made to. | `gpt-4` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
@@ -30,18 +31,20 @@ This document defines the attributes used to describe telemetry in the context o
| `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]`; `["stop", "length"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [4] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [5] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.usage.input_tokens` | int | The number of tokens used in the GenAI input (prompt). | `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `gen_ai.usage.output_tokens` | int | The number of tokens used in the GenAI response (completion). | `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

**[2]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.
**[2]:** Semantic conventions describing GenAI evaluation telemetry SHOULD document the scoring system and method used to calculate the score.

**[3]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)
**[3]:** If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic conventions for specific GenAI system and use system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.

**[4]:** The `gen_ai.system` describes a family of GenAI models with specific model identified
**[4]:** It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

**[5]:** The `gen_ai.system` describes a family of GenAI models with specific model identified
by `gen_ai.request.model` and `gen_ai.response.model` attributes.

The actual GenAI product may differ from the one identified by the client.
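Notes [1] and [4] recommend serializing prompts and completions as JSON strings in the OpenAI messages format. A minimal sketch of what that looks like in an instrumentation (the variable names are illustrative, not part of the conventions):

```python
import json

# Messages in the OpenAI chat format (role/content pairs).
prompt_messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
completion_messages = [
    {"role": "assistant", "content": "The capital of France is Paris."}
]

# Serialize to JSON strings suitable for the gen_ai.prompt and
# gen_ai.completion attribute values.
attributes = {
    "gen_ai.prompt": json.dumps(prompt_messages),
    "gen_ai.completion": json.dumps(completion_messages),
}
```

Storing the messages as a JSON string (rather than a Python `repr`, as in the table examples above) keeps the attribute parseable by downstream tooling.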
50 changes: 50 additions & 0 deletions docs/gen-ai/gen-ai-evaluation-events.md
@@ -0,0 +1,50 @@

<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Generative AI evaluation events
--->

# Semantic Conventions for GenAI evaluation events

**Status**: [Experimental][DocumentStatus]

Each evaluation event defines a common way to report an evaluation score and the context for this specific evaluation method.

## Naming pattern

The evaluation events follow `gen_ai.evaluation.{evaluation method}` naming pattern.
For example, evaluations that are common across different GenAI models and framework tooling, such as user feedback, should be reported as `gen_ai.evaluation.user_feedback`.

GenAI vendor-specific evaluation events SHOULD follow `gen_ai.{gen_ai.system}.evaluation.{evaluation method}` pattern.
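The two naming patterns above can be sketched as regular expressions; this helper is a hypothetical illustration, not part of the conventions, and assumes lowercase snake_case segments:

```python
import re

# Common evaluation events: gen_ai.evaluation.{evaluation method}
COMMON = re.compile(r"^gen_ai\.evaluation\.[a-z][a-z0-9_]*$")
# Vendor-specific events: gen_ai.{gen_ai.system}.evaluation.{evaluation method}
VENDOR = re.compile(r"^gen_ai\.[a-z][a-z0-9_]*\.evaluation\.[a-z][a-z0-9_]*$")

def is_valid_evaluation_event_name(name: str) -> bool:
    """Return True if the event name matches either documented pattern."""
    return bool(COMMON.match(name) or VENDOR.match(name))
```

For example, `gen_ai.evaluation.user_feedback` matches the common pattern and `gen_ai.openai.evaluation.relevance` the vendor-specific one, while a name missing the `gen_ai.` prefix matches neither.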

## User feedback evaluation

The user feedback evaluation event SHOULD be captured if and only if the user provided a reaction to the GenAI model response.
It SHOULD, when possible, be parented to the GenAI span describing that response.

<!-- semconv gen_ai.evaluation.user_feedback -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

The event name MUST be `gen_ai.evaluation.user_feedback`.
Contributor:
discussing at GenAI call:

  • metrics for score are potentially more useful

Contributor:
gen_ai.evaluation.relevance
dimensions:

  • evaluator method
  • ...


| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
Contributor:

this probably should be in the common section and we should talk about user_feedback as an example.

|---|---|---|---|---|---|
| [`gen_ai.response.id`](/docs/attributes-registry/gen-ai.md) | string | The unique identifier for the completion. | `chatcmpl-123` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

Comment:

From my point of view, user feedback often relates to the overall output of an LLM application (which used multiple LLM completions to produce a final response to the user). gen_ai.response.id targets LLM completions specifically which limits how user feedback can be used if response id is required. I'd suggest to allow for correlating user feedback with an id that can be set on any non-LLM-completion span, especially if this will define the schema for other evaluation metrics going forward.

Contributor @lmolkova (Oct 17, 2024):

  • this should not be required and we should allow other (any) correlation ids.
  • we should call out that all evaluations should allow adding arbitrary correlation ids

| [`gen_ai.evaluation.score`](/docs/attributes-registry/gen-ai.md) | double | Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction. | `0.42` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |


<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

The user feedback event body has the following structure:

| Body Field | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `comment` | string | Additional details about the user feedback | `"I did not like it"` | `Opt-in` |
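Putting the pieces together, an instrumentation might assemble the event roughly as follows. This is a sketch under stated assumptions: the helper name and the three-value reaction scale are illustrative, not defined by the conventions.

```python
from typing import Optional

def user_feedback_event(response_id: str, reaction: str,
                        comment: Optional[str] = None) -> dict:
    """Build a gen_ai.evaluation.user_feedback event payload (illustrative)."""
    # Map a coarse user reaction onto the [-1.0, 1.0] score range,
    # with 0.0 representing a neutral reaction.
    scores = {"thumbs_down": -1.0, "neutral": 0.0, "thumbs_up": 1.0}
    event = {
        "name": "gen_ai.evaluation.user_feedback",
        "attributes": {
            "gen_ai.response.id": response_id,
            "gen_ai.evaluation.score": scores[reaction],
        },
        "body": {},
    }
    if comment is not None:  # `comment` is an Opt-in body field.
        event["body"]["comment"] = comment
    return event

event = user_feedback_event("chatcmpl-123", "thumbs_down", "I did not like it")
```

A richer instrumentation could accept a continuous score directly (e.g. a star rating rescaled into [-1.0, 1.0]) instead of the discrete mapping shown here.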

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
2 changes: 1 addition & 1 deletion docs/gen-ai/gen-ai-spans.md
@@ -175,4 +175,4 @@ The event name MUST be `gen_ai.content.completion`.
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
43 changes: 43 additions & 0 deletions model/gen-ai/events.yaml
@@ -0,0 +1,43 @@
groups:
- id: gen_ai.content.prompt
name: gen_ai.content.prompt
stability: experimental
type: event
brief: >
In the lifetime of a GenAI span, events for prompts sent and completions received
may be created, depending on the configuration of the instrumentation.
attributes:
- ref: gen_ai.prompt
requirement_level:
conditionally_required: if and only if corresponding event is enabled
note: >
It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

- id: gen_ai.content.completion
name: gen_ai.content.completion
type: event
stability: experimental
brief: >
In the lifetime of a GenAI span, events for prompts sent and completions received
may be created, depending on the configuration of the instrumentation.
attributes:
- ref: gen_ai.completion
requirement_level:
conditionally_required: if and only if corresponding event is enabled
note: >
It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

- id: gen_ai.evaluation.user_feedback
name: gen_ai.evaluation.user_feedback
type: event
stability: experimental
brief: >
This event describes the evaluation of a GenAI response based on user feedback.
attributes:
- ref: gen_ai.response.id
requirement_level: required
- ref: gen_ai.evaluation.score
brief: >
Quantified score calculated based on the user reaction in [-1.0, 1.0] range with 0 representing a neutral reaction.
note: ""
requirement_level: recommended
11 changes: 11 additions & 0 deletions model/gen-ai/registry.yaml
@@ -1,6 +1,7 @@
groups:
- id: registry.gen_ai
type: attribute_group
stability: experimental
display_name: GenAI Attributes
brief: >
This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses.
@@ -148,8 +149,18 @@ groups:
If one of the predefined values applies, but specific system uses a different name it's RECOMMENDED to document it in the semantic
conventions for specific GenAI system and use system-specific name in the instrumentation.
If a different name is not documented, instrumentation libraries SHOULD use applicable predefined value.
- id: gen_ai.evaluation.score
stability: experimental
type: double
brief: The score calculated by the evaluator for the GenAI response.
note: >
Semantic conventions describing GenAI evaluation telemetry SHOULD document
the scoring system and method used to calculate the score.
examples: [0.42]

- id: registry.gen_ai.openai
type: attribute_group
stability: experimental
display_name: OpenAI Attributes
brief: >
This group defines attributes for OpenAI.
26 changes: 0 additions & 26 deletions model/gen-ai/spans.yaml
@@ -58,32 +58,6 @@ groups:
- gen_ai.content.prompt
- gen_ai.content.completion

- id: gen_ai.content.prompt
name: gen_ai.content.prompt
type: event
brief: >
In the lifetime of an GenAI span, events for prompts sent and completions received
may be created, depending on the configuration of the instrumentation.
attributes:
- ref: gen_ai.prompt
requirement_level:
conditionally_required: if and only if corresponding event is enabled
note: >
It's RECOMMENDED to format prompts as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

- id: gen_ai.content.completion
name: gen_ai.content.completion
type: event
brief: >
In the lifetime of an GenAI span, events for prompts sent and completions received
may be created, depending on the configuration of the instrumentation.
attributes:
- ref: gen_ai.completion
requirement_level:
conditionally_required: if and only if corresponding event is enabled
note: >
It's RECOMMENDED to format completions as JSON string matching [OpenAI messages format](https://platform.openai.com/docs/guides/text-generation)

- id: trace.gen_ai.client
extends: trace.gen_ai.client.common
brief: >