[MLOB-3125] Conversation Completeness Evaluation #30140

Draft · wants to merge 1 commit into master
10 changes: 10 additions & 0 deletions content/en/llm_observability/evaluations/ootb_evaluations.md
@@ -246,6 +246,16 @@
|---|---|---|
| Evaluated on Input and Output | Evaluated using LLM | Sentiment flags the emotional tone or attitude expressed in the text, categorizing it as positive, negative, or neutral. |

#### Conversation Completeness

This check evaluates whether your LLM chatbot can carry a conversation through to completion, meeting the user's needs from start to finish. It serves as a proxy for user satisfaction over the course of a conversation and is especially valuable for chatbot applications.

| Evaluation Stage | Evaluation Method | Evaluation Definition |
|---|---|---|
| Evaluated on Conversation | Evaluated using LLM | Conversation Completeness assesses whether all user intentions within a conversation were successfully resolved. The evaluation identifies resolved and unresolved intentions, providing a completeness score based on the ratio of unresolved to total intentions. |
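
To make the scoring concrete, the following is a minimal, hypothetical sketch of how a completeness score can be derived from resolved and unresolved intentions. The function name and the handling of conversations with no identified intentions are assumptions for illustration, not the evaluator's actual implementation.

```python
def completeness_score(resolved: list[str], unresolved: list[str]) -> float:
    """Illustrative only: score = 1 - (unresolved intentions / total intentions)."""
    total = len(resolved) + len(unresolved)
    if total == 0:
        return 1.0  # assumption: no identified intentions counts as complete
    return 1 - len(unresolved) / total

# Example: 1 of 3 intentions is unresolved, so the score is roughly 0.67.
# A score below 0.5 (more than half of intentions unresolved) marks the
# conversation as incomplete, per the threshold described below.
score = completeness_score(
    resolved=["book a flight", "choose a seat"],
    unresolved=["add checked baggage"],
)
is_incomplete = score < 0.5
```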

For the most accurate results, send a tag when the conversation finishes and configure the evaluation to run only on conversations with that tag. The evaluation returns a detailed breakdown that includes resolved intentions, unresolved intentions, and the reasoning behind the assessment. A conversation is considered incomplete if more than 50% of identified intentions remain unresolved.
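
As one way to apply such a tag, here is a minimal sketch that assumes the application is instrumented with the ddtrace Python SDK's LLM Observability interface. The tag key `conversation_finished`, the `ml_app` name, and the `generate_reply` helper are assumptions for illustration; adapt them to your own application and scope the evaluation to the tag you choose.

```python
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import agent

LLMObs.enable(ml_app="support-chatbot")  # assumed application name


def generate_reply(messages: list[dict]) -> str:
    # Placeholder for your chatbot's actual generation logic.
    return "Your flight is booked."


@agent(name="chat_session")
def run_conversation(messages: list[dict]) -> str:
    reply = generate_reply(messages)
    # Tag the active span once the conversation has ended so the
    # Conversation Completeness evaluation can be scoped to tagged conversations.
    LLMObs.annotate(tags={"conversation_finished": "true"})
    return reply
```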


### Security and Safety evaluations

#### Toxicity