[Obs AI Assistant] Improve LLM evaluation framework #204574

Open
wants to merge 25 commits into
base: main
c2b5270
[Obs AI Assistant] Update evaluation framework readme
viduni94 Dec 13, 2024
7641ecb
[Obs AI Assistant] Fix auth for the kibana url when custom elasticsea…
viduni94 Dec 13, 2024
315ba30
[Obs AI Assistant] Create dataview if it doesn't exist
viduni94 Dec 13, 2024
18f2b21
[Obs AI Assistant] Logs for service urls
viduni94 Dec 13, 2024
7519314
[Obs AI Assistant] Temp skip for scenarios except alerts
viduni94 Dec 13, 2024
124028b
[Obs AI Assistant] Add header to enable accessing internal APIs
viduni94 Dec 13, 2024
e23ebf1
[Obs AI Assistant] Fix apm afterAll hook
viduni94 Dec 13, 2024
51589f9
[Obs AI Assistant] Update error handling
viduni94 Dec 17, 2024
545ab41
[Obs AI Assistant] Update calls to internal urls
viduni94 Dec 17, 2024
b5dbce8
[Obs AI Assistant] Improve data view creation
viduni94 Dec 17, 2024
84972ea
[Obs AI Assistant] Change internal origin to Kibana
viduni94 Dec 17, 2024
3b7d770
[Obs AI Assistant] Improve scopes handling in the chat client
viduni94 Dec 17, 2024
91d80f5
[Obs AI Assistant] Update elasticsearch and es|ql scope before/after …
viduni94 Dec 17, 2024
221abdf
[Obs AI Assistant] Fix eslint issues
viduni94 Dec 18, 2024
34da5d5
[Obs AI Assistant] Fix eslint issues
viduni94 Dec 18, 2024
1b57832
[Obs AI Assistant] Add new scenario/test for KB retrieval
viduni94 Dec 18, 2024
6c0e59b
[Obs AI Assistant] Add new scenario for documentation and improve log…
viduni94 Dec 18, 2024
5d7fe68
[Obs AI Assistant] Improve readme
viduni94 Dec 20, 2024
c2d1d65
[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --…
kibanamachine Dec 23, 2024
112383c
[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…
kibanamachine Dec 23, 2024
8da301b
[Obs AI Assistant] Address PR comments
viduni94 Dec 24, 2024
541fa80
[Obs AI Assistant] Revert auth change as it's not necessary
viduni94 Dec 24, 2024
bc3480e
[Obs AI Assistant] Make scope a part of the complete function
viduni94 Dec 24, 2024
8c9911e
[CI] Auto-commit changed files from 'node scripts/eslint --no-cache -…
kibanamachine Dec 24, 2024
40c6445
[Obs AI Assistant] remove comment
viduni94 Dec 24, 2024
@@ -2,7 +2,7 @@

## Overview

This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Observability AI Assistant. It simplifies scripting and evaluating various scenarios with the Large Language Model (LLM) integration.
This tool is developed for our team working on the Elastic Observability platform, specifically focusing on evaluating the Observability AI Assistant. It simplifies scripting and evaluating various scenarios with Large Language Model (LLM) integrations.

## Setup requirements

@@ -12,26 +12,40 @@ This tool is developed for our team working on the Elastic Observability platfor

## Running evaluations

Run the tool using:

`$ node x-pack/solutions/observability/plugins/observability_solution/observability_ai_assistant_app/scripts/evaluation/index.js`

This will evaluate all existing scenarios, and write the evaluation results to the terminal.

### Configuration

#### Kibana and Elasticsearch

By default, the tool will look for a Kibana instance running locally (at `http://localhost:5601`, which is the default address for running Kibana in development mode). It will also attempt to read the Kibana config file for the Elasticsearch address & credentials. If you want to override these settings, use `--kibana` and `--es`. Only basic auth is supported, e.g. `--kibana http://username:password@localhost:5601`. If you want to use a specific space, use `--spaceId`
#### To run the evaluation using a local Elasticsearch and Kibana instance:

#### Connector
- Run Elasticsearch locally: `yarn es snapshot --license trial`
Member

`--license trial` is unnecessary, no?

Contributor Author

Without `--license trial`, I can't call the Obs AI Assistant endpoints.

[Screenshot: error-without-license-1]

- Start Kibana (Default address for Kibana in dev mode: `http://localhost:5601`)
- Run this command to start evaluating:
`$ node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js`

Use `--connectorId` to specify a `.gen-ai` or `.bedrock` connector to use. If none are given, it will prompt you to select a connector based on the ones that are available. If only a single supported connector is found, it will be used without prompting.

#### Persisting conversations

By default, completed conversations are not persisted. If you do want to persist them, for instance for reviewing purposes, set the `--persist` flag to store them. This will also generate a clickable link in the output of the evaluation that takes you to the conversation.

If you want to clear conversations on startup, use the `--clear` flag. This only works when `--persist` is enabled. If `--spaceId` is set, only conversations for the current space will be cleared.
This will evaluate all existing scenarios, and write the evaluation results to the terminal.

When storing conversations, the name of the scenario is used as a title. Set the `--autoTitle` flag to have the LLM generate a title for you.
#### To run the evaluation using a hosted deployment:
- Add the credentials of Elasticsearch to `kibana.dev.yml` as follows:
```
elasticsearch.hosts: https://<hosted-url>:<port>
elasticsearch.username: <username>
elasticsearch.password: <password>
elasticsearch.ssl.verificationMode: none
elasticsearch.ignoreVersionMismatch: true
```
- Start Kibana
- Run this command to start evaluating: `node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js --kibana http://<username>:<password>@localhost:5601`

By default, the script uses the Elasticsearch credentials specified in `kibana.dev.yml`. To override them, pass the `--es` flag when running the evaluation script:
E.g.: `node x-pack/solutions/observability/plugins/observability_ai_assistant_app/scripts/evaluation/index.js --kibana http://<username>:<password>@localhost:5601 --es https://<username>:<password>@<hosted-url>:<port>`

The `--kibana` and `--es` flags override the default credentials. Only basic auth is supported.

## Other (optional) configuration flags
- `--connectorId`: Specify a generative AI connector to use. If none is given, the script prompts you to select one of the available connectors. If only a single supported connector is found, it is used without prompting.
- `--evaluateWith`: The connector ID to evaluate with. Leave empty to use the same connector, or use "other" to get a selection menu.
- `--spaceId`: Specify the space ID if you want to use a specific space.
- `--persist`: By default, completed conversations are not persisted. To persist them, for instance for reviewing purposes, include this flag when running the evaluation script. This also generates a clickable link in the evaluation output that takes you to the conversation in Kibana.
- `--clear`: To clear conversations on startup, include this flag when running the evaluation script. This only works when `--persist` is enabled. If `--spaceId` is set, only conversations for the current space are cleared.
- `--autoTitle`: When storing conversations, the name of the scenario is used as the title. Set this flag to have the LLM generate a title instead. This only works when `--persist` is enabled.
- `--files`: A file or list of files containing the scenarios to evaluate. Defaults to all.
- `--grep`: A string or regex to filter scenarios by.
@@ -37,6 +37,8 @@ function runEvaluations() {
kibana: argv.kibana,
});

log.info(`Elasticsearch URL: ${serviceUrls.esUrl}`);

const kibanaClient = new KibanaClient(log, serviceUrls.kibanaUrl, argv.spaceId);
const esClient = new Client({
node: serviceUrls.esUrl,
@@ -100,7 +102,7 @@ function runEvaluations() {
evaluationConnectorId: evaluationConnector.id!,
persist: argv.persist,
suite: mocha.suite,
scopes: ['all'],
scopes: ['observability'],
});

const header: string[][] = [
@@ -26,7 +26,7 @@ import { Message, MessageRole } from '@kbn/observability-ai-assistant-plugin/com
import { streamIntoObservable } from '@kbn/observability-ai-assistant-plugin/server';
import { ToolingLog } from '@kbn/tooling-log';
import axios, { AxiosInstance, AxiosResponse, isAxiosError } from 'axios';
import { isArray, omit, pick, remove } from 'lodash';
import { omit, pick, remove } from 'lodash';
import pRetry from 'p-retry';
import {
concatMap,
@@ -59,13 +59,14 @@ interface Options {
screenContexts?: ObservabilityAIAssistantScreenContext[];
}

type CompleteFunction = (
...args:
| [StringOrMessageList]
| [StringOrMessageList, Options]
| [string | undefined, StringOrMessageList]
| [string | undefined, StringOrMessageList, Options]
) => Promise<{
interface CompleteFunctionParams {
messages: StringOrMessageList;
conversationId?: string;
options?: Options;
scope?: AssistantScope;
}

type CompleteFunction = (params: CompleteFunctionParams) => Promise<{
conversationId?: string;
messages: InnerMessage[];
errors: ChatCompletionErrorEvent[];
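
Note on the hunk above: replacing the positional overloads with a single `CompleteFunctionParams` object removes the need to guess which argument is which at runtime. The sketch below illustrates the idea only. `MessageBody`, `toMessages`, `normalizeComplete`, and the members of the `AssistantScope` union are hypothetical stand-ins, not the plugin's actual types, and the default scope mirrors the `newScope || 'observability'` line later in this diff.

```typescript
// Hypothetical stand-ins for the plugin's types; only the shapes matter here.
type AssistantScope = 'observability' | 'search' | 'all'; // assumed union members

interface MessageBody {
  role: 'user' | 'assistant';
  content: string;
}

type StringOrMessageList = string | MessageBody[];

interface CompleteParams {
  messages: StringOrMessageList;
  conversationId?: string;
  scope?: AssistantScope;
}

interface NormalizedRequest {
  conversationId?: string;
  messages: MessageBody[];
  scopes: AssistantScope[];
}

// Same idea as getMessages in the diff: a bare string becomes one user message.
function toMessages(input: StringOrMessageList): MessageBody[] {
  return typeof input === 'string' ? [{ role: 'user', content: input }] : input;
}

// Named fields make every call site self-describing, so no argument-count
// or type checks are needed to tell a conversation ID from a message.
function normalizeComplete({ messages, conversationId, scope }: CompleteParams): NormalizedRequest {
  return {
    conversationId,
    messages: toMessages(messages),
    scopes: [scope || 'observability'], // falls back to the observability scope
  };
}

const first = normalizeComplete({ messages: 'List my active alerts' });
const followUp = normalizeComplete({
  conversationId: 'abc',
  messages: [{ role: 'user', content: 'And the ones from yesterday?' }],
  scope: 'all',
});
```

With the object form, a follow-up call passes `conversationId` by name instead of relying on its position before the message list.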
@@ -74,7 +75,6 @@ type CompleteFunction = (
export interface ChatClient {
chat: (message: StringOrMessageList) => Promise<InnerMessage>;
complete: CompleteFunction;

evaluate: (
{}: { conversationId?: string; messages: InnerMessage[]; errors: ChatCompletionErrorEvent[] },
criteria: string[]
@@ -124,10 +124,10 @@ export class KibanaClient {
return this.axios<T>({
method,
url,
data: data || {},
...(method.toLowerCase() !== 'delete' ? { data: data || {} } : {}),
Member

why is this needed?

Contributor Author

@viduni94 viduni94 Dec 24, 2024

Without this condition, deleting ruleIds fails here (https://github.com/elastic/kibana/pull/204574/files#diff-23cc9139c91a064a3ca574552ad823023c579cc2c68ff7f277c392102a0d526aL139), because the DELETE method doesn't allow an undefined or empty body.

[Screenshot attached, 2024-12-24]
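
For readers following this thread, here is a minimal sketch of the conditional-spread pattern under discussion. It builds a plain request config object rather than calling axios, so it runs without dependencies. `buildRequestConfig`, `RequestConfig`, and the example paths are hypothetical names for illustration; only the spread expression and the two headers come from the diff.

```typescript
// Hypothetical config shape; the real code passes a similar object to axios.
interface RequestConfig {
  method: string;
  url: string;
  data?: unknown;
  headers: Record<string, string>;
}

function buildRequestConfig(method: string, url: string, data?: unknown): RequestConfig {
  return {
    method,
    url,
    // Spread in `data` only for non-DELETE methods, so a DELETE request
    // has no `data` key at all instead of an empty-object body.
    ...(method.toLowerCase() !== 'delete' ? { data: data || {} } : {}),
    headers: {
      'kbn-xsrf': 'true',
      'x-elastic-internal-origin': 'Kibana',
    },
  };
}

// Example paths are placeholders, not real endpoints.
const deleteConfig = buildRequestConfig('DELETE', '/example/rule/some-id');
const postConfig = buildRequestConfig('POST', '/example/resource');

console.log('data' in deleteConfig); // false
console.log('data' in postConfig); // true
```

Spreading an empty object for DELETE keeps the rest of the call unchanged while ensuring no body key is ever set for that method.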

headers: {
'kbn-xsrf': 'true',
'x-elastic-internal-origin': 'foo',
'x-elastic-internal-origin': 'Kibana',
},
}).catch((error) => {
if (isAxiosError(error)) {
@@ -148,7 +148,7 @@ export class KibanaClient {
}

async installKnowledgeBase() {
this.log.debug('Checking to see whether knowledge base is installed');
this.log.info('Checking whether the knowledge base is installed');

const {
data: { ready },
@@ -157,7 +157,7 @@ export class KibanaClient {
});

if (ready) {
this.log.info('Knowledge base is installed');
this.log.success('Knowledge base is already installed');
return;
}

@@ -176,15 +176,15 @@ export class KibanaClient {
{ retries: 10 }
);

this.log.info('Knowledge base installed');
this.log.success('Knowledge base installed');
}

async createSpaceIfNeeded() {
if (!this.spaceId) {
return;
}

this.log.debug(`Checking if space ${this.spaceId} exists`);
this.log.info(`Checking if space ${this.spaceId} exists`);

const spaceExistsResponse = await this.callKibana<{
id?: string;
@@ -204,7 +204,7 @@ export class KibanaClient {
});

if (spaceExistsResponse.data.id) {
this.log.debug(`Space id ${this.spaceId} found`);
this.log.success(`Space id ${this.spaceId} found`);
return;
}

@@ -223,14 +223,26 @@ export class KibanaClient {
);

if (spaceCreatedResponse.status === 200) {
this.log.info(`Created space ${this.spaceId}`);
this.log.success(`Created space ${this.spaceId}`);
} else {
throw new Error(
`Error creating space: ${spaceCreatedResponse.status} - ${spaceCreatedResponse.data}`
);
}
}

getMessages(message: string | Array<Message['message']>): Array<Message['message']> {
if (typeof message === 'string') {
return [
{
content: message,
role: MessageRole.User,
},
];
}
return message;
}

createChatClient({
connectorId,
evaluationConnectorId,
@@ -244,22 +256,11 @@ export class KibanaClient {
suite?: Mocha.Suite;
scopes: AssistantScope[];
}): ChatClient {
function getMessages(message: string | Array<Message['message']>): Array<Message['message']> {
if (typeof message === 'string') {
return [
{
content: message,
role: MessageRole.User,
},
];
}
return message;
}

const that = this;

let currentTitle: string = '';
let firstSuiteName: string = '';
let currentScopes = scopes;

if (suite) {
suite.beforeEach(function () {
@@ -362,23 +363,27 @@ export class KibanaClient {
that.log.info('Chat', name);

const chat$ = defer(() => {
that.log.debug(`Calling chat API`);
that.log.info('Calling the /chat API');
const params: ObservabilityAIAssistantAPIClientRequestParamsOf<'POST /internal/observability_ai_assistant/chat'>['params']['body'] =
{
name,
messages,
connectorId: connectorIdOverride || connectorId,
functions: functions.map((fn) => pick(fn, 'name', 'description', 'parameters')),
functionCall,
scopes,
scopes: currentScopes,
};

return that.axios.post(
that.getUrl({
pathname: '/internal/observability_ai_assistant/chat',
}),
params,
{ responseType: 'stream', timeout: NaN }
{
responseType: 'stream',
timeout: NaN,
headers: { 'x-elastic-internal-origin': 'Kibana' },
}
);
}).pipe(
switchMap((response) => streamIntoObservable(response.data)),
@@ -400,54 +405,33 @@ export class KibanaClient {
return {
chat: async (message) => {
const messages = [
...getMessages(message).map((msg) => ({
...this.getMessages(message).map((msg) => ({
message: msg,
'@timestamp': new Date().toISOString(),
})),
];
return chat('chat', { messages, functions: [] });
},
complete: async (...args) => {
that.log.info(`Complete`);
let messagesArg: StringOrMessageList | undefined;
let conversationId: string | undefined;
let options: Options = {};

function isMessageList(arg: any): arg is StringOrMessageList {
return isArray(arg) || typeof arg === 'string';
}
complete: async ({
messages: messagesArg,
conversationId,
options = {},
scope: newScope,
}: CompleteFunctionParams) => {
that.log.info('Calling complete');

// | [StringOrMessageList]
// | [StringOrMessageList, Options]
// | [string, StringOrMessageList]
// | [string, StringOrMessageList, Options]
if (args.length === 1) {
messagesArg = args[0];
} else if (args.length === 2 && !isMessageList(args[1])) {
messagesArg = args[0];
options = args[1];
} else if (
args.length === 2 &&
(typeof args[0] === 'string' || typeof args[0] === 'undefined') &&
isMessageList(args[1])
) {
conversationId = args[0];
messagesArg = args[1];
} else if (args.length === 3) {
conversationId = args[0];
messagesArg = args[1];
options = args[2];
}
// set scope
currentScopes = [newScope || 'observability'];

const messages = [
...getMessages(messagesArg!).map((msg) => ({
...this.getMessages(messagesArg!).map((msg) => ({
message: msg,
'@timestamp': new Date().toISOString(),
})),
];

const stream$ = defer(() => {
that.log.debug(`Calling /chat/complete API`);
that.log.info(`Calling /chat/complete API`);
return from(
that.axios.post(
that.getUrl({
@@ -460,9 +444,13 @@ export class KibanaClient {
connectorId,
persist,
title: currentTitle,
scopes,
scopes: currentScopes,
},
{ responseType: 'stream', timeout: NaN }
{
responseType: 'stream',
timeout: NaN,
headers: { 'x-elastic-internal-origin': 'Kibana' },
}
)
);
}).pipe(
@@ -615,7 +603,7 @@ export class KibanaClient {
})
.concat({
score: errors.length === 0 ? 1 : 0,
criterion: 'The conversation encountered errors',
criterion: 'The conversation did not encounter any errors',
reasoning: errors.length
? `The following errors occurred: ${errors.map((error) => error.error.message)}`
: 'No errors occurred',
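
Note on the criterion rename above: the score is 1 when no errors occurred, so the old wording ("The conversation encountered errors") read as the opposite of what a passing score meant. A minimal standalone sketch of the corrected entry follows. `ChatError`, `EvaluationResult`, and `errorCriterion` are hypothetical names, and the explicit `join(', ')` is a small liberty taken here; the diff interpolates the array directly.

```typescript
// Hypothetical minimal shapes standing in for the plugin's event and result types.
interface ChatError {
  error: { message: string };
}

interface EvaluationResult {
  score: number;
  criterion: string;
  reasoning: string;
}

function errorCriterion(errors: ChatError[]): EvaluationResult {
  return {
    // A score of 1 means the criterion held, so the criterion text
    // must describe the passing condition.
    score: errors.length === 0 ? 1 : 0,
    criterion: 'The conversation did not encounter any errors',
    reasoning: errors.length
      ? `The following errors occurred: ${errors.map((e) => e.error.message).join(', ')}`
      : 'No errors occurred',
  };
}

const clean = errorCriterion([]);
const failed = errorCriterion([{ error: { message: 'connector timeout' } }]);
```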