---
title: AI SDK
---

Vercel's [AI SDK](https://sdk.vercel.ai/docs/introduction) is a great way to get started with AI in your apps.

It abstracts away the differences between AI providers, so you can **switch between them easily**.
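To illustrate that idea in isolation, provider-agnostic code is just code written against a shared interface. The types below are hypothetical stand-ins, not the AI SDK's real (much richer) interfaces:

```typescript
// Hypothetical stand-in for the AI SDK's model abstraction —
// the real SDK's interfaces are richer than this sketch.
interface LanguageModel {
  provider: string;
  generate(prompt: string): string;
}

// Two fake "providers" implementing the same interface.
const fakeOpenAI = (modelId: string): LanguageModel => ({
  provider: "openai",
  generate: (prompt) => `[openai:${modelId}] ${prompt}`,
});

const fakeAnthropic = (modelId: string): LanguageModel => ({
  provider: "anthropic",
  generate: (prompt) => `[anthropic:${modelId}] ${prompt}`,
});

// Application code depends only on the interface, so swapping
// providers is a one-line change at the call site.
function answer(model: LanguageModel, question: string): string {
  return model.generate(question);
}
```

Swapping `fakeOpenAI("gpt-4o-mini")` for `fakeAnthropic(...)` requires no change to `answer` — the same property the AI SDK gives you for real models.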
Here's how it might look with Evalite:

```ts
// my-eval.eval.ts

import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { Factuality, Levenshtein } from "autoevals";
import { evalite } from "evalite";
import { traceAISDKModel } from "evalite/ai-sdk";

evalite("Test Capitals", {
  data: async () => [
    {
      input: `What's the capital of France?`,
      expected: `Paris`,
    },
    {
      input: `What's the capital of Germany?`,
      expected: `Berlin`,
    },
  ],
  task: async (input) => {
    const result = await streamText({
      model: traceAISDKModel(openai("gpt-4o-mini")),
      system: `
        Answer the question concisely. Answer in as few words as possible.
        Remove full stops from the end of the output.
        If the country has no capital, return '<country> has no capital'.
        If the country does not exist, return 'Unknown'.
      `,
      prompt: input,
    });

    return result.textStream;
  },
  scorers: [Factuality, Levenshtein],
});
```
apps/evalite-docs/src/content/docs/guides/environment-variables.mdx
---
title: Environment Variables
---

import { Steps } from "@astrojs/starlight/components";

To call your LLM from a third-party service, you'll likely need some environment variables to keep your API keys safe.

Since **Evalite is based on Vitest**, it should already pick them up from your `vite.config.ts`.

## Setting Up Env Variables

If you don't have Vitest set up, here's how to do it:

<Steps>

1. Create a `.env` file in the root of your project:

   ```
   OPENAI_API_KEY=your-api-key
   ```

2. Add `.env` to your `.gitignore`, if it's not already there:

   ```
   .env
   ```

3. Install `dotenv`:

   ```bash
   pnpm add -D dotenv
   ```

4. Add a `vite.config.ts` file:

   ```ts
   // vite.config.ts

   import { defineConfig } from "vitest/config";

   export default defineConfig({
     test: {
       setupFiles: ["dotenv/config"],
     },
   });
   ```

</Steps>

Now your environment variables will be available in your evals.
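Once loaded, the variables are read from `process.env` as usual. As a sketch (a hypothetical helper, not part of Evalite), you might fail fast when a key is missing rather than send an undefined API key to the provider:

```typescript
// Hypothetical helper — throws early if a required key is absent.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// e.g. const apiKey = requireEnv("OPENAI_API_KEY");
```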
---
title: Scorers
---

import { Aside } from "@astrojs/starlight/components";

Scorers are used to score the output of your LLM call.

[Autoevals](https://github.com/braintrustdata/autoevals) is a great library of scorers to get you started.

You can create your own using `createScorer`:

```ts
import { createScorer, evalite } from "evalite";

const containsParis = createScorer<string>({
  name: "Contains Paris",
  description: "Checks if the output contains the word 'Paris'.",
  score: (output) => {
    return output.includes("Paris") ? 1 : 0;
  },
});

evalite("My Eval", {
  data: async () => {
    return [{ input: "Hello", expected: "Hello World!" }];
  },
  task: async (input) => {
    return input + " World!";
  },
  scorers: [containsParis],
});
```

The `name` and `description` of the scorer will be displayed in the Evalite UI.
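Because a score is just a number between 0 and 1, a scorer's core logic can be written and tested in isolation. As an illustration, here's roughly the kind of logic a string-similarity scorer like `Levenshtein` is built on — a from-scratch sketch, not autoevals' actual implementation:

```typescript
// Classic single-row dynamic-programming edit distance.
function editDistance(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0];
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1, // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Normalize the distance to a 0–1 score: identical strings score 1.
function levenshteinScore(output: string, expected: string): number {
  const maxLen = Math.max(output.length, expected.length);
  if (maxLen === 0) return 1;
  return 1 - editDistance(output, expected) / maxLen;
}
```

A function like `levenshteinScore` could then be dropped straight into `createScorer`'s `score` option.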
## Scorer Metadata

You can provide metadata along with your custom scorer:

```ts
import { createScorer } from "evalite";

const containsParis = createScorer<string>({
  name: "Contains Paris",
  description: "Checks if the output contains the word 'Paris'.",
  score: (output) => {
    return {
      score: output.includes("Paris") ? 1 : 0,
      metadata: {
        // Can be anything!
      },
    };
  },
});
```

This will be visible along with the score in the Evalite UI.

<Aside type="tip">

This is especially useful for debugging LLM-as-a-judge evals. In autoevals' `Factuality` scorer, the metadata includes a rationale for why the scorer gave the score it did.

</Aside>
---
title: Streams
---

You can handle streams in Evalite by returning any async iterable (including a `ReadableStream`) from your task. This makes it easy to test streaming functions like the AI SDK's `streamText`:

```ts
import { evalite } from "evalite";
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Factuality } from "autoevals";

evalite("My Eval", {
  data: async () => {
    return [{ input: "What is the capital of France?", expected: "Paris" }];
  },
  task: async (input) => {
    const result = await streamText({
      model: openai("your-model"),
      system: `Answer the question concisely.`,
      prompt: input,
    });

    return result.textStream;
  },
  scorers: [Factuality],
});
```
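Returning an async iterable means the consumer can process chunks as they arrive. Roughly speaking — this is a sketch of the general pattern, not Evalite's actual internals — collecting a text stream into a final string looks like this:

```typescript
// Any async iterable of strings works; an async generator
// stands in for streamText's textStream here.
async function* fakeTextStream(): AsyncIterable<string> {
  yield "Pa";
  yield "ris";
}

// Drain the stream with for-await and join the chunks.
async function collectText(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}
```

`await collectText(fakeTextStream())` resolves to `"Paris"` once the stream is exhausted.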
---
title: Traces
---

import { Aside } from "@astrojs/starlight/components";

Traces are used to track the behaviour of each individual call to an LLM inside your task.

## `reportTrace`

You can report a trace by calling `reportTrace` inside an `evalite` eval:

```ts
import { Levenshtein } from "autoevals";
import { evalite } from "evalite";
import { reportTrace } from "evalite/evals";

evalite("My Eval", {
  data: async () => {
    return [{ input: "Hello", expected: "Hello World!" }];
  },
  task: async (input) => {
    // Track the start time
    const start = performance.now();

    // Call our LLM
    const result = await myLLMCall();

    // Report the trace once it's finished
    reportTrace({
      start,
      end: performance.now(),
      output: result.output,
      input: [
        {
          role: "user",
          content: input,
        },
      ],
      usage: {
        completionTokens: result.completionTokens,
        promptTokens: result.promptTokens,
      },
    });

    // Return the output
    return result.output;
  },
  scorers: [Levenshtein],
});
```

<Aside>

`reportTrace` is a no-op in production, so you can leave it in your code without worrying about performance.

</Aside>
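If you report traces in several tasks, the timing boilerplate can be factored into a small wrapper. This is a hypothetical helper, not part of Evalite:

```typescript
// Hypothetical helper: time an async call and return the
// start/end timestamps alongside its result.
async function withTiming<T>(
  fn: () => Promise<T>
): Promise<{ start: number; end: number; output: T }> {
  const start = performance.now();
  const output = await fn();
  return { start, end: performance.now(), output };
}

// e.g. const { start, end, output } = await withTiming(() => myLLMCall());
```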
## `traceAISDKModel`

If you're using the [Vercel AI SDK](https://sdk.vercel.ai/docs/introduction), you can automatically report traces by wrapping your model in the `traceAISDKModel` function:

```ts
import { traceAISDKModel } from "evalite/ai-sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// All calls to this model will be recorded in evalite!
const tracedModel = traceAISDKModel(openai("gpt-4o-mini"));

const result = await generateText({
  model: tracedModel,
  system: `Answer the question concisely.`,
  prompt: `What is the capital of France?`,
});
```

<Aside>

`traceAISDKModel`, like `reportTrace`, is a no-op in production.

</Aside>