
Commit

Added contributing guide
mattpocock committed Dec 5, 2024
1 parent c400152 commit b3b1b83
Showing 3 changed files with 13 additions and 237 deletions.
8 changes: 0 additions & 8 deletions .husky/pre-commit
@@ -1,8 +0,0 @@
FILE="readme.md"
if git diff --cached --name-only | grep -Fx "$FILE" > /dev/null; then
echo "Error: $FILE has been modified. Please move your changes to packages/evalite/readme.md instead."
exit 1
fi

cp packages/evalite/readme.md readme.md
git add readme.md
3 changes: 2 additions & 1 deletion package.json
@@ -9,7 +9,8 @@
  },
  "private": true,
  "scripts": {
    "dev": "pnpm run -r --parallel dev",
    "dev": "turbo watch dev",
    "wsl:dev": "pnpm run -r --parallel dev",
    "ci": "turbo build test lint after-build",
    "build": "turbo build after-build",
    "release": "pnpm run ci && changeset publish",
239 changes: 11 additions & 228 deletions readme.md
@@ -2,240 +2,23 @@

The TypeScript-native, local-first tool for testing LLM-powered apps.

- Fully open source: **No API Key required**.
- Local-first: runs on your machine, your data never leaves your laptop.
- Based on [Vitest](https://vitest.dev/), the best TypeScript test runner around.
- Terminal UI for quick prototyping.
- Supports tracing and custom scorers.
- [View the docs](./packages/evalite/readme.md)

## Quickstart
## Contributing

### 1. Install `evalite` and `autoevals`:
1. Create a `.env` file inside `packages/example` containing an `OPENAI_API_KEY`:

Install `evalite`, `vitest`, and a scoring library like `autoevals`:

```bash
pnpm add -D evalite vitest autoevals
```

### 2. Add an `eval` script:

Add an `eval` script to your package.json:

```json
{
"scripts": {
"eval": "evalite"
}
}
```

### 3. Create your first eval:

Create `my-eval.eval.ts`:

```ts
// my-eval.eval.ts

import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

evalite("My Eval", {
  // A function that returns an array of test data
  // - TODO: Replace with your test data
  data: async () => {
    return [{ input: "Hello", output: "Hello World!" }];
  },
  // The task to perform
  // - TODO: Replace with your LLM call
  task: async (input) => {
    return input + " World!";
  },
  // The scoring methods for the eval
  scorers: [Levenshtein],
});
```

> [!NOTE]
>
> `.eval.ts` is the extension Evalite looks for when scanning for evals.

### 4. Run Your Eval

Run `pnpm run eval`.

This runs `evalite`, which runs the evals:

- Runs the `data` function to get the test data
- Runs the `task` function on each item of test data
- Scores the output of the `task` function using the `scorers`
- Appends the result of the eval to an `evalite-report.jsonl` file

It then:

- Shows a UI for viewing the traces, scores, inputs and outputs at http://localhost:3006.
- If you only ran one eval, it also shows a table summarizing the eval in the terminal.
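
As a rough mental model of the steps above (this is not Evalite's actual implementation, and the scorer shape below is a simplified assumption), the loop looks something like this:

```ts
// Simplified sketch of one eval run — hypothetical types, not Evalite internals.
type Case = { input: string; output: string };
type Scorer = (args: { output: string; expected: string }) => { score: number } | Promise<{ score: number }>;

async function runEvalSketch(
  data: () => Promise<Case[]>,
  task: (input: string) => Promise<string>,
  scorers: Scorer[]
) {
  const results: Array<{ input: string; output: string; scores: { score: number }[] }> = [];
  for (const testCase of await data()) {
    const output = await task(testCase.input); // run the task on each case
    const scores = await Promise.all(
      scorers.map((scorer) => scorer({ output, expected: testCase.output })) // score the output
    );
    results.push({ input: testCase.input, output, scores });
    // The real runner appends each result to evalite-report.jsonl and feeds the UI.
  }
  return results;
}
```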

### 5. View Your Eval

Open http://localhost:3006 in your browser to view the results of the eval.

## Guides

### Watch Mode

You can run Evalite in watch mode by running `evalite watch`:

```bash
evalite watch
```

This will watch for changes to your `.eval.ts` files and re-run the evals when they change.

> [!IMPORTANT]
>
> I strongly recommend implementing a caching layer in your LLM calls when using watch mode. This will keep your evals running fast and avoid burning through your API credits.
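
For example, here is a minimal file-backed cache you could wrap around your LLM call (a sketch only; `cachedLLMCall`, `callLLM`, and the `.llm-cache` directory are hypothetical names, and a real setup might also key on the model and parameters):

```ts
// Caches LLM responses on disk so unchanged prompts are not re-sent on every watch re-run.
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const CACHE_DIR = "./.llm-cache";

export async function cachedLLMCall(
  prompt: string,
  callLLM: (prompt: string) => Promise<string>
): Promise<string> {
  mkdirSync(CACHE_DIR, { recursive: true });
  const key = createHash("sha256").update(prompt).digest("hex");
  const file = join(CACHE_DIR, `${key}.txt`);

  if (existsSync(file)) {
    return readFileSync(file, "utf8"); // cache hit: reuse the previous response
  }

  const output = await callLLM(prompt); // cache miss: call the LLM once
  writeFileSync(file, output, "utf8");
  return output;
}
```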
### Environment Variables

To call your LLM from a third-party service, you'll likely need some environment variables to keep your API keys safe.

Since Evalite is based on Vitest, it should already pick them up from your `vite.config.ts`.

If you don't have Vitest set up, here's how to do it:

1. Create a `.env` file in the root of your project:

```sh
OPENAI_API_KEY=your-api-key
```

2. Add `.env` to your `.gitignore`, if it's not already there
2. Run `pnpm run dev`. This will:

```
.env
```

3. Install `dotenv`:

```bash
pnpm add -D dotenv
```

4. Add a `vite.config.ts` file:

```ts
// vite.config.ts

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    setupFiles: ["dotenv/config"],
  },
});
```

Now, your environment variables will be available in your evals.
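
For instance, a hypothetical eval task could now read the key from `process.env` (this assumes the official `openai` Node client is installed; swap in whichever client you use):

```ts
// my-eval.eval.ts — hypothetical example; the key is loaded from .env by dotenv/config.
import OpenAI from "openai";
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // populated via the setupFiles entry above
});

evalite("Capitals", {
  data: async () => {
    return [{ input: "What is the capital of France?", output: "Paris" }];
  },
  task: async (input) => {
    const completion = await client.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: input }],
    });
    return completion.choices[0].message.content ?? "";
  },
  scorers: [Levenshtein],
});
```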

### Scorers

Scorers are used to score the output of your LLM call.

[Autoevals](https://github.com/braintrustdata/autoevals) is a great library of scorers to get you started.

You can create your own using `createScorer`:

```ts
import { createScorer } from "evalite";

const containsParis = createScorer<string>({
  name: "Contains Paris",
  description: "Checks if the output contains the word 'Paris'.",
  score: (output) => {
    return output.includes("Paris") ? 1 : 0;
  },
});

evalite("My Eval", {
  data: async () => {
    return [{ input: "Hello", output: "Hello World!" }];
  },
  task: async (input) => {
    return input + " World!";
  },
  scorers: [containsParis],
});
```
- Run the TS type checker on `evalite` and `evalite-core`
- Run the tests in `evalite-tests`
- Run the UI dev server at http://localhost:5173
- Run `evalite watch` on the examples in `packages/example`

### Traces

Traces are used to track the behaviour of each individual call to an LLM inside your task.

You can report a trace by calling `reportTrace` inside an `evalite` eval:

```ts
import { evalite } from "evalite";
import { reportTrace } from "evalite/evals";
import { Levenshtein } from "autoevals";

evalite("My Eval", {
  data: async () => {
    return [{ input: "Hello", output: "Hello World!" }];
  },
  task: async (input) => {
    // Track the start time
    const start = performance.now();

    // Call our LLM
    const result = await myLLMCall();

    // Report the trace once it's finished
    reportTrace({
      start,
      end: performance.now(),
      output: result.output,
      input: [
        {
          role: "user",
          content: input,
        },
      ],
      usage: {
        completionTokens: result.completionTokens,
        promptTokens: result.promptTokens,
      },
    });

    // Return the output
    return result.output;
  },
  scorers: [Levenshtein],
});
```

> [!NOTE]
>
> `reportTrace` is a no-op in production, so you can leave it in your code without worrying about performance.

#### Reporting Traces Automatically

If you're using the [Vercel AI SDK](https://sdk.vercel.ai/docs/introduction), you can automatically report traces by wrapping your model in the `traceAISDKModel` function:

```ts
import { traceAISDKModel } from "evalite/ai-sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// All calls to this model will be recorded in evalite!
const tracedModel = traceAISDKModel(openai("gpt-3.5-turbo"));

const result = await generateText({
  model: tracedModel,
  system: `Answer the question concisely.`,
  prompt: `What is the capital of France?`,
});
```
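
As an illustration, a hypothetical eval might wire the traced model into its `task` like this (same packages as above; the data and scorer are placeholders):

```ts
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";
import { traceAISDKModel } from "evalite/ai-sdk";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Every generateText call made with this model is recorded as a trace in the UI.
const tracedModel = traceAISDKModel(openai("gpt-3.5-turbo"));

evalite("Capitals (traced)", {
  data: async () => {
    return [{ input: "What is the capital of France?", output: "Paris" }];
  },
  task: async (input) => {
    const result = await generateText({
      model: tracedModel,
      system: `Answer the question concisely.`,
      prompt: input,
    });
    return result.text;
  },
  scorers: [Levenshtein],
});
```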

> [!NOTE]
>
> `traceAISDKModel`, like `reportTrace`, is a no-op in production.

> [!IMPORTANT]
>
> You may need to run `pnpm build` in the repo root, then `npm link` inside `packages/evalite` to get the global `evalite` command to work.
