Use browserbase in github action eval #84

Merged
merged 36 commits into from
Oct 3, 2024
Changes from 34 commits
9683e03
add simple google search eval
navidkpr Sep 22, 2024
85ceeef
add 2 more evals
navidkpr Sep 22, 2024
c2c6e3a
make sure extract continues to use the same model on repeated call
navidkpr Sep 22, 2024
7805bc1
add twitter sign up eval case
navidkpr Sep 22, 2024
0113aaf
update eval
navidkpr Sep 22, 2024
8a2afc5
Merge remote-tracking branch 'origin' into npour/first-eval
navidkpr Sep 24, 2024
eb3a864
add basic banalayzer eval system
navidkpr Sep 24, 2024
5fbafd4
add server
navidkpr Sep 24, 2024
9a5a571
update package jsons
navidkpr Sep 24, 2024
0319827
clean up the files
navidkpr Sep 24, 2024
a6930a0
clean up
navidkpr Sep 24, 2024
bdb6d29
fix the bananalyzer eval system + add it to the main eval script
navidkpr Sep 30, 2024
e278483
remove all public files on server exit
navidkpr Sep 30, 2024
eccb0e8
fix the package.json playwright issue
navidkpr Sep 30, 2024
9c5e731
clean up logs
navidkpr Sep 30, 2024
073e605
remove .vscode
navidkpr Sep 30, 2024
d7ccb0e
cleanup
navidkpr Sep 30, 2024
bedb996
Merge remote-tracking branch 'origin' into npour/more-evals
navidkpr Sep 30, 2024
fcdbf49
move the test evals to the playground script
navidkpr Sep 30, 2024
ddf3cf7
cleanup
navidkpr Sep 30, 2024
2aeac41
cleanup
navidkpr Sep 30, 2024
85a00b0
add server/public to gitignore
navidkpr Sep 30, 2024
f154ca2
test -> playround (much better name)
navidkpr Sep 30, 2024
884d942
fix the resource deletion issue
navidkpr Sep 30, 2024
0d5289d
update readme + cleanup
navidkpr Sep 30, 2024
2f54d16
cleanup of readme
navidkpr Sep 30, 2024
82b309e
remove the changes in teh lib folder
navidkpr Sep 30, 2024
042acd7
cleanup readme
navidkpr Sep 30, 2024
2c6b8cf
cleanup
navidkpr Sep 30, 2024
5701f76
cleanup
navidkpr Sep 30, 2024
d097939
update readme
navidkpr Sep 30, 2024
5f977eb
use browserbase browser in github action eval
navidkpr Sep 30, 2024
c857549
Merge remote-tracking branch 'origin' into npour/use-browserbase-in-g…
navidkpr Oct 3, 2024
5d384ce
force eval env to browserbase for github action
navidkpr Oct 3, 2024
8b9727a
Merge remote-tracking branch 'origin' into npour/use-browserbase-in-g…
navidkpr Oct 3, 2024
c7929be
Set peeler env to local
navidkpr Oct 3, 2024
4 changes: 3 additions & 1 deletion .github/workflows/ci.yml
@@ -17,7 +17,7 @@ jobs:
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: "20"

       - name: Install pnpm
         run: npm install -g pnpm
@@ -33,6 +33,8 @@ jobs:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
+          BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }}
           HEADLESS: true
+          EVAL_ENV: browserbase
         run: pnpm evals
         timeout-minutes: 20
34 changes: 22 additions & 12 deletions evals/index.eval.ts
@@ -5,9 +5,14 @@ import { evaluateExample, chosenBananalyzerEvals } from "./bananalyzer-ts";
 import { createExpressServer } from "./bananalyzer-ts/server/expressServer";
 import process from "process";

+const env =
+  process.env.EVAL_ENV?.toLowerCase() === "browserbase"
+    ? "BROWSERBASE"
+    : "LOCAL";
+
 const vanta = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     headless: process.env.HEADLESS !== "false",
   });
   await stagehand.init();
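
Note on the hunk above: the new top-level `env` constant maps the `EVAL_ENV` variable to one of two values, and every eval then passes it to `new Stagehand({ env, ... })`. The sketch below restates that mapping as a standalone function so the fallback behavior is easy to check. The names `resolveEvalEnv` and `EvalEnv` are hypothetical and do not appear in the PR, which inlines the expression.

```typescript
// Hypothetical helper for illustration; the PR inlines this as `const env = ...`.
type EvalEnv = "BROWSERBASE" | "LOCAL";

function resolveEvalEnv(raw: string | undefined): EvalEnv {
  // Case-insensitive match on "browserbase". Any other value, including
  // an unset variable, falls back to "LOCAL".
  return raw?.toLowerCase() === "browserbase" ? "BROWSERBASE" : "LOCAL";
}

console.log(resolveEvalEnv("browserbase")); // BROWSERBASE
console.log(resolveEvalEnv("BrowserBase")); // BROWSERBASE
console.log(resolveEvalEnv(undefined)); // LOCAL
console.log(resolveEvalEnv("browserbase ")); // LOCAL (the value is not trimmed)
```

One consequence of this design is that a misspelled or padded `EVAL_ENV` value silently selects the local browser rather than failing, which may be worth keeping in mind when reading CI results.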
@@ -17,7 +22,10 @@ const vanta = async () => {

   const observation = await stagehand.observe("find the request demo button");

-  if (!observation) return false;
+  if (!observation) {
+    await stagehand.context.close();
+    return false;
+  }

   const observationResult = await stagehand.page
     .locator(stagehand.observations[observation].result)
@@ -38,7 +46,7 @@ const vanta = async () => {

 const vanta_h = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     headless: process.env.HEADLESS !== "false",
   });
   await stagehand.init();
@@ -56,7 +64,7 @@ const vanta_h = async () => {

 const simple_google_search = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     headless: process.env.HEADLESS !== "false",
   });
   await stagehand.init();
@@ -69,14 +77,15 @@ const simple_google_search = async () => {

   const expectedUrl = "https://www.google.com/search?q=OpenAI";
   const currentUrl = await stagehand.page.url();
+
   await stagehand.context.close();

   return currentUrl.startsWith(expectedUrl);
 };

 const peeler_simple = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     headless: process.env.HEADLESS !== "false",
   });
   await stagehand.init();
@@ -97,7 +106,7 @@ const peeler_simple = async () => {

 const peeler_complex = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     verbose: 1,
     headless: process.env.HEADLESS !== "false",
   });
@@ -123,8 +132,9 @@ const peeler_complex = async () => {

   return price !== null;
 };
+
 const extract_collaborators_from_github_repository = async () => {
-  const stagehand = new Stagehand({ env: "LOCAL", verbose: 1 });
+  const stagehand = new Stagehand({ env, verbose: 1 });
   await stagehand.init();

   const timeoutPromise = new Promise((_, reject) =>
@@ -168,7 +178,7 @@ const extract_collaborators_from_github_repository = async () => {
 };

 const extract_last_twenty_github_commits = async () => {
-  const stagehand = new Stagehand({ env: "LOCAL", verbose: 1 });
+  const stagehand = new Stagehand({ env, verbose: 1 });
   await stagehand.init();

   const timeoutPromise = new Promise((_, reject) =>
@@ -210,7 +220,7 @@ const extract_last_twenty_github_commits = async () => {
 };

 const twitter_signup = async () => {
-  const stagehand = new Stagehand({ env: "LOCAL", verbose: 1 });
+  const stagehand = new Stagehand({ env, verbose: 1 });
   await stagehand.init();

   const timeoutPromise = new Promise((_, reject) =>
@@ -246,7 +256,7 @@ const twitter_signup = async () => {

 const wikipedia = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     verbose: 2,
     headless: process.env.HEADLESS !== "false",
   });
@@ -266,7 +276,7 @@ const wikipedia = async () => {

 const costar = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     verbose: 2,
     debugDom: true,
     headless: process.env.HEADLESS !== "false",
@@ -314,7 +324,7 @@ const costar = async () => {

 const google_jobs = async () => {
   const stagehand = new Stagehand({
-    env: "LOCAL",
+    env,
     verbose: 2,
     debugDom: true,
     headless: process.env.HEADLESS !== "false",
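
Note on cleanup: in `vanta`, this PR adds an explicit `stagehand.context.close()` before the early `return false`, so the browser context is released on that exit path too. A possible alternative that avoids repeating the call on each return is a `try`/`finally` wrapper. The sketch below is only an illustration of that pattern: `Closable` and `withCleanup` are hypothetical names, not part of Stagehand or this PR, and the fake resource stands in for a real browser context.

```typescript
// Hypothetical stand-in for anything with an async close(), such as a browser context.
interface Closable {
  close(): Promise<void>;
}

// Runs `body` and always closes the resource afterwards,
// whether `body` returns early, returns normally, or throws.
async function withCleanup<T extends Closable, R>(
  resource: T,
  body: (resource: T) => Promise<R>,
): Promise<R> {
  try {
    return await body(resource);
  } finally {
    await resource.close();
  }
}

// Usage with a fake resource that counts how often it was closed.
async function demo(): Promise<void> {
  let closeCalls = 0;
  const fakeContext: Closable = {
    close: async () => {
      closeCalls += 1;
    },
  };

  // An early `return false` inside the body still triggers close().
  const passed = await withCleanup(fakeContext, async () => {
    const observation: string | null = null;
    if (!observation) return false;
    return true;
  });

  console.log(passed, closeCalls);
}

demo().catch((error) => console.error(error));
```

With a wrapper like this, each eval body could return its boolean result directly, and a timeout or thrown error would not leave a remote session open. Whether that fits the eval harness here is a judgment call for the maintainers.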