support multi-chunk extracts + prompt updates #89

Merged
merged 47 commits into from
Oct 4, 2024
47 commits
9683e03
add simple google search eval
navidkpr Sep 22, 2024
85ceeef
add 2 more evals
navidkpr Sep 22, 2024
c2c6e3a
make sure extract continues to use the same model on repeated call
navidkpr Sep 22, 2024
7805bc1
add twitter sign up eval case
navidkpr Sep 22, 2024
0113aaf
update eval
navidkpr Sep 22, 2024
8a2afc5
Merge remote-tracking branch 'origin' into npour/first-eval
navidkpr Sep 24, 2024
eb3a864
add basic banalayzer eval system
navidkpr Sep 24, 2024
5fbafd4
add server
navidkpr Sep 24, 2024
9a5a571
update package jsons
navidkpr Sep 24, 2024
0319827
clean up the files
navidkpr Sep 24, 2024
a6930a0
clean up
navidkpr Sep 24, 2024
bdb6d29
fix the bananalyzer eval system + add it to the main eval script
navidkpr Sep 30, 2024
e278483
remove all public files on server exit
navidkpr Sep 30, 2024
eccb0e8
fix the package.json playwright issue
navidkpr Sep 30, 2024
9c5e731
clean up logs
navidkpr Sep 30, 2024
073e605
remove .vscode
navidkpr Sep 30, 2024
d7ccb0e
cleanup
navidkpr Sep 30, 2024
bedb996
Merge remote-tracking branch 'origin' into npour/more-evals
navidkpr Sep 30, 2024
fcdbf49
move the test evals to the playground script
navidkpr Sep 30, 2024
ddf3cf7
cleanup
navidkpr Sep 30, 2024
2aeac41
cleanup
navidkpr Sep 30, 2024
85a00b0
add server/public to gitignore
navidkpr Sep 30, 2024
f154ca2
test -> playround (much better name)
navidkpr Sep 30, 2024
884d942
fix the resource deletion issue
navidkpr Sep 30, 2024
0d5289d
update readme + cleanup
navidkpr Sep 30, 2024
2f54d16
cleanup of readme
navidkpr Sep 30, 2024
82b309e
remove the changes in teh lib folder
navidkpr Sep 30, 2024
042acd7
cleanup readme
navidkpr Sep 30, 2024
2c6b8cf
cleanup
navidkpr Sep 30, 2024
5701f76
cleanup
navidkpr Sep 30, 2024
d097939
update readme
navidkpr Sep 30, 2024
33e3f2a
make top element look if the element is top in multiple points in the…
navidkpr Oct 1, 2024
c3e20a5
Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-1
navidkpr Oct 1, 2024
f0be458
switch to my repo (so we can edit the examples when they don't make s…
navidkpr Oct 2, 2024
1757233
fix bug: now we properly support multi-chunk extracts
navidkpr Oct 3, 2024
b9cd0e6
add more information into the eval outputs
navidkpr Oct 3, 2024
d7decc5
Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-3
navidkpr Oct 3, 2024
3d9edcb
Merge branch 'npour/more-info-in-eval' into npour/fix-bananalyzer-3
navidkpr Oct 3, 2024
a87a8f9
fix issues with bananalyzer 2 + stabalize github test cases
navidkpr Oct 3, 2024
a244604
Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-3
navidkpr Oct 3, 2024
61c9be7
add homedepot task case to evals
navidkpr Oct 3, 2024
3332fd6
update error output
navidkpr Oct 3, 2024
ade328c
fix more eval cases
navidkpr Oct 3, 2024
4d28507
cleanup
navidkpr Oct 3, 2024
00a23dc
emulate a full browser better
navidkpr Oct 3, 2024
594596e
use true home depot eval
pkiv Oct 4, 2024
072d499
Merge branch 'main' into npour/fix-bananalyzer-3
pkiv Oct 4, 2024
79 changes: 77 additions & 2 deletions evals/index.eval.ts
@@ -148,6 +148,81 @@ const peeler_complex = async () => {
};
};

const homedepot = async () => {
const stagehand = new Stagehand({
env,
verbose: 1,
headless: process.env.HEADLESS !== "false",
});
await stagehand.init();

try {
await stagehand.page.goto("https://www.homedepot.com/");
await stagehand.waitForSettledDom();

await stagehand.act({ action: "search for gas grills" });
await stagehand.waitForSettledDom();

await stagehand.act({ action: "click on the best selling gas grill" });
await stagehand.waitForSettledDom();

await stagehand.act({ action: "click on the Product Details" });
await stagehand.waitForSettledDom();

await stagehand.act({ action: "find the Primary Burner BTU" });
await stagehand.waitForSettledDom();

const productSpecs = await stagehand.extract({
instruction: "Extract the Primary exact Burner BTU of the product",
schema: z.object({
productSpecs: z
.array(
z.object({
burnerBTU: z.string().describe("Primary Burner BTU exact value"),
}),
)
.describe("Gas grill Primary Burner BTU exact value"),
}),
modelName: "gpt-4o-2024-08-06",
});
console.log("The gas grill primary burner BTU is:", productSpecs);

if (
!productSpecs ||
!productSpecs.productSpecs ||
productSpecs.productSpecs.length !== 1
) {
return {
_success: false,
productSpecs,
};
}

// Pass when the value contains exactly four "0" characters and one "4" character.
if (
(productSpecs.productSpecs[0].burnerBTU.match(/0/g) || []).length === 4 &&
(productSpecs.productSpecs[0].burnerBTU.match(/4/g) || []).length === 1
) {
return {
_success: true,
productSpecs,
};
} else {
return {
_success: false,
productSpecs,
};
}
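} catch (error) {
// JSON.stringify drops an Error's non-enumerable message and stack, so report the message directly.
const message = error instanceof Error ? error.message : String(error);
console.error(`Error in homedepot function: ${message}`);
return {
_success: false,
error: message,
};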
} finally {
await stagehand.context.close();
}
};
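
The success check in `homedepot` counts `0` and `4` characters, so a string such as `"00,004 BTU"` would also pass. Below is a minimal sketch of a stricter comparison. It assumes the expected value is 40,000 BTU, which is only inferred from the digit counts and is not stated in the PR. `EXPECTED_BTU`, `parseBtu`, and `isExpectedBtu` are hypothetical names, not part of the eval suite.

```typescript
// Hypothetical expected value, inferred from the digit-count check (four "0"s, one "4").
const EXPECTED_BTU = 40000;

// Pull the first number out of a string such as "40,000 BTU" and return it as an integer.
// Returns null when the string contains no digits.
function parseBtu(raw: string): number | null {
  const match = raw.match(/\d[\d,]*/);
  if (!match) return null;
  const value = Number.parseInt(match[0].replace(/,/g, ""), 10);
  return Number.isNaN(value) ? null : value;
}

// Compare the parsed number against the expected value instead of counting digits.
function isExpectedBtu(raw: string): boolean {
  return parseBtu(raw) === EXPECTED_BTU;
}
```

Comparing the parsed number keeps the check tolerant of formatting (commas, a trailing unit) while rejecting values that merely contain the same digits.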

const extract_collaborators_from_github_repository = async () => {
const stagehand = new Stagehand({
env: "LOCAL",
@@ -459,8 +534,7 @@ const tasks = {
extract_last_twenty_github_commits,
costar,
google_jobs,
- homedepot,
- nonsense_action
+ homedepot
};

const exactMatch = (args: { input: any; output: any; expected?: any }) => {
@@ -509,6 +583,7 @@ const testcases = [
{ input: { name: "extract_last_twenty_github_commits" } },
// { input: { name: "costar", expected: true } },
{ input: { name: "google_jobs" } },
{ input: { name: "homedepot" } },
...chosenBananalyzerEvals.map((evalItem: any) => ({
input: {
name: evalItem.name,
1 change: 0 additions & 1 deletion evals/playground.ts
@@ -50,7 +50,6 @@ const homedepot = async () => {
}
};


async function main() {
const homedepotResult = await homedepot();

63 changes: 59 additions & 4 deletions lib/index.ts
@@ -52,14 +52,64 @@ async function getBrowser(env: "LOCAL" | "BROWSERBASE" = "LOCAL", headless: bool
width: 1250,
height: 800,
},
- }
+ locale: "en-US",
+ timezoneId: "America/New_York",
+ deviceScaleFactor: 1,
+ args: [
+ "--enable-webgl",
+ "--use-gl=swiftshader",
+ "--enable-accelerated-2d-canvas",
+ ],
+ excludeSwitches: "enable-automation",
+ userDataDir: "./user_data",
+ },
);

console.log("Local browser started successfully.");

await applyStealthScripts(context);
Review comment (Collaborator): this might help with the Costar eval, which does not work in headless mode!

return { context };
}
}

async function applyStealthScripts(context: BrowserContext) {
await context.addInitScript(() => {
// Override the navigator.webdriver property
Object.defineProperty(navigator, "webdriver", {
get: () => undefined,
});

// Mock languages and plugins to mimic a real browser
Object.defineProperty(navigator, "languages", {
get: () => ["en-US", "en"],
});

Object.defineProperty(navigator, "plugins", {
get: () => [1, 2, 3, 4, 5],
});

// Remove Playwright-specific properties
delete (window as any).__playwright;
delete (window as any).__pw_manual;
delete (window as any).__PW_inspect;

// Redefine the headless property
Object.defineProperty(navigator, "headless", {
get: () => false,
});

// Override the permissions API
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters: any) =>
parameters.name === "notifications"
? Promise.resolve({
state: Notification.permission,
} as PermissionStatus)
: originalQuery(parameters);
});
}
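
`applyStealthScripts` relies on `Object.defineProperty` with a getter to change what page scripts read from `navigator`. The sketch below shows that mechanism on a plain object so it can run outside a browser. `fakeNavigator` and `overrideGetter` are hypothetical stand-ins, not part of `lib/index.ts`.

```typescript
// Stand-in for the browser's navigator object.
const fakeNavigator: Record<string, unknown> = {
  webdriver: true,
  languages: ["de-DE"],
};

// Replace a property with a getter that always returns the given value.
// configurable: true allows a later call to redefine the same property.
function overrideGetter<T>(target: object, key: string, value: T): void {
  Object.defineProperty(target, key, {
    get: () => value,
    configurable: true,
  });
}

overrideGetter(fakeNavigator, "webdriver", undefined);
overrideGetter(fakeNavigator, "languages", ["en-US", "en"]);
```

In this sketch the overridden property still exists on the object, so a presence check such as `"webdriver" in fakeNavigator` remains true even though reading it yields `undefined`.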

export class Stagehand {
private llmProvider: LLMProvider;
public observations: {
@@ -219,8 +269,9 @@ export class Stagehand {

await this.waitForSettledDom();
await this.startDomDebug();
- const { outputString, chunk, chunks } = await this.page.evaluate(() =>
- window.processDom([])
+ const { outputString, chunk, chunks } = await this.page.evaluate(
+ (chunksSeen?: number[]) => window.processDom(chunksSeen ?? []),
+ chunksSeen,
);
this.log({
category: "extraction",
@@ -231,12 +282,16 @@ export class Stagehand {
const extractionResponse = await extract({
instruction,
progress,
previouslyExtractedContent: content,
domElements: outputString,
llmProvider: this.llmProvider,
schema,
modelName: modelName || this.defaultModelName,
});
- const { progress: newProgress, completed, ...output } = extractionResponse;
+ const {
+ metadata: { progress: newProgress, completed },
+ ...output
+ } = extractionResponse;
await this.cleanupDomDebug();

this.log({
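
The hunks above pass `chunksSeen` into `window.processDom` and forward `content` as `previouslyExtractedContent`, so each call handles one DOM chunk and builds on earlier results. A minimal sketch of that control flow follows, with the DOM and LLM calls replaced by a caller-supplied function. `extractAcrossChunks`, `ExtractChunkFn`, the synchronous signature, and merging by object spread are all assumptions for illustration, not the code in `lib/index.ts`.

```typescript
type Extracted = Record<string, unknown>;

interface ChunkResult {
  output: Extracted;
  metadata: { progress: string; completed: boolean };
}

// Hypothetical stand-in for one extraction call over a single DOM chunk.
type ExtractChunkFn = (
  chunk: number,
  previous: Extracted,
  progress: string,
) => ChunkResult;

function extractAcrossChunks(
  chunks: number[],
  extractChunk: ExtractChunkFn,
): { content: Extracted; chunksSeen: number[] } {
  const chunksSeen: number[] = [];
  let content: Extracted = {};
  let progress = "";

  for (const chunk of chunks) {
    // Each call sees what earlier chunks produced, so it can skip repeats.
    const { output, metadata } = extractChunk(chunk, content, progress);
    chunksSeen.push(chunk);
    content = { ...content, ...output }; // assumption: later chunks override earlier keys
    progress = metadata.progress;
    if (metadata.completed) break; // stop once the model reports the goal is met
  }
  return { content, chunksSeen };
}
```

Stopping on `completed` avoids paying for model calls on chunks that cannot add anything once the instruction is satisfied.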
24 changes: 19 additions & 5 deletions lib/inference.ts
@@ -62,30 +62,41 @@ export async function act({
export async function extract({
instruction,
progress,
previouslyExtractedContent,
domElements,
schema,
llmProvider,
modelName,
}: {
instruction: string;
progress: string;
previouslyExtractedContent: any;
domElements: string;
schema: z.ZodObject<any>;
llmProvider: LLMProvider;
modelName: string;
}) {
const llmClient = llmProvider.getClient(modelName);

const fullSchema = schema.extend({
- progress: z.string().describe("progress of what has been extracted so far"),
- completed: z.boolean().describe("true if the goal is now accomplished"),
+ metadata: z.object({
+ progress: z
+ .string()
+ .describe("progress of what has been extracted so far"),
+ completed: z.boolean().describe("true if the goal is now accomplished"),
+ }),
});

return llmClient.createExtraction({
model: modelName,
messages: [
buildExtractSystemPrompt() as ChatMessage,
- buildExtractUserPrompt(instruction, progress, domElements) as ChatMessage,
+ buildExtractUserPrompt(
+ instruction,
+ progress,
+ previouslyExtractedContent,
+ domElements,
+ ) as ChatMessage,
],
response_model: {
schema: fullSchema,
@@ -143,7 +154,10 @@ export async function ask({
const llmClient = llmProvider.getClient(modelName);
const response = await llmClient.createChatCompletion({
model: modelName,
- messages: [buildAskSystemPrompt() as ChatMessage, buildAskUserPrompt(question) as ChatMessage],
+ messages: [
+ buildAskSystemPrompt() as ChatMessage,
+ buildAskUserPrompt(question) as ChatMessage,
+ ],
temperature: 0.1,
top_p: 1,
frequency_penalty: 0,
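
In `extract`, the bookkeeping fields `progress` and `completed` now sit under a `metadata` key instead of next to the user's schema fields. One plausible benefit (the PR does not state its motivation) is that a user schema with its own `progress` or `completed` field no longer collides with them. A minimal sketch of splitting such a response, using hypothetical names `ExtractionMetadata` and `splitExtractionResponse`:

```typescript
interface ExtractionMetadata {
  progress: string;
  completed: boolean;
}

// A response is the user's fields plus a nested metadata object.
type ExtractionResponse<T> = T & { metadata: ExtractionMetadata };

// Separate the user-facing output from the bookkeeping fields.
function splitExtractionResponse<T extends object>(
  response: ExtractionResponse<T>,
) {
  const { metadata, ...output } = response;
  return {
    output,
    progress: metadata.progress,
    completed: metadata.completed,
  };
}
```

With the nested layout, a top-level `progress` field in the user's schema passes through in `output` untouched.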
13 changes: 8 additions & 5 deletions lib/prompt.ts
@@ -9,7 +9,7 @@ You are given:
2. the steps that have been taken so far
3. a list of active DOM elements in this chunk to consider to accomplish the goal.

- You have 2 tools that you can call: doAction, and skipSection
+ You have 2 tools that you can call: doAction, and skipSection. Do action only performs Playwright actions. Do not perform any other actions.
Review comment (Collaborator): Hope this helps with me previously writing edge cases

`;

export function buildActSystemPrompt(): OpenAI.ChatCompletionMessageParam {
@@ -104,10 +104,7 @@ export const actTools: Array<OpenAI.ChatCompletionTool> = [
];

// extract
- const extractSystemPrompt = `
- 'you are extracting content on behalf of a user. You will be given an instruction, progress so far, and a list of DOM elements to extract from',
-
- `;
+ const extractSystemPrompt = `you are extracting content on behalf of a user. You will be given an instruction, progress so far, and a list of DOM elements to extract from. Where applicable, return the exact text from the DOM elements with all symbols, characters and endlines as is. Only extract new information that has not already been extracted. Make sure you include the extraction in your response. Return null or an empty string if no new information is found for a string variable`;

export function buildExtractSystemPrompt(): OpenAI.ChatCompletionMessageParam {
const content = extractSystemPrompt.replace(/\s+/g, " ");
@@ -120,12 +117,18 @@ export function buildExtractSystemPrompt(): OpenAI.ChatCompletionMessageParam {
export function buildExtractUserPrompt(
instruction: string,
progress: string,
previouslyExtractedContent: object,
domElements: string,
): OpenAI.ChatCompletionMessageParam {
return {
role: "user",
content: `instruction: ${instruction}
progress: ${progress}
Previously Extracted Content:\n${JSON.stringify(
previouslyExtractedContent,
null,
2,
)}
DOM: ${domElements}`,
};
}
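
To see what the model receives after this change, here is a self-contained sketch of the user prompt with the new `Previously Extracted Content` section. `UserMessage` and `buildExtractUserPromptSketch` are hypothetical names, and a local message type stands in for `OpenAI.ChatCompletionMessageParam` so the example has no dependencies.

```typescript
interface UserMessage {
  role: "user";
  content: string;
}

function buildExtractUserPromptSketch(
  instruction: string,
  progress: string,
  previouslyExtractedContent: object,
  domElements: string,
): UserMessage {
  // Pretty-print earlier results so the model can tell what is already covered.
  const previous = JSON.stringify(previouslyExtractedContent, null, 2);
  return {
    role: "user",
    content: [
      `instruction: ${instruction}`,
      `progress: ${progress}`,
      `Previously Extracted Content:\n${previous}`,
      `DOM: ${domElements}`,
    ].join("\n"),
  };
}

const examplePrompt = buildExtractUserPromptSketch(
  "Extract the product title",
  "title found in chunk 0",
  { title: "Grill" },
  "<div>second chunk</div>",
);
```

Placing the earlier results before the DOM text pairs with the system prompt's instruction to extract only new information.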