support multi-chunk extracts + prompt updates #89
Merged
47 commits
9683e03 add simple google search eval (navidkpr)
85ceeef add 2 more evals (navidkpr)
c2c6e3a make sure extract continues to use the same model on repeated call (navidkpr)
7805bc1 add twitter sign up eval case (navidkpr)
0113aaf update eval (navidkpr)
8a2afc5 Merge remote-tracking branch 'origin' into npour/first-eval (navidkpr)
eb3a864 add basic banalayzer eval system (navidkpr)
5fbafd4 add server (navidkpr)
9a5a571 update package jsons (navidkpr)
0319827 clean up the files (navidkpr)
a6930a0 clean up (navidkpr)
bdb6d29 fix the bananalyzer eval system + add it to the main eval script (navidkpr)
e278483 remove all public files on server exit (navidkpr)
eccb0e8 fix the package.json playwright issue (navidkpr)
9c5e731 clean up logs (navidkpr)
073e605 remove .vscode (navidkpr)
d7ccb0e cleanup (navidkpr)
bedb996 Merge remote-tracking branch 'origin' into npour/more-evals (navidkpr)
fcdbf49 move the test evals to the playground script (navidkpr)
ddf3cf7 cleanup (navidkpr)
2aeac41 cleanup (navidkpr)
85a00b0 add server/public to gitignore (navidkpr)
f154ca2 test -> playround (much better name) (navidkpr)
884d942 fix the resource deletion issue (navidkpr)
0d5289d update readme + cleanup (navidkpr)
2f54d16 cleanup of readme (navidkpr)
82b309e remove the changes in teh lib folder (navidkpr)
042acd7 cleanup readme (navidkpr)
2c6b8cf cleanup (navidkpr)
5701f76 cleanup (navidkpr)
d097939 update readme (navidkpr)
33e3f2a make top element look if the element is top in multiple points in the… (navidkpr)
c3e20a5 Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-1 (navidkpr)
f0be458 switch to my repo (so we can edit the examples when they don't make s… (navidkpr)
1757233 fix bug: now we properly support multi-chunk extracts (navidkpr)
b9cd0e6 add more information into the eval outputs (navidkpr)
d7decc5 Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-3 (navidkpr)
3d9edcb Merge branch 'npour/more-info-in-eval' into npour/fix-bananalyzer-3 (navidkpr)
a87a8f9 fix issues with bananalyzer 2 + stabalize github test cases (navidkpr)
a244604 Merge remote-tracking branch 'origin' into npour/fix-bananalyzer-3 (navidkpr)
61c9be7 add homedepot task case to evals (navidkpr)
3332fd6 update error output (navidkpr)
ade328c fix more eval cases (navidkpr)
4d28507 cleanup (navidkpr)
00a23dc emulate a full browser better (navidkpr)
594596e use true home depot eval (pkiv)
072d499 Merge branch 'main' into npour/fix-bananalyzer-3 (pkiv)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,7 +9,7 @@ You are given: | |
2. the steps that have been taken so far | ||
3. a list of active DOM elements in this chunk to consider to accomplish the goal. | ||
|
||
- You have 2 tools that you can call: doAction, and skipSection | ||
+ You have 2 tools that you can call: doAction, and skipSection. Do action only performs Playwright actions. Do not perform any other actions. | ||
Review comment: Hope this helps with me previously writing edge cases |
||
`; | ||
|
||
export function buildActSystemPrompt(): OpenAI.ChatCompletionMessageParam { | ||
|
@@ -104,10 +104,7 @@ export const actTools: Array<OpenAI.ChatCompletionTool> = [ | |
]; | ||
|
||
// extract | ||
- const extractSystemPrompt = ` | ||
- 'you are extracting content on behalf of a user. You will be given an instruction, progress so far, and a list of DOM elements to extract from', | ||
|
||
- `; | ||
+ const extractSystemPrompt = `you are extracting content on behalf of a user. You will be given an instruction, progress so far, and a list of DOM elements to extract from. Where applicable, return the exact text from the DOM elements with all symbols, characters and endlines as is. Only extract new information that has not already been extracted. Make sure you include the extraction in your response. Return null or an empty string if no new information is found for a string variable`; | ||
|
||
export function buildExtractSystemPrompt(): OpenAI.ChatCompletionMessageParam { | ||
const content = extractSystemPrompt.replace(/\s+/g, " "); | ||
|
@@ -120,12 +117,18 @@ export function buildExtractSystemPrompt(): OpenAI.ChatCompletionMessageParam { | |
export function buildExtractUserPrompt( | ||
instruction: string, | ||
progress: string, | ||
+ previouslyExtractedContent: object, | ||
domElements: string, | ||
): OpenAI.ChatCompletionMessageParam { | ||
return { | ||
role: "user", | ||
content: `instruction: ${instruction} | ||
progress: ${progress} | ||
+ Previously Extracted Content:\n${JSON.stringify( | ||
+ previouslyExtractedContent, | ||
+ null, | ||
+ 2, | ||
+ )} | ||
DOM: ${domElements}`, | ||
}; | ||
} | ||
|
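
For readers following the diff, here is a minimal sketch of how the new `previouslyExtractedContent` argument could be threaded through a multi-chunk extract. Only `buildExtractUserPrompt` comes from the diff above. `ChatMessage`, `ChunkExtractor`, `mergeExtracted`, and `extractAcrossChunks` are hypothetical names introduced for illustration, and the merge rule (skip null or empty values, as the updated system prompt asks the model to return them when nothing new is found) is an assumption, not necessarily what this PR implements.

```typescript
// Local stand-in for OpenAI.ChatCompletionMessageParam so the sketch has no dependencies.
type ChatMessage = { role: "system" | "user"; content: string };

// Same body as the function in the diff, typed against the local stand-in.
function buildExtractUserPrompt(
  instruction: string,
  progress: string,
  previouslyExtractedContent: object,
  domElements: string,
): ChatMessage {
  return {
    role: "user",
    content: `instruction: ${instruction}
progress: ${progress}
Previously Extracted Content:\n${JSON.stringify(previouslyExtractedContent, null, 2)}
DOM: ${domElements}`,
  };
}

type Extracted = Record<string, unknown>;

// Hypothetical: stands in for the model call that returns fields found in one chunk.
type ChunkExtractor = (prompt: ChatMessage) => Promise<Extracted>;

// Assumed merge rule: null, undefined, and empty strings never overwrite an earlier value.
function mergeExtracted(previous: Extracted, next: Extracted): Extracted {
  const merged: Extracted = { ...previous };
  for (const key of Object.keys(next)) {
    const value = next[key];
    if (value === null || value === undefined || value === "") continue;
    merged[key] = value;
  }
  return merged;
}

// Hypothetical driver: each chunk's prompt carries everything extracted so far.
async function extractAcrossChunks(
  instruction: string,
  chunks: string[],
  extractChunk: ChunkExtractor,
): Promise<Extracted> {
  let extracted: Extracted = {};
  for (let i = 0; i < chunks.length; i++) {
    const chunk = chunks[i] as string;
    const progress = `chunk ${i + 1} of ${chunks.length}`;
    const prompt = buildExtractUserPrompt(instruction, progress, extracted, chunk);
    extracted = mergeExtracted(extracted, await extractChunk(prompt));
  }
  return extracted;
}
```

Passing a stub `extractChunk` that returns canned objects is enough to exercise the loop without a model call, which may be handy when adding eval cases.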
Review comment: this might help with the Costar eval which does not work in headless mode!