-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add image and audio prompting API #71
base: main
Are you sure you want to change the base?
Conversation
|
||
const response1 = await session.prompt([ | ||
"Give a helpful artistic critique of how well the second image matches the first:", | ||
{ type: "image", data: referenceImage }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some models may only accept a single image or single audio input with each request. Consider describing that edge-case detail behavior (e.g. throw a "NotSupportedError"
DOMException
) when an unsupported number or combination of image/audio prompt pieces are passed. Maybe also give a single-image example here for wider compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should encapsulate that away from the user so they don't have to worry about it, by sending two requests to the backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think number of images aspect of this can be handled through the contextoverflow event. https://github.com/webmachinelearning/prompt-api#tokenization-context-window-length-limits-and-overflow.
Phi 3.5 vision https://huggingface.co/microsoft/Phi-3.5-vision-instruct, recommends at the most 16 images but passing more (say 17) will only run into context length limitations. The context length limitation can be reached earlier as well with large image.
Sending two requests to the backend may not work though, the assistant is going to add its response in between.
Are there other limitations in terms of no mixing of images/audio or limitations in number of images/audio. Ill have to check what the server-side models do for their API.
|
||
// Initial prompt lines | ||
|
||
dictionary AILanguageModelInitialPromptLineDict { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worthwhile to have separate role enums and dictionaries for initial/not lines?
Would simply throwing exceptions for invalid 'system' role use in the impl suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize you've explored this thoroughly, but I wonder why the idl couldn't be simpler, like:
dictionary AILanguageModelPromptDict {
required AILanguageModelPromptRole role = "user";
required AILanguageModelPromptType type = "text";
required AILanguageModelPromptData data;
}
typedef (DOMString or AILanguageModelPromptDict) AILanguageModelPrompt;
typedef (AILanguageModelPrompt or sequence<AILanguageModelPrompt>) AILanguageModelPromptInput;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worthwhile to have separate role enums and dictionaries for initial/not lines?
Would simply throwing exceptions for invalid 'system' role use in the impl suffice?
Yeah that's totally valid. If you as an implementer think that's simpler, we can do it.
I realize you've explored this thoroughly, but I wonder why the idl couldn't be simpler, like:
(Note that required and default values are not compatible.)
So, your version is mainly about unnesting the content, right?
I think it mostly works, it just departs from what's idiomatic with other chat completion APIs. Hmm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I imagined unnesting might slightly simplify IDL, impl, and atypical uses specifying more fields.
create({initialPrompts: [{role: "system", content: {type: "image", data: pixels}}]})
vs
create({initialPrompts: [{role: "system", type: "image", data: pixels}]})
I'm unfamiliar with rationale behind other APIs' idiomatic designs, so I'd readily defer to folks that know more, and maybe even opt for the more familiar pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth considering, since we're not matching the APIs exactly in other ways. I will try to outline the two possibilities in more detail to get developer feedback in #40, on Monday.
Add IDL for new LanguageModel[Factory] API types, etc., per: webmachinelearning/prompt-api#71 API use with currently supported input types is unchanged. API use with new input types throws TypeErrors for now. Move create() WPTs into a new file as separate tests. Bug: 385173789, 385173368 Change-Id: Id8ca1c8410f1a97bb7d28b4bc568020ff0412698 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6216647 Reviewed-by: Clark DuVall <[email protected]> Commit-Queue: Mike Wasserman <[email protected]> Cr-Commit-Position: refs/heads/main@{#1414456}
This reverts commit 259d706. Reason for revert: Failing win builder: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8724180044051350113/+/u/compile/raw_io.output_text_failure_summary_ Original change's description: > Prompt API: Add multimodal input IDL skeleton > > Add IDL for new LanguageModel[Factory] API types, etc., per: > webmachinelearning/prompt-api#71 > > API use with currently supported input types is unchanged. > API use with new input types throws TypeErrors for now. > > Move create() WPTs into a new file as separate tests. > > Bug: 385173789, 385173368 > Change-Id: Id8ca1c8410f1a97bb7d28b4bc568020ff0412698 > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6216647 > Reviewed-by: Clark DuVall <[email protected]> > Commit-Queue: Mike Wasserman <[email protected]> > Cr-Commit-Position: refs/heads/main@{#1414456} Bug: 385173789, 385173368 Change-Id: If058bf22fceb6880b7ba12ebccbdca74a0274535 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6221804 Commit-Queue: Mike Wasserman <[email protected]> Auto-Submit: Mike Wasserman <[email protected]> Reviewed-by: Clark DuVall <[email protected]> Cr-Commit-Position: refs/heads/main@{#1414477}
This reverts commit 77f9f49. Reason for revert: Workaround Windows IDL compiler path length issues Original change's description: > Revert "Prompt API: Add multimodal input IDL skeleton" > > This reverts commit 259d706. > > Reason for revert: Failing win builder: > https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8724180044051350113/+/u/compile/raw_io.output_text_failure_summary_ > > Original change's description: > > Prompt API: Add multimodal input IDL skeleton > > > > Add IDL for new LanguageModel[Factory] API types, etc., per: > > webmachinelearning/prompt-api#71 > > > > API use with currently supported input types is unchanged. > > API use with new input types throws TypeErrors for now. > > > > Move create() WPTs into a new file as separate tests. > > > > Bug: 385173789, 385173368 > > Change-Id: Id8ca1c8410f1a97bb7d28b4bc568020ff0412698 > > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6216647 > > Reviewed-by: Clark DuVall <[email protected]> > > Commit-Queue: Mike Wasserman <[email protected]> > > Cr-Commit-Position: refs/heads/main@{#1414456} > > Bug: 385173789, 385173368 > Change-Id: If058bf22fceb6880b7ba12ebccbdca74a0274535 > No-Presubmit: true > No-Tree-Checks: true > No-Try: true > Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6221804 > Commit-Queue: Mike Wasserman <[email protected]> > Auto-Submit: Mike Wasserman <[email protected]> > Reviewed-by: Clark DuVall <[email protected]> > Cr-Commit-Position: refs/heads/main@{#1414477} Bug: 385173789, 385173368, 394123703 Change-Id: I7a47d8ff8c6b6ae797c3198608859640ae81e1df Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6222573 Reviewed-by: Clark DuVall <[email protected]> Commit-Queue: Mike Wasserman <[email protected]> Cr-Commit-Position: refs/heads/main@{#1415399}
Closes #40. Somewhat helps with #70.