-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH] add query config on collection configuration, splade, and bm25 efs #4901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
2d66a17 to
69696fd
Compare
a339c38 to
34225f8
Compare
7eac9c8 to
bd49feb
Compare
36a2766 to
0a6cc75
Compare
0a6cc75 to
a219bb3
Compare
a219bb3 to
e67eb4f
Compare
5f002bb to
5100775
Compare
| self, | ||
| record_set: BaseRecordSet, | ||
| embeddable_fields: Optional[Set[str]] = None, | ||
| is_query: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works, but just for discussion an alternative approach is to have separate methods for read and write paths. any idea here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah +1
| if is_query: | ||
| return self._embedding_function.embed_query(input=input) | ||
| else: | ||
| return self._embedding_function(input=input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work for all embedding functions we support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since theres a default that assumes query config doesnt exist, none of the existing efs will break.
Sicheng-Pan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
5100775 to
5edc33f
Compare
|
Add Query Config for Embedding Functions, Introduce Sparse Embedding Support This PR introduces major extensibility to the embedding function API by adding a standardized Key Changes• Added Affected Areas• This summary was automatically generated by @propel-code-bot |
5edc33f to
341a219
Compare
| class FastembedSparseEmbeddingFunction(SparseEmbeddingFunction[Documents]): | ||
| def __init__( | ||
| self, | ||
| model_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe put a list of models for common use cases somewhere
341a219 to
6c083cf
Compare
6c083cf to
255e7f2
Compare

Description of changes
This PR adds a new function embed_query on the base embeddingfunction type to allow embedding functions to define a secondary path to embed documents for query path. by default this will invoke the call method.
this also adds a new protocol for SparseEmbeddingFunctions, and 2 new embedding functions: huggingface_sparse_embedding_function and fastembed_sparse_embedding_function
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?