Add datetime-enabled filters #295
## Changes Made

1. **Expanded Type Support**:
   - Updated return type signatures across all vectorizers to properly reflect the ability to return either data lists (`List[float]`) or binary buffers (`bytes`)
   - Added special handling for Cohere's integer embedding types (`List[int]`)
2. **Standardized Interface**:
   - Uniform type annotations and docstrings across all vectorizer implementations
   - Consistent default batch sizes (10) for better predictability
3. **Improved Provider-Specific Support**:
   - Enhanced kwargs forwarding to allow passing provider-specific parameters
   - Better warnings for deprecated parameters (like Cohere's `embedding_types`)
4. **Fixed Type Checking**:
   - Added strategic type ignores to resolve MyPy errors
   - Made minimal changes to consumer code to handle the expanded return types

## Motivation

These changes create a more consistent and flexible vectorizer interface that:

- Accurately represents what the methods can return
- Accommodates provider-specific features (like Cohere's integer embeddings)
- Provides clearer documentation for users
- Maintains backward compatibility

## Future Improvements

For future consideration:

- Introduce helper methods (like `embed_as_list()`) that guarantee specific return types when needed
- Add more robust type conversion in consumer code that relies on specific types
- Develop a cleaner separation between the base vectorizer interface and provider-specific extensions
- Consider a more structured approach to provider-specific parameters
Very cool! Left a few notes on docstrings and stdouts. Good from me once addressed!
assert str(ts) == "*"


def test_timestamp_invalid_input():
On a similar theme, what happens when the `Timestamp` field is called / created for a field that isn't numeric in the index? How does that behave? Given there is a requirement that `Timestamp` filters are tied to `num` fields, thinking through how we should document that too.
This is a great question. Currently if you tried to do this with a text field, for example, it would let you create the filter but then fail on type when trying to perform the search itself. This would provide a hint to the user but is not perfect.
I think it might be worth an enhancement to add a function that checks all filter types against the schema ahead of time, because right now I don't think we provide a meaningful error for a text filter on a numeric field either.
@abrookins thoughts?
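A rough sketch of what such an ahead-of-time check could look like. Everything here is hypothetical: `check_filter_against_schema`, the `REQUIRED_FIELD_TYPE` mapping, and the plain-dict schema are illustrative stand-ins, not the library's actual API or schema objects.

```python
from typing import Dict

# Hypothetical mapping from a filter kind to the index field type it needs.
# The names are assumptions for illustration only.
REQUIRED_FIELD_TYPE: Dict[str, str] = {
    "timestamp": "numeric",
    "num": "numeric",
    "text": "text",
    "tag": "tag",
}


def check_filter_against_schema(
    field_name: str, filter_kind: str, schema: Dict[str, str]
) -> None:
    """Raise before the search runs if a filter targets the wrong field type.

    `schema` is assumed to be a simple {field name: field type} dict.
    """
    if field_name not in schema:
        raise ValueError(f"Field '{field_name}' is not defined in the schema")

    expected = REQUIRED_FIELD_TYPE[filter_kind]
    actual = schema[field_name]
    if actual != expected:
        raise TypeError(
            f"A '{filter_kind}' filter requires a '{expected}' field, "
            f"but '{field_name}' is declared as '{actual}'"
        )


schema = {"created_at": "numeric", "title": "text"}
check_filter_against_schema("created_at", "timestamp", schema)  # passes silently
```

With something like this, a timestamp filter on `title` would fail at filter-creation time with a message naming both types, rather than at search time.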
Small issue here:
11:03:03 [RedisVL] INFO Indices:
11:03:03 [RedisVL] INFO 1. float64_session
11:03:03 [RedisVL] INFO 2. float64_cache
11:03:03 [RedisVL] INFO 3. float16_cache
11:03:03 [RedisVL] INFO 4. float32_session
11:03:03 [RedisVL] INFO 5. float16_session
11:03:03 [RedisVL] INFO 6. bfloat_session
11:03:03 [RedisVL] INFO 7. float32_cache
11:03:03 [RedisVL] INFO 8. bfloat_cache
11:03:03 [RedisVL] INFO 9. user_queries
This output includes indices from your local instance that weren't generated by the notebook. Flush the db before running the notebook and recommit.
Ahh one more thing: new class so we need to add to our docs. Also need to make sure to open these PRs against |
Goal:
Create a new DatetimeFilter or TimestampFilter
Alternatively, create a new Timestamp field type that allows specifying via YAML or dictionary that a numeric field is actually a timestamp, with or without a timezone.
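To make the goal concrete, here is a minimal sketch of the underlying idea: convert datetimes to epoch seconds and express the filter as a Redis numeric range clause (`@field:[min max]`). The helper names `to_epoch_seconds` and `timestamp_range_query` are hypothetical, and treating naive datetimes as UTC is an assumption of this sketch, not a decision made in the PR.

```python
from datetime import datetime, timezone


def to_epoch_seconds(value: datetime) -> float:
    """Convert a datetime to Unix epoch seconds.

    Assumption: a naive datetime (no tzinfo) is interpreted as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.timestamp()


def timestamp_range_query(field: str, start: datetime, end: datetime) -> str:
    """Build a numeric range clause for a field that stores epoch seconds."""
    lower, upper = to_epoch_seconds(start), to_epoch_seconds(end)
    if lower > upper:
        raise ValueError("start must not be later than end")
    return f"@{field}:[{lower} {upper}]"


query = timestamp_range_query(
    "created_at",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 2, tzinfo=timezone.utc),
)
print(query)  # @created_at:[1672531200.0 1672617600.0]
```

Because the filter compiles down to a numeric range, it only works when the target field is indexed as numeric, which is why the field-type question raised in review matters.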