Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datetime-enabled filters #295

Merged
merged 9 commits into from
Mar 21, 2025
Merged

Add datetime-enabled filters #295

merged 9 commits into from
Mar 21, 2025

Conversation

rbs333
Copy link
Collaborator

@rbs333 rbs333 commented Mar 18, 2025

Goal:

  1. Create a new DatetimeFilter or TimestampFilter

    • Allow querying for a specific date without time
    • Allow querying for a specific date with time
    • Allow querying for a date range
    • Allow querying for a time range
    • Allow querying with or without a timezone
    • Default to timezone-aware UTC datetimes
  2. Alternatively, create a new Timestamp field type that allows specifying via YAML or dictionary that a numeric field is actually a timestamp, with or without a timezone.

tylerhutcherson and others added 4 commits March 4, 2025 20:42

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
## Changes Made

1. **Expanded Type Support**: 
- Updated return type signatures across all vectorizers to properly
reflect the ability to return either data lists (`List[float]`) or
binary buffers (`bytes`)
- Added special handling for Cohere's integer embedding types
(`List[int]`)

2. **Standardized Interface**:
- Uniform type annotations and docstrings across all vectorizer
implementations
   - Consistent default batch sizes (10) for better predictability

3. **Improved Provider-Specific Support**:
- Enhanced kwargs forwarding to allow passing provider-specific
parameters
- Better warnings for deprecated parameters (like Cohere's
`embedding_types`)

4. **Fixed Type Checking**:
   - Added strategic type ignores to resolve MyPy errors
- Made minimal changes to consumer code to handle the expanded return
types

## Motivation

These changes create a more consistent and flexible vectorizer interface
that:
- Accurately represents what the methods can return
- Accommodates provider-specific features (like Cohere's integer
embeddings)
- Provides clearer documentation for users
- Maintains backward compatibility

## Future Improvements

For future consideration:
- Introduce helper methods (like `embed_as_list()`) that guarantee
specific return types when needed
- Add more robust type conversion in consumer code that relies on
specific types
- Develop a cleaner separation between the base vectorizer interface and
provider-specific extensions
- Consider a more structured approach to provider-specific parameters
rbs333 added 4 commits March 18, 2025 12:37
@rbs333 rbs333 marked this pull request as ready for review March 19, 2025 14:23
@tylerhutcherson tylerhutcherson changed the title Feat/raae 202/timestamp Add datetime-enabled filters Mar 19, 2025
Copy link
Collaborator

@tylerhutcherson tylerhutcherson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Left a few notes on docstrings and stdouts. Good from me once addressed!

assert str(ts) == "*"


def test_timestamp_invalid_input():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a similar theme, what happens when the Timestamp field is called / created for a field that isn't numeric in the index? How does that behave? Given there is a requirement that Timestamp filters are tied to num fields, thinking through how we should document that too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question. Currently if you tried to do this with a text field, for example, it would let you create the filter but then fail on type when trying to perform the search itself. This would provide a hint to the user but is not perfect.

I think it might be worth an enhancement to add a function that checks this ahead of time for all filter types against schema. Because right now I don't think we provide a meaningful error for a text field on a numeric either.

@abrookins thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small issue here:

11:03:03 [RedisVL] INFO   Indices:
11:03:03 [RedisVL] INFO   1. float64_session
11:03:03 [RedisVL] INFO   2. float64_cache
11:03:03 [RedisVL] INFO   3. float16_cache
11:03:03 [RedisVL] INFO   4. float32_session
11:03:03 [RedisVL] INFO   5. float16_session
11:03:03 [RedisVL] INFO   6. bfloat_session
11:03:03 [RedisVL] INFO   7. float32_cache
11:03:03 [RedisVL] INFO   8. bfloat_cache
11:03:03 [RedisVL] INFO   9. user_queries

This is collecting indices from your local instance that weren't generated by the notebook (flush the db before running the notebook and recommit)

@tylerhutcherson
Copy link
Collaborator

tylerhutcherson commented Mar 19, 2025

Ahh one more thing: new class so we need to add to our docs.
https://github.com/redis/redis-vl-python/blob/main/docs/api/filter.rst

Also need to make sure to open these PRs against 0.0.5 branch not main :)

@tylerhutcherson tylerhutcherson changed the base branch from main to 0.5.0 March 19, 2025 16:41
@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Mar 19, 2025
@rbs333 rbs333 merged commit 103c221 into 0.5.0 Mar 21, 2025
36 checks passed
@rbs333 rbs333 deleted the feat/RAAE-202/timestamp branch March 21, 2025 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants