[TST] Add local_simple_hash embedding, tests, docs, and example #5732
- Adds a deterministic, dependency-free 'local_simple_hash' embedding for smoke tests
- Adds unit tests covering determinism, long strings, and non-string inputs
- Adds example script and updates DEVELOP.md and examples/README.md
- Adds docs section describing lightweight local embeddings
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Introduces a tiny, dependency-free embedding implementation. This summary was automatically generated by @propel-code-bot.
chromadb/utils/embedding_functions/simple_hash_embedding_function.py
…ion.py updating the __call__ method which expects ValueError if input is None
Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
Hi maintainers 👋 Could you please approve the workflow runs for this pull request? Once the workflows are approved, all required checks should run and report their results, allowing the PR to move forward in the review and merge process. If you have any feedback or requested changes after the checks complete, please let me know; I'm happy to address them promptly. Thank you very much for your time and for helping with the approval and review!
Description of changes
Summary
This PR adds a small, deterministic, dependency-free local embedding implementation designed for quick smoke tests, examples, and contributor onboarding.
It introduces the `SimpleHashEmbeddingFunction`, supporting deterministic embeddings without requiring any external models or API keys.
This change is Python-only, focused on improving testability and developer experience when setting up Chroma locally.
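To make the idea of a deterministic, dependency-free embedding concrete, here is a minimal sketch of one way such a function could work. It is not the code from this PR: the name `simple_hash_embed`, the `dim` parameter, the use of SHA-256, and the choice to return plain Python lists instead of NumPy arrays are all assumptions made for illustration. The `ValueError` on `None` input mirrors the behavior mentioned in the review commit.

```python
import hashlib
import math
from typing import List, Sequence


def simple_hash_embed(texts: Sequence[str], dim: int = 16) -> List[List[float]]:
    """Hypothetical sketch: map each text to a deterministic unit-length vector."""
    if texts is None:
        raise ValueError("texts must not be None")
    vectors = []
    for text in texts:
        # Coerce non-string inputs so every item hashes the same way on each run.
        data = str(text).encode("utf-8")
        values: List[float] = []
        counter = 0
        # Keep hashing with a counter suffix until there are enough bytes for `dim` values.
        while len(values) < dim:
            digest = hashlib.sha256(data + counter.to_bytes(4, "big")).digest()
            # Map each byte from [0, 255] to [-1.0, 1.0].
            values.extend(b / 127.5 - 1.0 for b in digest)
            counter += 1
        values = values[:dim]
        # L2-normalize; the norm cannot be zero because no byte maps to exactly 0.0.
        norm = math.sqrt(sum(v * v for v in values))
        vectors.append([v / norm for v in values])
    return vectors
```

Because the output depends only on the input bytes, two calls with the same text return identical vectors, which is the property smoke tests rely on. Such vectors carry no semantic meaning, so they are suited to exercising the pipeline and not to judging retrieval quality.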
Improvements & Additions
New Embedding Function
- `simple_hash_embedding_function.py`: defines `SimpleHashEmbeddingFunction`, following the repository's `EmbeddingFunction` convention.
- Accepts `list[str]` inputs and returns fixed-dimensional NumPy embeddings.
Exports & Registry
- `__init__.py` updated to expose and register the new embedding under the name "local_simple_hash" for both direct import and config-driven creation.
Tests
- `test_simple_hash_embedding.py` added to validate determinism, handling of long strings, and non-string inputs.
Example Script
- `examples/local_simple_hash_example.py`
Documentation
- `examples/README.md` – references the new example script.
- `DEVELOP.md` – includes a short note for Windows Python-only environments relevant to local testing.
- `docs/docs.trychroma.com/.../embedding-functions.md` – adds a new entry describing `local_simple_hash` usage and config pattern.
Test plan
Local Testing (Windows PowerShell or Unix Terminal)
Install editable package:
python -m pip install -e .
Run pre-commit only on the modified files:
Execute new unit tests:
Run the example script:
Expected output:
Embedding metadata (vector length, dtype, L2 norm) for each input.
A second embedding displayed for the function created via config:
Validation performed:
- `pytest` passed for all added tests.
- `pre-commit` auto-fixes applied and passed on modified files.
Migration plan
No migrations or backward compatibility concerns.
This feature is isolated, purely additive, and does not impact existing embeddings or APIs.
Existing users or pipelines remain unaffected.
Observability plan
No new runtime instrumentation required.
Developers can validate correct functionality locally via:
No production monitoring changes needed — this is a local-only utility function.
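As an illustration of the kind of local validation meant here, the following pytest-style sketch checks determinism, long inputs, and non-string inputs. It does not reproduce the PR's test file: `embed` is a hypothetical stand-in defined inline so the example runs on its own, and its `str()` coercion of non-string inputs is an assumption that the real function may not share.

```python
import hashlib
import math


def embed(text, dim=8):
    """Hypothetical stand-in for the embedding under test (not Chroma's code)."""
    digest = hashlib.sha256(str(text).encode("utf-8")).digest()
    # Map the first `dim` bytes to [-1.0, 1.0], then L2-normalize.
    values = [b / 127.5 - 1.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def test_determinism():
    # The same input must always produce the same vector.
    assert embed("chroma") == embed("chroma")


def test_long_string_keeps_dimension():
    # Input length must not change the output dimension.
    assert len(embed("x" * 100_000)) == len(embed("x"))


def test_non_string_input_is_coerced():
    # This stand-in coerces with str(); the real function may behave differently.
    assert embed(42) == embed("42")


def test_unit_norm():
    vector = embed("norm check")
    assert math.isclose(math.sqrt(sum(v * v for v in vector)), 1.0, rel_tol=1e-9)
```

Saved to a file whose name starts with `test_`, these functions are collected by `pytest` without extra configuration.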
Documentation Changes
Added documentation:
- `embedding-functions.md` describing `local_simple_hash` and its config-driven usage pattern.
Updated:
- `DEVELOP.md` with an additional note relevant to Windows Python-only contributor setups.
- `examples/README.md` to include the runnable example reference.
All doc changes validated for formatting and linted with `pre-commit`.
Notes for reviewers
If you see `flake8` or `mypy` warnings, note that only the files listed above were intentionally changed.