diff --git a/README.md b/README.md index 63b4f05..1f3d47d 100644 --- a/README.md +++ b/README.md @@ -8,25 +8,16 @@

-

Reliable and Efficient Semantic Prompt Caching


- - - Semantic caching reduces LLM latency and cost by returning cached model responses for semantically similar prompts (not just exact matches). **vCache** is the first verified semantic cache that **guarantees user-defined error rate bounds**. vCache replaces static thresholds with **online-learned, embedding-specific decision boundaries**—no manual fine-tuning required. This enables reliable cached response reuse across any embedding model or workload. - - > [!NOTE] > vCache is currently in active development. Features and APIs may change as we continue to improve the system. - - - ## 🚀 Quick Install Install vCache in editable mode: @@ -40,6 +31,7 @@ Then, set your OpenAI key: ```bash export OPENAI_API_KEY="your_api_key_here" ``` + (Note: vCache uses OpenAI by default for both LLM inference and embedding generation, but you can configure any other backend.) Finally, use vCache in your Python code: @@ -53,16 +45,14 @@ print(f"Response: {response}") ``` By default, vCache uses: + - `OpenAIInferenceEngine` - `OpenAIEmbeddingEngine` - `HNSWLibVectorDB` -- `InMemoryEmbeddingMetadataStorage` - `NoEvictionPolicy` - `StringComparisonSimilarityEvaluator` - `VerifiedDecisionPolicy` with a maximum failure rate of 2% - - ## ⚙️ Advanced Configuration vCache is modular and highly configurable.
Below is an example showing how to customize key components: @@ -75,12 +65,12 @@ from vcache.main import VCache from vcache.config import VCacheConfig from vcache.inference_engine.strategies.open_ai import OpenAIInferenceEngine from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import OpenAIEmbeddingEngine -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import InMemoryEmbeddingMetadataStorage from vcache.vcache_core.similarity_evaluator.strategies.string_comparison import StringComparisonSimilarityEvaluator from vcache.vcache_policy.strategies.dynamic_local_threshold import VerifiedDecisionPolicy from vcache.vcache_policy.vcache_policy import VCachePolicy -from vcache.vcache_core.cache.embedding_store.vector_db import HNSWLibVectorDB, SimilarityMetricType +from vcache.vcache_core.cache.vector_db import HNSWLibVectorDB, SimilarityMetricType ``` + ```python @@ -92,7 +82,6 @@ vcache_config: VCacheConfig = VCacheConfig( similarity_metric_type=SimilarityMetricType.COSINE, max_capacity=100_000, ), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), similarity_evaluator=StringComparisonSimilarityEvaluator, ) @@ -117,16 +106,16 @@ Semantic caching reduces LLM latency and cost by returning cached model response ### Architecture Overview 1. **Embed & Store** -Each prompt is converted to a fixed-length vector (an “embedding”) and stored in a vector database along with its LLM response. + Each prompt is converted to a fixed-length vector (an “embedding”) and stored in a vector database along with its LLM response. 2. **Nearest-Neighbor Lookup** -When a new prompt arrives, the cache embeds it and finds its most similar stored prompt using a similarity metric (e.g., cosine similarity). + When a new prompt arrives, the cache embeds it and finds its most similar stored prompt using a similarity metric (e.g., cosine similarity). 3. 
**Similarity Score** -The system computes a score between 0 and 1 that quantifies how “close” the new prompt is to the retrieved entry. + The system computes a score between 0 and 1 that quantifies how “close” the new prompt is to the retrieved entry. -4. **Decision: Exploit vs. Explore** - - **Exploit (cache hit):** If the similarity is above a confidence bound, return the cached response. +4. **Decision: Exploit vs. Explore** + - **Exploit (cache hit):** If the similarity is above a confidence bound, return the cached response. - **Explore (cache miss):** Otherwise, query the LLM for a response, add its embedding and answer to the cache, and return it.

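The four steps above can be sketched as a minimal decision loop. This is a toy illustration, not vCache's API: `lookup`, the list-based `cache`, and the `embed`/`llm` callables are hypothetical stand-ins for the vector database, embedding engine, and inference engine.

```python
import numpy as np

def lookup(prompt, cache, embed, llm, confidence_bound):
    """Toy semantic-cache loop: embed, nearest-neighbor, exploit or explore.

    `cache` is a list of (embedding, response) pairs standing in for a
    vector DB; `embed` and `llm` stand in for the embedding and inference
    engines. Not the real vCache API.
    """
    query = embed(prompt)
    if cache:
        # Nearest-neighbor lookup via cosine similarity.
        sims = [
            np.dot(query, e) / (np.linalg.norm(query) * np.linalg.norm(e))
            for e, _ in cache
        ]
        best = int(np.argmax(sims))
        if sims[best] >= confidence_bound:
            return cache[best][1]  # Exploit: cache hit
    response = llm(prompt)  # Explore: cache miss -> infer the LLM
    cache.append((query, response))  # store embedding + answer
    return response
```

The fixed `confidence_bound` here is exactly what vCache replaces with a per-embedding, online-learned boundary.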
@@ -139,6 +128,7 @@ The system computes a score between 0 and 1 that quantifies how “close” the

### Why Fixed Thresholds Fall Short + Existing semantic caches rely on a **global static threshold** to decide whether to reuse a cached response (exploit) or invoke the LLM (explore). If the similarity score exceeds this threshold, the cache reuses the response; otherwise, it invokes the model. This strategy is fundamentally limited. - **Uniform threshold, diverse prompts:** A fixed threshold assumes all embeddings are equally distributed—ignoring that similarity is context-dependent. @@ -161,11 +151,11 @@ vCache overcomes these limitations with two ideas: ### Benefits - **Reliability** - Formally bounds the rate of incorrect cache hits to your chosen tolerance. + Formally bounds the rate of incorrect cache hits to your chosen tolerance. - **Performance** - Matches or exceeds static-threshold systems in cache hit rate and end-to-end latency. + Matches or exceeds static-threshold systems in cache hit rate and end-to-end latency. - **Simplicity** - Plug in any embedding model; vCache learns and adapts automatically at runtime. + Plug in any embedding model; vCache learns and adapts automatically at runtime.

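The "uniform threshold, diverse prompts" failure mode can be shown with synthetic numbers (the scores below are made up for illustration): in one prompt region correct reuses cluster below the cutoff, while in another, incorrect reuses score above it, so no single global threshold serves both.

```python
# Synthetic (score, was_reuse_correct) pairs for two prompt regions.
threshold = 0.85  # one global static cutoff

# Region A: correct reuses cluster near 0.80 -> the cutoff rejects them all.
region_a = [(0.79, True), (0.81, True), (0.83, True)]
# Region B: some incorrect reuses score as high as 0.90 -> the cutoff accepts them.
region_b = [(0.88, False), (0.90, False), (0.86, True)]

def cache_decisions(scored):
    """Split records into accepted (>= threshold) and rejected reuses."""
    hits = [correct for s, correct in scored if s >= threshold]
    misses = [correct for s, correct in scored if s < threshold]
    return hits, misses

hits_a, misses_a = cache_decisions(region_a)  # no hits: wasted LLM calls
hits_b, misses_b = cache_decisions(region_b)  # hits include wrong answers
```

Lowering the threshold fixes region A but worsens region B, and vice versa; a per-embedding learned boundary sidesteps the trade-off.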
@@ -178,25 +168,24 @@ vCache overcomes these limitations with two ideas: Please refer to the [vCache paper](https://arxiv.org/abs/2502.03771) for further details. - ## 🛠 Developer Guide For advanced usage and development setup, see the [Developer Guide](ReadMe_Dev.md). - - ## 📊 Benchmarking vCache vCache includes a benchmarking framework to evaluate: + - **Cache hit rate** - **Error rate** - **Latency improvement** - **...** We provide three open benchmarks: -- **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena) -- **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification) -- **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries) + +- **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena) +- **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification) +- **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries) See the [Benchmarking Documentation](benchmarks/ReadMe.md) for instructions. 
@@ -211,4 +200,4 @@ If you use vCache for your research, please cite our [paper](https://arxiv.org/a journal={arXiv preprint arXiv:2502.03771}, year={2025} } -``` \ No newline at end of file +``` diff --git a/benchmarks/benchmark.py b/benchmarks/benchmark.py index 34efa30..0095c82 100644 --- a/benchmarks/benchmark.py +++ b/benchmarks/benchmark.py @@ -23,16 +23,13 @@ from vcache.vcache_core.cache.embedding_engine.strategies.benchmark import ( BenchmarkEmbeddingEngine, ) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import ( - InMemoryEmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) -from vcache.vcache_core.cache.embedding_store.vector_db import ( +from vcache.vcache_core.cache.vector_db import ( HNSWLibVectorDB, SimilarityMetricType, ) +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( + EmbeddingMetadataObj, +) from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator from vcache.vcache_core.similarity_evaluator.strategies.llm_comparison import ( LLMComparisonSimilarityEvaluator, @@ -396,7 +393,7 @@ def dump_results_to_json(self): var_ts_dict = {} metadata_objects: List[EmbeddingMetadataObj] = ( - self.vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects() + self.vcache.vcache_config.vector_db.get_all_embedding_metadata_objects() ) for metadata_object in metadata_objects: @@ -486,7 +483,6 @@ def __run_baseline( similarity_metric_type=SimilarityMetricType.COSINE, max_capacity=MAX_VECTOR_DB_CAPACITY, ), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), similarity_evaluator=similarity_evaluator, ) vcache: VCache = VCache(vcache_config, vcache_policy) diff --git a/test.py b/test.py index 62b2da7..bf042b6 100644 --- a/test.py +++ b/test.py @@ -4,10 +4,7 @@ from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import ( OpenAIEmbeddingEngine, 
) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import ( - InMemoryEmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.vector_db import ( +from vcache.vcache_core.cache.vector_db import ( HNSWLibVectorDB, SimilarityMetricType, ) @@ -27,7 +24,6 @@ similarity_metric_type=SimilarityMetricType.COSINE, max_capacity=100000, ), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), similarity_evaluator=StringComparisonSimilarityEvaluator, ) vcache: VCache = VCache(vcache_config, vcache_policy) diff --git a/tests/integration/test_concurrency.py b/tests/integration/test_concurrency.py index 4174303..6a1a714 100644 --- a/tests/integration/test_concurrency.py +++ b/tests/integration/test_concurrency.py @@ -8,7 +8,6 @@ from vcache import ( HNSWLibVectorDB, - InMemoryEmbeddingMetadataStorage, LangChainEmbeddingEngine, StringComparisonSimilarityEvaluator, VCache, @@ -46,7 +45,6 @@ def answers_similar(a, b): model_name="sentence-transformers/all-mpnet-base-v2" ), vector_db=HNSWLibVectorDB(), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), similarity_evaluator=similarity_evaluator, ) @@ -93,7 +91,9 @@ def do_inference(prompt): time.sleep(1.5) executor.map(do_inference, concurrent_prompts_chunk_2) - all_metadata_objects = vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects() + all_metadata_objects = ( + vcache.vcache_config.vector_db.get_all_embedding_metadata_objects() + ) final_observation_count = len(all_metadata_objects) for i, metadata_object in enumerate(all_metadata_objects): diff --git a/tests/integration/test_dynamic_threshold.py b/tests/integration/test_dynamic_threshold.py index 4d4d919..b04d147 100644 --- a/tests/integration/test_dynamic_threshold.py +++ b/tests/integration/test_dynamic_threshold.py @@ -4,7 +4,6 @@ from vcache import ( HNSWLibVectorDB, - InMemoryEmbeddingMetadataStorage, LangChainEmbeddingEngine, OpenAIInferenceEngine, 
VCache, @@ -25,7 +24,6 @@ def create_default_config_and_policy(): model_name="sentence-transformers/all-mpnet-base-v2" ), vector_db=HNSWLibVectorDB(), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), system_prompt="Please answer in a single word with the first letter capitalized. Example: London", ) policy = VerifiedDecisionPolicy(delta=0.05) diff --git a/tests/integration/test_static_threshold.py b/tests/integration/test_static_threshold.py index d75300e..96bdd5e 100644 --- a/tests/integration/test_static_threshold.py +++ b/tests/integration/test_static_threshold.py @@ -5,7 +5,6 @@ from vcache import ( BenchmarkStaticDecisionPolicy, HNSWLibVectorDB, - InMemoryEmbeddingMetadataStorage, LangChainEmbeddingEngine, OpenAIInferenceEngine, VCache, @@ -25,7 +24,6 @@ def create_default_config_and_policy(): model_name="sentence-transformers/all-mpnet-base-v2" ), vector_db=HNSWLibVectorDB(), - embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(), ) policy = BenchmarkStaticDecisionPolicy(threshold=0.8) return config, policy diff --git a/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py b/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py deleted file mode 100644 index e266159..0000000 --- a/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py +++ /dev/null @@ -1,32 +0,0 @@ -import unittest - -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import ( - InMemoryEmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) - - -class TestEmbeddingMetadataStorageStrategy(unittest.TestCase): - def test_in_memory_strategy(self): - embedding_metadata_storage = InMemoryEmbeddingMetadataStorage() - - initial_obj = EmbeddingMetadataObj(embedding_id=0, response="test") - embedding_id = embedding_metadata_storage.add_metadata( - embedding_id=0, metadata=initial_obj - ) - assert embedding_id == 0 - assert 
embedding_metadata_storage.get_metadata(embedding_id=0) == initial_obj - - updated_obj = EmbeddingMetadataObj(embedding_id=0, response="test2") - embedding_metadata_storage.update_metadata(embedding_id=0, metadata=updated_obj) - assert embedding_metadata_storage.get_metadata(embedding_id=0) == updated_obj - - embedding_metadata_storage.flush() - with self.assertRaises(ValueError): - embedding_metadata_storage.get_metadata(embedding_id=0) - - -if __name__ == "__main__": - unittest.main() diff --git a/tests/unit/VCachePolicyStrategy/test_vcache_policy.py b/tests/unit/VCachePolicyStrategy/test_vcache_policy.py index 50bdcbc..01d24c1 100644 --- a/tests/unit/VCachePolicyStrategy/test_vcache_policy.py +++ b/tests/unit/VCachePolicyStrategy/test_vcache_policy.py @@ -3,7 +3,7 @@ from unittest.mock import MagicMock, patch from vcache.config import VCacheConfig -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( +from vcache.vcache_core.cache.vector_db import ( EmbeddingMetadataObj, ) from vcache.vcache_policy.strategies.verified import ( @@ -48,11 +48,9 @@ def update_metadata(embedding_id, embedding_metadata): mock_config = MagicMock(spec=VCacheConfig) mock_config.inference_engine = self.mock_inference_engine mock_config.similarity_evaluator = self.mock_similarity_evaluator - # Add all required attributes for Cache creation - mock_config.embedding_engine = MagicMock() - mock_config.embedding_metadata_storage = MagicMock() mock_config.vector_db = MagicMock() mock_config.eviction_policy = MagicMock() + mock_config.embedding_engine = MagicMock() self.policy = VerifiedDecisionPolicy() self.policy.setup(mock_config) diff --git a/tests/unit/VectorDBStrategy/test_vector_db.py b/tests/unit/VectorDBStrategy/test_vector_db.py index 2ca1de9..d5b3a2c 100644 --- a/tests/unit/VectorDBStrategy/test_vector_db.py +++ b/tests/unit/VectorDBStrategy/test_vector_db.py @@ -2,9 +2,8 @@ import pytest -from 
vcache.vcache_core.cache.embedding_store.vector_db import ( - ChromaVectorDB, - FAISSVectorDB, +from vcache.vcache_core.cache.vector_db import ( + EmbeddingMetadataObj, HNSWLibVectorDB, SimilarityMetricType, ) @@ -12,10 +11,6 @@ VECTOR_DB_PARAMS = [ (HNSWLibVectorDB, SimilarityMetricType.COSINE), (HNSWLibVectorDB, SimilarityMetricType.EUCLIDEAN), - (FAISSVectorDB, SimilarityMetricType.COSINE), - (FAISSVectorDB, SimilarityMetricType.EUCLIDEAN), - (ChromaVectorDB, SimilarityMetricType.COSINE), - (ChromaVectorDB, SimilarityMetricType.EUCLIDEAN), ] @@ -32,15 +27,19 @@ def test_add_and_get_knn(self, vector_db_class, similarity_metric_type): # Test with a single embedding embedding = [0.1, 0.2, 0.3] - id1 = vector_db.add(embedding=embedding) + metadata1 = EmbeddingMetadataObj(response="test response 1") + id1 = vector_db.add(embedding=embedding, metadata=metadata1) + knn = vector_db.get_knn(embedding=embedding, k=1) assert len(knn) == 1 assert abs(knn[0][0] - 1.0) < 1e-5 # Should be a perfect match assert knn[0][1] == id1 # Test with multiple embeddings - vector_db.add(embedding=[0.2, 0.3, 0.4]) - vector_db.add(embedding=[0.3, 0.4, 0.5]) + metadata2 = EmbeddingMetadataObj(response="test response 2") + metadata3 = EmbeddingMetadataObj(response="test response 3") + vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2) + vector_db.add(embedding=[0.3, 0.4, 0.5], metadata=metadata3) # Verify we get all embeddings when k is large enough knn = vector_db.get_knn(embedding=embedding, k=3) @@ -59,8 +58,10 @@ def test_remove(self, vector_db_class, similarity_metric_type): vector_db = vector_db_class(similarity_metric_type=similarity_metric_type) # Add multiple embeddings - id1 = vector_db.add(embedding=[0.1, 0.2, 0.3]) - id2 = vector_db.add(embedding=[0.2, 0.3, 0.4]) + metadata1 = EmbeddingMetadataObj(response="test response 1") + metadata2 = EmbeddingMetadataObj(response="test response 2") + id1 = vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata1) + id2 = 
vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2) # Verify both exist knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=2) @@ -83,9 +84,12 @@ def test_reset(self, vector_db_class, similarity_metric_type): vector_db = vector_db_class(similarity_metric_type=similarity_metric_type) # Add multiple embeddings - vector_db.add(embedding=[0.1, 0.2, 0.3]) - vector_db.add(embedding=[0.2, 0.3, 0.4]) - vector_db.add(embedding=[0.3, 0.4, 0.5]) + metadata1 = EmbeddingMetadataObj(response="test response 1") + metadata2 = EmbeddingMetadataObj(response="test response 2") + metadata3 = EmbeddingMetadataObj(response="test response 3") + vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata1) + vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2) + vector_db.add(embedding=[0.3, 0.4, 0.5], metadata=metadata3) # Verify embeddings exist knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=3) @@ -98,6 +102,37 @@ def test_reset(self, vector_db_class, similarity_metric_type): knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=3) assert len(knn) == 0 + @pytest.mark.parametrize( + "vector_db_class, similarity_metric_type", + VECTOR_DB_PARAMS, + ) + def test_metadata_operations(self, vector_db_class, similarity_metric_type): + """Test metadata operations of the vector database.""" + vector_db = vector_db_class(similarity_metric_type=similarity_metric_type) + + # Add embedding with metadata (embedding_id will be set automatically) + metadata = EmbeddingMetadataObj(response="test response") + embedding_id = vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata) + + # Test get metadata + retrieved_metadata = vector_db.get_metadata(embedding_id) + assert retrieved_metadata.response == "test response" + assert ( + retrieved_metadata.embedding_id == embedding_id + ) # Should be set automatically + + # Test update metadata + updated_metadata = EmbeddingMetadataObj(response="updated response") + vector_db.update_metadata(embedding_id, updated_metadata) + + 
retrieved_metadata = vector_db.get_metadata(embedding_id) + assert retrieved_metadata.response == "updated response" + + # Test get all metadata objects + all_metadata = vector_db.get_all_embedding_metadata_objects() + assert len(all_metadata) == 1 + assert all_metadata[0].response == "updated response" + if __name__ == "__main__": unittest.main() diff --git a/vcache/__init__.py b/vcache/__init__.py index e5140a1..8e989b4 100644 --- a/vcache/__init__.py +++ b/vcache/__init__.py @@ -20,27 +20,19 @@ OpenAIEmbeddingEngine, ) -# Embedding metadata storage -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import ( - EmbeddingMetadataStorage, - InMemoryEmbeddingMetadataStorage, +# Eviction policies +from vcache.vcache_core.cache.eviction_policy import ( + EvictionPolicy, + LRUEvictionPolicy, ) # Vector databases -from vcache.vcache_core.cache.embedding_store.vector_db import ( - ChromaVectorDB, - FAISSVectorDB, +from vcache.vcache_core.cache.vector_db import ( HNSWLibVectorDB, SimilarityMetricType, VectorDB, ) -# Eviction policies -from vcache.vcache_core.cache.eviction_policy import ( - EvictionPolicy, - LRUEvictionPolicy, -) - # Similarity evaluators from vcache.vcache_core.similarity_evaluator import ( SimilarityEvaluator, @@ -71,9 +63,7 @@ "LangChainEmbeddingEngine", # Vector databases "VectorDB", - "FAISSVectorDB", "HNSWLibVectorDB", - "ChromaVectorDB", "SimilarityMetricType", # Similarity evaluators "SimilarityEvaluator", @@ -81,9 +71,6 @@ # Eviction policies "EvictionPolicy", "LRUEvictionPolicy", - # Embedding metadata storage - "EmbeddingMetadataStorage", - "InMemoryEmbeddingMetadataStorage", # vCache Policies "VCachePolicy", "VerifiedDecisionPolicy", diff --git a/vcache/config.py b/vcache/config.py index f337648..d284920 100644 --- a/vcache/config.py +++ b/vcache/config.py @@ -4,20 +4,14 @@ from vcache.inference_engine.strategies.open_ai import OpenAIInferenceEngine from vcache.vcache_core.cache.embedding_engine import 
OpenAIEmbeddingEngine from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import ( - EmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import ( - InMemoryEmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.vector_db import VectorDB -from vcache.vcache_core.cache.embedding_store.vector_db.strategies.hnsw_lib import ( - HNSWLibVectorDB, -) from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy from vcache.vcache_core.cache.eviction_policy.strategies.no_eviction import ( NoEvictionPolicy, ) +from vcache.vcache_core.cache.vector_db.strategies.hnsw_lib import ( + HNSWLibVectorDB, +) +from vcache.vcache_core.cache.vector_db.vector_db import VectorDB from vcache.vcache_core.similarity_evaluator.similarity_evaluator import ( SimilarityEvaluator, ) @@ -36,7 +30,6 @@ def __init__( inference_engine: InferenceEngine = OpenAIInferenceEngine(), embedding_engine: EmbeddingEngine = OpenAIEmbeddingEngine(), vector_db: VectorDB = HNSWLibVectorDB(), - embedding_metadata_storage: EmbeddingMetadataStorage = InMemoryEmbeddingMetadataStorage(), eviction_policy: EvictionPolicy = NoEvictionPolicy(), similarity_evaluator: SimilarityEvaluator = StringComparisonSimilarityEvaluator(), system_prompt: Optional[str] = None, @@ -47,8 +40,7 @@ def __init__( Args: inference_engine: Engine for generating responses from prompts. embedding_engine: Engine for generating embeddings from text. - vector_db: Vector database for storing and retrieving embeddings. - embedding_metadata_storage: Storage for embedding metadata. + vector_db: Vector database for storing embeddings and metadata. eviction_policy: Policy for removing items from cache when full. similarity_evaluator: Evaluator for determining similarity between prompts. 
system_prompt: Optional system prompt to use for all inferences. @@ -57,7 +49,6 @@ def __init__( self.embedding_engine = embedding_engine self.vector_db = vector_db self.eviction_policy = eviction_policy - self.embedding_metadata_storage = embedding_metadata_storage self.similarity_evaluator = similarity_evaluator self.similarity_evaluator.set_inference_engine(self.inference_engine) self.system_prompt = system_prompt diff --git a/vcache/vcache_core/cache/__init__.py b/vcache/vcache_core/cache/__init__.py index 861d231..7882f73 100644 --- a/vcache/vcache_core/cache/__init__.py +++ b/vcache/vcache_core/cache/__init__.py @@ -1,21 +1,15 @@ from vcache.vcache_core.cache.cache import Cache from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine -from vcache.vcache_core.cache.embedding_store import EmbeddingStore -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import ( - EmbeddingMetadataStorage, -) from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy from vcache.vcache_core.cache.eviction_policy.system_monitor import SystemMonitor +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( + EmbeddingMetadataObj, +) __all__ = [ "Cache", - "EmbeddingStore", "EmbeddingMetadataObj", "EmbeddingEngine", - "EmbeddingMetadataStorage", "EvictionPolicy", "SystemMonitor", ] diff --git a/vcache/vcache_core/cache/cache.py b/vcache/vcache_core/cache/cache.py index 60adf17..7e29a4e 100644 --- a/vcache/vcache_core/cache/cache.py +++ b/vcache/vcache_core/cache/cache.py @@ -1,11 +1,11 @@ from typing import List from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( +from 
vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( EmbeddingMetadataObj, ) -from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore -from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy +from vcache.vcache_core.cache.vector_db.vector_db import VectorDB class Cache: @@ -15,29 +15,25 @@ class Cache: def __init__( self, - embedding_store: EmbeddingStore, + vector_db: VectorDB, embedding_engine: EmbeddingEngine, eviction_policy: EvictionPolicy, ): """ - Initialize cache with embedding store, engine, and eviction policy. + Initialize cache with vector database, engine, and eviction policy. Args: - embedding_store: Store for managing embeddings and metadata. + vector_db: Vector database for managing embeddings and metadata. embedding_engine: Engine for generating embeddings from text. eviction_policy: Policy for removing items when cache is full. """ - self.embedding_store = embedding_store + self.vector_db = vector_db self.embedding_engine = embedding_engine self.eviction_policy = eviction_policy def add(self, prompt: str, response: str) -> int: """ - Compute the embedding for the prompt, add an embedding to the vector database and a new metadata object. - - IMPORTANT: The embedding is computed first and then added to the vector database. - The metadata object is added last. - Consider this when implementing asynchronous logic to prevent race conditions. + Compute the embedding for the prompt, add an embedding to the vector database with metadata. Args: prompt: The prompt to add to the cache. @@ -47,7 +43,9 @@ def add(self, prompt: str, response: str) -> int: The id of the embedding. 
""" embedding = self.embedding_engine.get_embedding(prompt) - return self.embedding_store.add_embedding(embedding, response) + metadata = EmbeddingMetadataObj(response=response) + embedding_id = self.vector_db.add(embedding, metadata) + return embedding_id def remove(self, embedding_id: int) -> int: """ @@ -59,7 +57,7 @@ def remove(self, embedding_id: int) -> int: Returns: The id of the embedding. """ - return self.embedding_store.remove(embedding_id) + return self.vector_db.remove(embedding_id) def get_knn(self, prompt: str, k: int) -> List[tuple[float, int]]: """ @@ -73,13 +71,13 @@ def get_knn(self, prompt: str, k: int) -> List[tuple[float, int]]: A list of tuples, each containing a similarity score and an embedding id. """ embedding = self.embedding_engine.get_embedding(prompt) - return self.embedding_store.get_knn(embedding, k) + return self.vector_db.get_knn(embedding, k) def flush(self) -> None: """ Flush all data from the cache. """ - self.embedding_store.reset() + self.vector_db.reset() def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj: """ @@ -91,7 +89,7 @@ def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj: Returns: The metadata of the embedding. """ - return self.embedding_store.get_metadata(embedding_id) + return self.vector_db.get_metadata(embedding_id) def update_metadata( self, embedding_id: int, embedding_metadata: EmbeddingMetadataObj @@ -106,7 +104,7 @@ def update_metadata( Returns: The updated metadata of the embedding. """ - return self.embedding_store.update_metadata(embedding_id, embedding_metadata) + return self.vector_db.update_metadata(embedding_id, embedding_metadata) def get_current_capacity(self) -> int: """ @@ -125,7 +123,7 @@ def is_empty(self) -> bool: Returns: True if the cache is empty, False otherwise. 
""" - return self.embedding_store.is_empty() + return self.vector_db.is_empty() def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: """ @@ -134,4 +132,4 @@ def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: Returns: A list of all the embedding metadata objects in the cache. """ - return self.embedding_store.embedding_metadata_storage.get_all_embedding_metadata_objects() + return self.vector_db.get_all_embedding_metadata_objects() diff --git a/vcache/vcache_core/cache/embedding_store/__init__.py b/vcache/vcache_core/cache/embedding_store/__init__.py deleted file mode 100644 index febd930..0000000 --- a/vcache/vcache_core/cache/embedding_store/__init__.py +++ /dev/null @@ -1,11 +0,0 @@ -from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore -from vcache.vcache_core.cache.embedding_store.vector_db import ( - SimilarityMetricType, - VectorDB, -) - -__all__ = [ - "VectorDB", - "EmbeddingStore", - "SimilarityMetricType", -] diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py deleted file mode 100644 index 77692de..0000000 --- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py +++ /dev/null @@ -1,8 +0,0 @@ -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import ( - EmbeddingMetadataStorage, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import ( - InMemoryEmbeddingMetadataStorage, -) - -__all__ = ["EmbeddingMetadataStorage", "InMemoryEmbeddingMetadataStorage"] diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py deleted file mode 100644 index 0c7fcbd..0000000 --- 
a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py +++ /dev/null @@ -1,82 +0,0 @@ -from abc import ABC, abstractmethod -from typing import List - -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) - - -class EmbeddingMetadataStorage(ABC): - """ - Abstract base class for embedding metadata storage. - """ - - @abstractmethod - def add_metadata(self, embedding_id: int, metadata: EmbeddingMetadataObj) -> int: - """ - Add metadata for a specific embedding. - - Args: - embedding_id: The id of the embedding to add the metadata for. - metadata: The metadata to add to the embedding. - - Returns: - The id of the embedding. - """ - pass - - @abstractmethod - def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj: - """ - Get metadata for a specific embedding. - - Args: - embedding_id: The id of the embedding to get the metadata for. - - Returns: - The metadata of the embedding. - """ - pass - - @abstractmethod - def update_metadata( - self, embedding_id: int, metadata: EmbeddingMetadataObj - ) -> EmbeddingMetadataObj: - """ - Update metadata for a specific embedding. - - Args: - embedding_id: The id of the embedding to update the metadata for. - metadata: The metadata to update the embedding with. - - Returns: - The updated metadata of the embedding. - """ - pass - - @abstractmethod - def remove_metadata(self, embedding_id: int) -> None: - """ - Remove metadata for a specific embedding. - - Args: - embedding_id: The id of the embedding to remove the metadata for. - """ - pass - - @abstractmethod - def flush(self) -> None: - """ - Flush all metadata from storage. - """ - pass - - @abstractmethod - def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: - """ - Get all embedding metadata objects in storage. - - Returns: - A list of all the embedding metadata objects in the storage. 
- """ - pass diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/__init__.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py deleted file mode 100644 index 3e49c8e..0000000 --- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py +++ /dev/null @@ -1,110 +0,0 @@ -from typing import Any, Dict, List, Optional - -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import ( - EmbeddingMetadataStorage, -) - - -class InMemoryEmbeddingMetadataStorage(EmbeddingMetadataStorage): - """ - In-memory implementation of embedding metadata storage. - """ - - def __init__(self): - """ - Initialize in-memory embedding metadata storage. - """ - self.metadata_storage: Dict[int, "EmbeddingMetadataObj"] = {} - - def add_metadata( - self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None - ) -> None: - """ - Add metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding to add metadata for. - metadata: The metadata to add. - - Returns: - The embedding ID. - """ - self.metadata_storage[embedding_id] = metadata - return embedding_id - - def get_metadata(self, embedding_id: int) -> Optional[Dict[str, Any]]: - """ - Get metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding to get metadata for. - - Returns: - The metadata for the embedding. - - Raises: - ValueError: If embedding metadata is not found. 
- """ - if embedding_id not in self.metadata_storage: - raise ValueError( - f"Embedding metadata for embedding id {embedding_id} not found" - ) - else: - return self.metadata_storage[embedding_id] - - def update_metadata( - self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None - ) -> bool: - """ - Update metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding to update metadata for. - metadata: The new metadata to set. - - Returns: - The updated metadata. - - Raises: - ValueError: If embedding metadata is not found. - """ - if embedding_id not in self.metadata_storage: - raise ValueError( - f"Embedding metadata for embedding id {embedding_id} not found" - ) - else: - self.metadata_storage[embedding_id] = metadata - return metadata - - def remove_metadata(self, embedding_id: int) -> bool: - """ - Remove metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding to remove metadata for. - - Returns: - True if metadata was removed, False if not found. - """ - if embedding_id in self.metadata_storage: - del self.metadata_storage[embedding_id] - return True - return False - - def flush(self) -> None: - """ - Flush all metadata from storage. - """ - self.metadata_storage = {} - - def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: - """ - Get all embedding metadata objects in storage. - - Returns: - A list of all embedding metadata objects. 
- """ - return list(self.metadata_storage.values()) diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py deleted file mode 100644 index 10025fb..0000000 --- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py +++ /dev/null @@ -1,93 +0,0 @@ -from typing import Any, Dict, List, Optional - -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( - EmbeddingMetadataObj, -) -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import ( - EmbeddingMetadataStorage, -) - - -class LangchainMetadataStorage(EmbeddingMetadataStorage): - """ - LangChain-based metadata storage implementation (placeholder). - """ - - def __init__(self): - """ - Initialize LangChain metadata storage. - """ - # TODO - pass - - def add_metadata( - self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None - ) -> None: - """ - Add metadata for an embedding. - - Args: - embedding_id: The ID of the embedding. - metadata: The metadata to add. - """ - # TODO - pass - - def get_metadata(self, embedding_id: int) -> Optional[Dict[str, Any]]: - """ - Get metadata for an embedding. - - Args: - embedding_id: The ID of the embedding. - - Returns: - The metadata for the embedding, or None if not found. - """ - # TODO - pass - - def update_metadata( - self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None - ) -> bool: - """ - Update metadata for an embedding. - - Args: - embedding_id: The ID of the embedding. - metadata: The new metadata. - - Returns: - True if the update was successful, False otherwise. - """ - # TODO - pass - - def remove_metadata(self, embedding_id: int) -> bool: - """ - Remove metadata for an embedding. - - Args: - embedding_id: The ID of the embedding. 
-
-        Returns:
-            True if the removal was successful, False otherwise.
-        """
-        # TODO
-        pass
-
-    def flush(self) -> None:
-        """
-        Flush any pending changes to storage.
-        """
-        # TODO
-        pass
-
-    def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
-        """
-        Get all embedding metadata objects.
-
-        Returns:
-            List of all embedding metadata objects.
-        """
-        # TODO
-        pass
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_store.py b/vcache/vcache_core/cache/embedding_store/embedding_store.py
deleted file mode 100644
index c36d10e..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_store.py
+++ /dev/null
@@ -1,139 +0,0 @@
-import threading
-from typing import List
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
-    EmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
-    EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import VectorDB
-
-
-class EmbeddingStore:
-    """
-    Store for managing embeddings and their associated metadata.
-    """
-
-    def __init__(
-        self,
-        vector_db: VectorDB,
-        embedding_metadata_storage: EmbeddingMetadataStorage,
-    ):
-        """
-        Initialize embedding store with vector database and metadata storage.
-
-        Args:
-            vector_db: Vector database for storing embeddings.
-            embedding_metadata_storage: Storage for embedding metadata.
-        """
-        self.vector_db = vector_db
-        self.embedding_metadata_storage = embedding_metadata_storage
-        self._add_lock = threading.Lock()
-        self._remove_lock = threading.Lock()
-
-    def add_embedding(self, embedding: List[float], response: str) -> int:
-        """
-        Add an embedding to the vector database and a new metadata object.
-
-        This operation is thread-safe.
-
-        Args:
-            embedding: The embedding vector to add.
-            response: The response associated with the embedding.
-
-        Returns:
-            The ID of the added embedding.
- """ - with self._add_lock: - embedding_id = self.vector_db.add(embedding) - metadata = EmbeddingMetadataObj( - embedding_id=embedding_id, - response=response, - ) - self.embedding_metadata_storage.add_metadata( - embedding_id=embedding_id, metadata=metadata - ) - return embedding_id - - def remove(self, embedding_id: int) -> int: - """ - Remove an embedding and its metadata from the store. - - This operation is thread-safe. - - Args: - embedding_id: The ID of the embedding to remove. - - Returns: - The ID of the removed embedding. - """ - with self._remove_lock: - self.embedding_metadata_storage.remove_metadata(embedding_id) - return self.vector_db.remove(embedding_id) - - def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]: - """ - Get k-nearest neighbors for the given embedding. - - Args: - embedding: The embedding to find neighbors for. - k: The number of neighbors to return. - - Returns: - List of tuples containing similarity scores and embedding IDs. - """ - return self.vector_db.get_knn(embedding, k) - - def reset(self) -> None: - """ - Reset the embedding store to empty state. - """ - self.embedding_metadata_storage.flush() - return self.vector_db.reset() - - def calculate_storage_consumption(self) -> int: - """ - Calculate the storage consumption of the embedding store. - - Returns: - The storage consumption in bytes. - """ - # TODO: Add metadata logic - return -1 - - def get_metadata(self, embedding_id: int) -> "EmbeddingMetadataObj": - """ - Get metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding. - - Returns: - The metadata object for the embedding. - """ - return self.embedding_metadata_storage.get_metadata(embedding_id) - - def update_metadata( - self, embedding_id: int, metadata: "EmbeddingMetadataObj" - ) -> "EmbeddingMetadataObj": - """ - Update metadata for a specific embedding. - - Args: - embedding_id: The ID of the embedding. - metadata: The new metadata object. 
-
-        Returns:
-            The updated metadata object.
-        """
-        return self.embedding_metadata_storage.update_metadata(embedding_id, metadata)
-
-    def is_empty(self) -> bool:
-        """
-        Check if the embedding store is empty.
-
-        Returns:
-            True if the store is empty, False otherwise.
-        """
-        return self.vector_db.is_empty()
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py b/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py
deleted file mode 100644
index bb4a6c3..0000000
--- a/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py
+++ /dev/null
@@ -1,21 +0,0 @@
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.chroma import (
-    ChromaVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.faiss import (
-    FAISSVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.hnsw_lib import (
-    HNSWLibVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
-    SimilarityMetricType,
-    VectorDB,
-)
-
-__all__ = [
-    "VectorDB",
-    "SimilarityMetricType",
-    "HNSWLibVectorDB",
-    "FAISSVectorDB",
-    "ChromaVectorDB",
-]
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py b/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py
deleted file mode 100644
index f2cfe47..0000000
--- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py
+++ /dev/null
@@ -1,140 +0,0 @@
-from typing import List
-
-import chromadb
-
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
-    SimilarityMetricType,
-    VectorDB,
-)
-
-
-class ChromaVectorDB(VectorDB):
-    """
-    ChromaDB-based vector database implementation for efficient similarity search.
-    """
-
-    def __init__(
-        self, similarity_metric_type: SimilarityMetricType = SimilarityMetricType.COSINE
-    ):
-        """
-        Initialize ChromaDB vector database.
-
-        Args:
-            similarity_metric_type: The similarity metric to use for comparisons.
- """ - self.__next_embedding_id = 0 - self.collection = None - self.client = None - self.similarity_metric_type = similarity_metric_type - - def add(self, embedding: List[float]) -> int: - """ - Add an embedding vector to the database. - - Args: - embedding: The embedding vector to add. - - Returns: - The unique ID assigned to the added embedding. - """ - if self.collection is None: - self._init_vector_store(len(embedding)) - id = self.__next_embedding_id - self.collection.add(embeddings=[embedding], ids=[str(id)]) - self.__next_embedding_id += 1 - return id - - def remove(self, embedding_id: int) -> int: - """ - Remove an embedding from the database. - - Args: - embedding_id: The ID of the embedding to remove. - - Returns: - The ID of the removed embedding. - - Raises: - ValueError: If the collection is not initialized. - """ - if self.collection is None: - raise ValueError("Collection is not initialized") - self.collection.delete(ids=[str(embedding_id)]) - return embedding_id - - def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]: - """ - Get k-nearest neighbors for the given embedding. - - Args: - embedding: The query embedding vector. - k: The number of nearest neighbors to return. - - Returns: - List of tuples containing similarity scores and embedding IDs. - - Raises: - ValueError: If the collection is not initialized. - """ - if self.collection is None: - raise ValueError("Collection is not initialized") - if self.collection.count() == 0: - return [] - k_ = min(k, self.collection.count()) - results = self.collection.query( - query_embeddings=[embedding], n_results=k_, include=["distances"] - ) - distances = results.get("distances", [[]])[0] - ids = results.get("ids", [[]])[0] - return [ - ( - self.transform_similarity_score( - float(dist), self.similarity_metric_type.value - ), - int(idx), - ) - for dist, idx in zip(distances, ids) - ] - - def reset(self) -> None: - """ - Reset the vector database to empty state. 
- """ - if self.collection is not None: - self.collection.delete(ids=self.collection.get()["ids"]) - self.__next_embedding_id = 0 - - def _init_vector_store(self, embedding_dim: int): - """ - Initialize the ChromaDB collection with the given embedding dimension. - - Args: - embedding_dim: The dimension of the embedding vectors. - - Raises: - ValueError: If the similarity metric type is invalid. - """ - self.client = chromadb.Client() - collection_name = f"vcache_collection_{id(self)}" - metric_type = self.similarity_metric_type.value - match metric_type: - case "cosine": - space = "cosine" - case "euclidean": - space = "l2" - case _: - raise ValueError(f"Invalid similarity metric type: {metric_type}") - self.collection = self.client.create_collection( - name=collection_name, - metadata={"dimension": embedding_dim, "hnsw:space": space}, - get_or_create=True, - ) - - def is_empty(self) -> bool: - """ - Check if the vector database is empty. - - Returns: - True if the database contains no embeddings, False otherwise. - """ - return self.collection.count() == 0 diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py b/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py deleted file mode 100644 index ad6f883..0000000 --- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py +++ /dev/null @@ -1,170 +0,0 @@ -from typing import List - -import faiss -import numpy as np - -from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import ( - SimilarityMetricType, - VectorDB, -) - - -class FAISSVectorDB(VectorDB): - """ - FAISS-based vector database implementation for efficient similarity search. - """ - - def __init__( - self, similarity_metric_type: SimilarityMetricType = SimilarityMetricType.COSINE - ): - """ - Initialize FAISS vector database. - - Args: - similarity_metric_type: The similarity metric to use for comparisons. 
- """ - self.similarity_metric_type = similarity_metric_type - self.__next_embedding_id = 0 - self.index = None - - def transform_similarity_score( - self, similarity_score: float, metric_type: str - ) -> float: - """ - Transform similarity score based on the metric type. - - Args: - similarity_score: The raw similarity score. - metric_type: The type of similarity metric used. - - Returns: - The transformed similarity score. - - Raises: - ValueError: If the similarity metric type is invalid. - """ - match metric_type: - case "cosine": - return similarity_score - case "euclidean": - return 1 - similarity_score - case _: - raise ValueError(f"Invalid similarity metric type: {metric_type}") - - def add(self, embedding: List[float]) -> int: - """ - Add an embedding vector to the database. - - Args: - embedding: The embedding vector to add. - - Returns: - The unique ID assigned to the added embedding. - """ - if self.index is None: - self._init_vector_store(len(embedding)) - id = self.__next_embedding_id - ids = np.array([id], dtype=np.int64) - embedding_array = np.array([embedding], dtype=np.float32) - metric_type = self.similarity_metric_type.value - # Normalize the embedding vector if the metric type is cosine - if metric_type == "cosine": - faiss.normalize_L2(embedding_array) - self.index.add_with_ids(embedding_array, ids) - self.__next_embedding_id += 1 - return id - - def remove(self, embedding_id: int) -> int: - """ - Remove an embedding from the database. - - Args: - embedding_id: The ID of the embedding to remove. - - Returns: - The ID of the removed embedding. - - Raises: - ValueError: If the index is not initialized. 
- """ - if self.index is None: - raise ValueError("Index is not initialized") - id_array = np.array([embedding_id], dtype=np.int64) - self.index.remove_ids( - faiss.IDSelectorBatch(id_array.size, faiss.swig_ptr(id_array)) - ) - return embedding_id - - def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]: - """ - Get k-nearest neighbors for the given embedding. - - Args: - embedding: The query embedding vector. - k: The number of nearest neighbors to return. - - Returns: - List of tuples containing similarity scores and embedding IDs. - - Raises: - ValueError: If the index is not initialized. - """ - if self.index is None: - raise ValueError("Index is not initialized") - if self.index.ntotal == 0: - return [] - k_ = min(k, self.index.ntotal) - query_vector = np.array([embedding], dtype=np.float32) - metric_type = self.similarity_metric_type.value - # Normalize the query vector if the metric type is cosine - if metric_type == "cosine": - faiss.normalize_L2(query_vector) - distances, indices = self.index.search(query_vector, k_) - # Filter out results where index is -1 (deleted embeddings) - filtered_results = [ - (distances[0][i], indices[0][i]) - for i in range(len(indices[0])) - if indices[0][i] != -1 - ] - return [ - (self.transform_similarity_score(dist, metric_type), int(idx)) - for dist, idx in filtered_results - ] - - def reset(self) -> None: - """ - Reset the vector database to empty state. - """ - if self.index is not None: - dim = self.index.d - self._init_vector_store(dim) - self.__next_embedding_id = 0 - - def _init_vector_store(self, embedding_dim: int): - """ - Initialize the FAISS index with the given embedding dimension. - - Args: - embedding_dim: The dimension of the embedding vectors. - - Raises: - ValueError: If the similarity metric type is invalid. 
- """ - metric_type = self.similarity_metric_type.value - match metric_type: - case "cosine": - faiss_metric = faiss.METRIC_INNER_PRODUCT - case "euclidean": - faiss_metric = faiss.METRIC_L2 - case _: - raise ValueError(f"Invalid similarity metric type: {metric_type}") - self.index = faiss.index_factory(embedding_dim, "IDMap,Flat", faiss_metric) - - def is_empty(self) -> bool: - """ - Check if the vector database is empty. - - Returns: - True if the database contains no embeddings, False otherwise. - """ - return self.index.ntotal == 0 diff --git a/vcache/vcache_core/cache/vector_db/__init__.py b/vcache/vcache_core/cache/vector_db/__init__.py new file mode 100644 index 0000000..fc736f2 --- /dev/null +++ b/vcache/vcache_core/cache/vector_db/__init__.py @@ -0,0 +1,15 @@ +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( + EmbeddingMetadataObj, +) +from vcache.vcache_core.cache.vector_db.strategies.hnsw_lib import HNSWLibVectorDB +from vcache.vcache_core.cache.vector_db.vector_db import ( + SimilarityMetricType, + VectorDB, +) + +__all__ = [ + "EmbeddingMetadataObj", + "VectorDB", + "HNSWLibVectorDB", + "SimilarityMetricType", +] diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py b/vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py similarity index 99% rename from vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py rename to vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py index 62914ad..2e45b82 100644 --- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py +++ b/vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py @@ -11,8 +11,8 @@ class EmbeddingMetadataObj: def __init__( self, - embedding_id: int, response: str, + embedding_id: int = -1, prior: np.ndarray = None, posterior: np.ndarray = None, region_reject: List[str] = None, diff --git 
a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py b/vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py similarity index 57% rename from vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py rename to vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py index 7bb62c6..f641272 100644 --- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py +++ b/vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py @@ -1,8 +1,11 @@ -from typing import List +from typing import Dict, List import hnswlib -from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import ( +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( + EmbeddingMetadataObj, +) +from vcache.vcache_core.cache.vector_db.vector_db import ( SimilarityMetricType, VectorDB, ) @@ -14,7 +17,7 @@ class HNSWLibVectorDB(VectorDB): """ - HNSWLib-based vector database implementation for efficient similarity search. + HNSWLib-based vector database implementation that stores both embeddings and metadata. """ def __init__( @@ -39,27 +42,36 @@ def __init__( self.M = None self.ef = None self.index = None + self.metadata_storage: Dict[int, EmbeddingMetadataObj] = {} - def add(self, embedding: List[float]) -> int: + def add(self, embedding: List[float], metadata: EmbeddingMetadataObj) -> int: """ - Add an embedding vector to the database. + Add an embedding vector and its metadata to the database. Args: embedding: The embedding vector to add. + metadata: The metadata object associated with the embedding. Returns: The unique ID assigned to the added embedding. 
""" if self.index is None: self._init_vector_store(len(embedding)) - self.index.add_items(embedding, self.__next_embedding_id) + + embedding_id = self.__next_embedding_id + self.index.add_items(embedding, embedding_id) + + # Automatically set the embedding_id in the metadata + metadata.embedding_id = embedding_id + self.metadata_storage[embedding_id] = metadata + self.embedding_count += 1 self.__next_embedding_id += 1 - return self.__next_embedding_id - 1 + return embedding_id def remove(self, embedding_id: int) -> int: """ - Remove an embedding from the database. + Remove an embedding and its metadata from the database. Args: embedding_id: The ID of the embedding to remove. @@ -68,11 +80,15 @@ def remove(self, embedding_id: int) -> int: The ID of the removed embedding. Raises: - ValueError: If the index is not initialized. + ValueError: If the index is not initialized or embedding not found. """ if self.index is None: raise ValueError("Index is not initialized") + if embedding_id not in self.metadata_storage: + raise ValueError(f"Embedding with ID {embedding_id} not found") + self.index.mark_deleted(embedding_id) + del self.metadata_storage[embedding_id] self.embedding_count -= 1 return embedding_id @@ -98,7 +114,57 @@ def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]: self.transform_similarity_score(sim, metric_type) for sim in similarities[0] ] id_list = [int(id) for id in ids[0]] - return list(zip(similarity_scores, id_list)) + + # Filter out deleted embeddings (those not in metadata_storage) + results = [] + for score, embedding_id in zip(similarity_scores, id_list): + if embedding_id in self.metadata_storage: + results.append((score, embedding_id)) + + return results + + def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj: + """ + Get metadata for a specific embedding. + + Args: + embedding_id: The ID of the embedding to get metadata for. + + Returns: + The metadata object for the embedding. 
+ """ + if embedding_id not in self.metadata_storage: + raise ValueError(f"Metadata for embedding ID {embedding_id} not found") + return self.metadata_storage[embedding_id] + + def update_metadata( + self, embedding_id: int, metadata: EmbeddingMetadataObj + ) -> EmbeddingMetadataObj: + """ + Update metadata for a specific embedding. + + Args: + embedding_id: The ID of the embedding to update metadata for. + metadata: The new metadata object. + + Returns: + The updated metadata object. + """ + if embedding_id not in self.metadata_storage: + raise ValueError(f"Metadata for embedding ID {embedding_id} not found") + + self.metadata_storage[embedding_id] = metadata + self.metadata_storage[embedding_id].embedding_id = embedding_id + return metadata + + def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: + """ + Get all embedding metadata objects in the database. + + Returns: + A list of all embedding metadata objects. + """ + return list(self.metadata_storage.values()) def reset(self) -> None: """ @@ -109,6 +175,7 @@ def reset(self) -> None: self._init_vector_store(self.dim) self.embedding_count = 0 self.__next_embedding_id = 0 + self.metadata_storage.clear() def _init_vector_store(self, embedding_dim: int): """ diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py b/vcache/vcache_core/cache/vector_db/vector_db.py similarity index 61% rename from vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py rename to vcache/vcache_core/cache/vector_db/vector_db.py index f28f8e2..6be68b4 100644 --- a/vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py +++ b/vcache/vcache_core/cache/vector_db/vector_db.py @@ -2,6 +2,10 @@ from enum import Enum from typing import List +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( + EmbeddingMetadataObj, +) + class SimilarityMetricType(Enum): """ @@ -14,7 +18,7 @@ class SimilarityMetricType(Enum): class VectorDB(ABC): """ - Abstract base class for vector 
databases. + Abstract base class for vector databases that store both embeddings and metadata. """ def transform_similarity_score( @@ -39,12 +43,13 @@ def transform_similarity_score( raise ValueError(f"Invalid similarity metric type: {metric_type}") @abstractmethod - def add(self, embedding: List[float]) -> int: + def add(self, embedding: List[float], metadata: EmbeddingMetadataObj) -> int: """ - Add an embedding to the vector database. + Add an embedding and its metadata to the vector database. Args: embedding: The embedding to add to the vector db. + metadata: The metadata object associated with the embedding. Returns: The id of the embedding. @@ -54,7 +59,7 @@ def add(self, embedding: List[float]) -> int: @abstractmethod def remove(self, embedding_id: int) -> int: """ - Remove an embedding from the vector database. + Remove an embedding and its metadata from the vector database. Args: embedding_id: The id of the embedding to remove. @@ -78,6 +83,45 @@ def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]: """ pass + @abstractmethod + def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj: + """ + Get metadata for a specific embedding. + + Args: + embedding_id: The id of the embedding to get the metadata for. + + Returns: + The metadata of the embedding. + """ + pass + + @abstractmethod + def update_metadata( + self, embedding_id: int, metadata: EmbeddingMetadataObj + ) -> EmbeddingMetadataObj: + """ + Update metadata for a specific embedding. + + Args: + embedding_id: The id of the embedding to update the metadata for. + metadata: The metadata to update the embedding with. + + Returns: + The updated metadata of the embedding. + """ + pass + + @abstractmethod + def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]: + """ + Get all embedding metadata objects in the database. + + Returns: + A list of all the embedding metadata objects in the database. 
+ """ + pass + @abstractmethod def reset(self) -> None: """ diff --git a/vcache/vcache_policy/strategies/benchmark_iid_verified.py b/vcache/vcache_policy/strategies/benchmark_iid_verified.py index 69827c1..9924932 100644 --- a/vcache/vcache_policy/strategies/benchmark_iid_verified.py +++ b/vcache/vcache_policy/strategies/benchmark_iid_verified.py @@ -7,10 +7,9 @@ from vcache.config import VCacheConfig from vcache.vcache_core.cache.cache import Cache -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( EmbeddingMetadataObj, ) -from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore from vcache.vcache_core.similarity_evaluator import ( SimilarityEvaluator, StringComparisonSimilarityEvaluator, @@ -53,10 +52,7 @@ def setup(self, config: VCacheConfig): self.inference_engine = config.inference_engine self.cache = Cache( embedding_engine=config.embedding_engine, - embedding_store=EmbeddingStore( - embedding_metadata_storage=config.embedding_metadata_storage, - vector_db=config.vector_db, - ), + vector_db=config.vector_db, eviction_policy=config.eviction_policy, ) diff --git a/vcache/vcache_policy/strategies/benchmark_static.py b/vcache/vcache_policy/strategies/benchmark_static.py index b192586..83e9876 100644 --- a/vcache/vcache_policy/strategies/benchmark_static.py +++ b/vcache/vcache_policy/strategies/benchmark_static.py @@ -4,7 +4,6 @@ from vcache.config import VCacheConfig from vcache.vcache_core.cache.cache import Cache -from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore from vcache.vcache_policy.vcache_policy import VCachePolicy @@ -40,10 +39,7 @@ def setup(self, config: VCacheConfig): self.inference_engine = config.inference_engine self.cache = Cache( embedding_engine=config.embedding_engine, - embedding_store=EmbeddingStore( - 
embedding_metadata_storage=config.embedding_metadata_storage, - vector_db=config.vector_db, - ), + vector_db=config.vector_db, eviction_policy=config.eviction_policy, ) diff --git a/vcache/vcache_policy/strategies/benchmark_verified_global.py b/vcache/vcache_policy/strategies/benchmark_verified_global.py index df26f29..7d985ae 100644 --- a/vcache/vcache_policy/strategies/benchmark_verified_global.py +++ b/vcache/vcache_policy/strategies/benchmark_verified_global.py @@ -12,10 +12,9 @@ from vcache.config import VCacheConfig from vcache.inference_engine import InferenceEngine from vcache.vcache_core.cache.cache import Cache -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( EmbeddingMetadataObj, ) -from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator from vcache.vcache_policy.vcache_policy import VCachePolicy @@ -54,10 +53,7 @@ def setup(self, config: VCacheConfig): self.inference_engine = config.inference_engine self.cache = Cache( embedding_engine=config.embedding_engine, - embedding_store=EmbeddingStore( - embedding_metadata_storage=config.embedding_metadata_storage, - vector_db=config.vector_db, - ), + vector_db=config.vector_db, eviction_policy=config.eviction_policy, ) diff --git a/vcache/vcache_policy/strategies/verified.py b/vcache/vcache_policy/strategies/verified.py index ffd93e0..ecf540f 100644 --- a/vcache/vcache_policy/strategies/verified.py +++ b/vcache/vcache_policy/strategies/verified.py @@ -16,13 +16,10 @@ from vcache.config import VCacheConfig from vcache.inference_engine import InferenceEngine from vcache.vcache_core.cache.cache import Cache -from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import ( +from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import ( 
     EmbeddingMetadataObj,
 )
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
-from vcache.vcache_core.similarity_evaluator import (
-    SimilarityEvaluator,
-)
+from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator
 from vcache.vcache_policy.vcache_policy import VCachePolicy
 
 # Disable Hugging Face tokenizer parallelism to prevent deadlocks when using
@@ -144,10 +141,7 @@ def setup(self, config: VCacheConfig):
         self.inference_engine = config.inference_engine
         self.cache = Cache(
             embedding_engine=config.embedding_engine,
-            embedding_store=EmbeddingStore(
-                embedding_metadata_storage=config.embedding_metadata_storage,
-                vector_db=config.vector_db,
-            ),
+            vector_db=config.vector_db,
             eviction_policy=config.eviction_policy,
        )