diff --git a/README.md b/README.md
index 63b4f05..1f3d47d 100644
--- a/README.md
+++ b/README.md
@@ -8,25 +8,16 @@
-
Reliable and Efficient Semantic Prompt Caching
-
-
-
Semantic caching reduces LLM latency and cost by returning cached model responses for semantically similar prompts (not just exact matches). **vCache** is the first verified semantic cache that **guarantees user-defined error rate bounds**. vCache replaces static thresholds with **online-learned, embedding-specific decision boundaries**—no manual fine-tuning required. This enables reliable cached response reuse across any embedding model or workload.
-
-
> [!NOTE]
> vCache is currently in active development. Features and APIs may change as we continue to improve the system.
-
-
-
## 🚀 Quick Install
Install vCache in editable mode:
@@ -40,6 +31,7 @@ Then, set your OpenAI key:
```bash
export OPENAI_API_KEY="your_api_key_here"
```
+
(Note: vCache uses OpenAI by default for both LLM inference and embedding generation, but you can configure any other backend.)
Finally, use vCache in your Python code:
@@ -53,16 +45,14 @@ print(f"Response: {response}")
```
By default, vCache uses:
+
- `OpenAIInferenceEngine`
- `OpenAIEmbeddingEngine`
- `HNSWLibVectorDB`
-- `InMemoryEmbeddingMetadataStorage`
- `NoEvictionPolicy`
- `StringComparisonSimilarityEvaluator`
- `VerifiedDecisionPolicy` with a maximum failure rate of 2%
-
-
## ⚙️ Advanced Configuration
vCache is modular and highly configurable. Below is an example showing how to customize key components:
@@ -75,12 +65,12 @@ from vcache.main import VCache
from vcache.config import VCacheConfig
from vcache.inference_engine.strategies.open_ai import OpenAIInferenceEngine
from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import OpenAIEmbeddingEngine
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import InMemoryEmbeddingMetadataStorage
from vcache.vcache_core.similarity_evaluator.strategies.string_comparison import StringComparisonSimilarityEvaluator
from vcache.vcache_policy.strategies.dynamic_local_threshold import VerifiedDecisionPolicy
from vcache.vcache_policy.vcache_policy import VCachePolicy
-from vcache.vcache_core.cache.embedding_store.vector_db import HNSWLibVectorDB, SimilarityMetricType
+from vcache.vcache_core.cache.vector_db import HNSWLibVectorDB, SimilarityMetricType
```
+
```python
@@ -92,7 +82,6 @@ vcache_config: VCacheConfig = VCacheConfig(
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=100_000,
),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=StringComparisonSimilarityEvaluator,
)
@@ -117,16 +106,16 @@ Semantic caching reduces LLM latency and cost by returning cached model response
### Architecture Overview
1. **Embed & Store**
-Each prompt is converted to a fixed-length vector (an “embedding”) and stored in a vector database along with its LLM response.
+ Each prompt is converted to a fixed-length vector (an “embedding”) and stored in a vector database along with its LLM response.
2. **Nearest-Neighbor Lookup**
-When a new prompt arrives, the cache embeds it and finds its most similar stored prompt using a similarity metric (e.g., cosine similarity).
+ When a new prompt arrives, the cache embeds it and finds its most similar stored prompt using a similarity metric (e.g., cosine similarity).
3. **Similarity Score**
-The system computes a score between 0 and 1 that quantifies how “close” the new prompt is to the retrieved entry.
+ The system computes a score between 0 and 1 that quantifies how “close” the new prompt is to the retrieved entry.
-4. **Decision: Exploit vs. Explore**
- - **Exploit (cache hit):** If the similarity is above a confidence bound, return the cached response.
+4. **Decision: Exploit vs. Explore**
+ - **Exploit (cache hit):** If the similarity is above a confidence bound, return the cached response.
  - **Explore (cache miss):** Otherwise, query the LLM for a response, add its embedding and answer to the cache, and return it.
@@ -139,6 +128,7 @@ The system computes a score between 0 and 1 that quantifies how “close” the
### Why Fixed Thresholds Fall Short
+
Existing semantic caches rely on a **global static threshold** to decide whether to reuse a cached response (exploit) or invoke the LLM (explore). If the similarity score exceeds this threshold, the cache reuses the response; otherwise, it queries the model. This strategy is fundamentally limited.
- **Uniform threshold, diverse prompts:** A fixed threshold assumes all embeddings are equally distributed—ignoring that similarity is context-dependent.
@@ -161,11 +151,11 @@ vCache overcomes these limitations with two ideas:
### Benefits
- **Reliability**
- Formally bounds the rate of incorrect cache hits to your chosen tolerance.
+ Formally bounds the rate of incorrect cache hits to your chosen tolerance.
- **Performance**
- Matches or exceeds static-threshold systems in cache hit rate and end-to-end latency.
+ Matches or exceeds static-threshold systems in cache hit rate and end-to-end latency.
- **Simplicity**
- Plug in any embedding model; vCache learns and adapts automatically at runtime.
+ Plug in any embedding model; vCache learns and adapts automatically at runtime.
@@ -178,25 +168,24 @@ vCache overcomes these limitations with two ideas:
Please refer to the [vCache paper](https://arxiv.org/abs/2502.03771) for further details.
-
## 🛠 Developer Guide
For advanced usage and development setup, see the [Developer Guide](ReadMe_Dev.md).
-
-
## 📊 Benchmarking vCache
vCache includes a benchmarking framework to evaluate:
+
- **Cache hit rate**
- **Error rate**
- **Latency improvement**
- **...**
We provide three open benchmarks:
-- **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena)
-- **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification)
-- **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries)
+
+- **SemCacheLmArena** (chat-style prompts) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkLmArena)
+- **SemCacheClassification** (classification queries) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkClassification)
+- **SemCacheSearchQueries** (real-world search logs) - [Dataset ↗](https://huggingface.co/datasets/vCache/SemBenchmarkSearchQueries)
See the [Benchmarking Documentation](benchmarks/ReadMe.md) for instructions.
@@ -211,4 +200,4 @@ If you use vCache for your research, please cite our [paper](https://arxiv.org/a
journal={arXiv preprint arXiv:2502.03771},
year={2025}
}
-```
\ No newline at end of file
+```
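The README changes above reflect the core refactor in this diff: embedding metadata no longer lives in a separate `EmbeddingMetadataStorage`; the vector database stores embeddings and their metadata together. As an illustrative, self-contained sketch of that consolidated pattern (a toy stand-in, not the actual vCache classes — the names mirror the interface visible in the diff), the merged store looks like:

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Tuple


@dataclass
class EmbeddingMetadataObj:
    """Toy stand-in for vCache's metadata object (shape assumed from the diff)."""
    response: str
    embedding_id: int = -1


class ToyVectorDB:
    """Minimal in-memory vector DB holding embeddings and metadata side by side,
    mirroring the consolidated add/get_knn/get_metadata interface."""

    def __init__(self) -> None:
        self._vectors: Dict[int, List[float]] = {}
        self._metadata: Dict[int, EmbeddingMetadataObj] = {}
        self._next_id = 0

    def add(self, embedding: List[float], metadata: EmbeddingMetadataObj) -> int:
        embedding_id = self._next_id
        self._next_id += 1
        metadata.embedding_id = embedding_id  # set automatically, as in the tests
        self._vectors[embedding_id] = embedding
        self._metadata[embedding_id] = metadata
        return embedding_id

    def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
        return self._metadata[embedding_id]

    def get_knn(self, embedding: List[float], k: int) -> List[Tuple[float, int]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

        # Highest similarity first, truncated to k results
        scored = sorted(((cosine(embedding, v), i) for i, v in self._vectors.items()),
                        reverse=True)
        return scored[:k]
```

With one store, a lookup and its metadata fetch use the same id, which is what the benchmark and test changes below switch to (`vector_db.get_all_embedding_metadata_objects()` instead of a separate storage object).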
diff --git a/benchmarks/benchmark.py b/benchmarks/benchmark.py
index 34efa30..0095c82 100644
--- a/benchmarks/benchmark.py
+++ b/benchmarks/benchmark.py
@@ -23,16 +23,13 @@
from vcache.vcache_core.cache.embedding_engine.strategies.benchmark import (
BenchmarkEmbeddingEngine,
)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
- InMemoryEmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db import (
+from vcache.vcache_core.cache.vector_db import (
HNSWLibVectorDB,
SimilarityMetricType,
)
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
+ EmbeddingMetadataObj,
+)
from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator
from vcache.vcache_core.similarity_evaluator.strategies.llm_comparison import (
LLMComparisonSimilarityEvaluator,
@@ -396,7 +393,7 @@ def dump_results_to_json(self):
var_ts_dict = {}
metadata_objects: List[EmbeddingMetadataObj] = (
- self.vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects()
+ self.vcache.vcache_config.vector_db.get_all_embedding_metadata_objects()
)
for metadata_object in metadata_objects:
@@ -486,7 +483,6 @@ def __run_baseline(
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=MAX_VECTOR_DB_CAPACITY,
),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=similarity_evaluator,
)
vcache: VCache = VCache(vcache_config, vcache_policy)
diff --git a/test.py b/test.py
index 62b2da7..bf042b6 100644
--- a/test.py
+++ b/test.py
@@ -4,10 +4,7 @@
from vcache.vcache_core.cache.embedding_engine.strategies.open_ai import (
OpenAIEmbeddingEngine,
)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import (
- InMemoryEmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db import (
+from vcache.vcache_core.cache.vector_db import (
HNSWLibVectorDB,
SimilarityMetricType,
)
@@ -27,7 +24,6 @@
similarity_metric_type=SimilarityMetricType.COSINE,
max_capacity=100000,
),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=StringComparisonSimilarityEvaluator,
)
vcache: VCache = VCache(vcache_config, vcache_policy)
diff --git a/tests/integration/test_concurrency.py b/tests/integration/test_concurrency.py
index 4174303..6a1a714 100644
--- a/tests/integration/test_concurrency.py
+++ b/tests/integration/test_concurrency.py
@@ -8,7 +8,6 @@
from vcache import (
HNSWLibVectorDB,
- InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
StringComparisonSimilarityEvaluator,
VCache,
@@ -46,7 +45,6 @@ def answers_similar(a, b):
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
similarity_evaluator=similarity_evaluator,
)
@@ -93,7 +91,9 @@ def do_inference(prompt):
time.sleep(1.5)
executor.map(do_inference, concurrent_prompts_chunk_2)
- all_metadata_objects = vcache.vcache_config.embedding_metadata_storage.get_all_embedding_metadata_objects()
+ all_metadata_objects = (
+ vcache.vcache_config.vector_db.get_all_embedding_metadata_objects()
+ )
final_observation_count = len(all_metadata_objects)
for i, metadata_object in enumerate(all_metadata_objects):
diff --git a/tests/integration/test_dynamic_threshold.py b/tests/integration/test_dynamic_threshold.py
index 4d4d919..b04d147 100644
--- a/tests/integration/test_dynamic_threshold.py
+++ b/tests/integration/test_dynamic_threshold.py
@@ -4,7 +4,6 @@
from vcache import (
HNSWLibVectorDB,
- InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
OpenAIInferenceEngine,
VCache,
@@ -25,7 +24,6 @@ def create_default_config_and_policy():
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
system_prompt="Please answer in a single word with the first letter capitalized. Example: London",
)
policy = VerifiedDecisionPolicy(delta=0.05)
diff --git a/tests/integration/test_static_threshold.py b/tests/integration/test_static_threshold.py
index d75300e..96bdd5e 100644
--- a/tests/integration/test_static_threshold.py
+++ b/tests/integration/test_static_threshold.py
@@ -5,7 +5,6 @@
from vcache import (
BenchmarkStaticDecisionPolicy,
HNSWLibVectorDB,
- InMemoryEmbeddingMetadataStorage,
LangChainEmbeddingEngine,
OpenAIInferenceEngine,
VCache,
@@ -25,7 +24,6 @@ def create_default_config_and_policy():
model_name="sentence-transformers/all-mpnet-base-v2"
),
vector_db=HNSWLibVectorDB(),
- embedding_metadata_storage=InMemoryEmbeddingMetadataStorage(),
)
policy = BenchmarkStaticDecisionPolicy(threshold=0.8)
return config, policy
diff --git a/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py b/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py
deleted file mode 100644
index e266159..0000000
--- a/tests/unit/EmbeddingMetadataStrategy/test_embedding_metadata.py
+++ /dev/null
@@ -1,32 +0,0 @@
-import unittest
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
- InMemoryEmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-
-
-class TestEmbeddingMetadataStorageStrategy(unittest.TestCase):
- def test_in_memory_strategy(self):
- embedding_metadata_storage = InMemoryEmbeddingMetadataStorage()
-
- initial_obj = EmbeddingMetadataObj(embedding_id=0, response="test")
- embedding_id = embedding_metadata_storage.add_metadata(
- embedding_id=0, metadata=initial_obj
- )
- assert embedding_id == 0
- assert embedding_metadata_storage.get_metadata(embedding_id=0) == initial_obj
-
- updated_obj = EmbeddingMetadataObj(embedding_id=0, response="test2")
- embedding_metadata_storage.update_metadata(embedding_id=0, metadata=updated_obj)
- assert embedding_metadata_storage.get_metadata(embedding_id=0) == updated_obj
-
- embedding_metadata_storage.flush()
- with self.assertRaises(ValueError):
- embedding_metadata_storage.get_metadata(embedding_id=0)
-
-
-if __name__ == "__main__":
- unittest.main()
diff --git a/tests/unit/VCachePolicyStrategy/test_vcache_policy.py b/tests/unit/VCachePolicyStrategy/test_vcache_policy.py
index 50bdcbc..01d24c1 100644
--- a/tests/unit/VCachePolicyStrategy/test_vcache_policy.py
+++ b/tests/unit/VCachePolicyStrategy/test_vcache_policy.py
@@ -3,7 +3,7 @@
from unittest.mock import MagicMock, patch
from vcache.config import VCacheConfig
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
+from vcache.vcache_core.cache.vector_db import (
EmbeddingMetadataObj,
)
from vcache.vcache_policy.strategies.verified import (
@@ -48,11 +48,9 @@ def update_metadata(embedding_id, embedding_metadata):
mock_config = MagicMock(spec=VCacheConfig)
mock_config.inference_engine = self.mock_inference_engine
mock_config.similarity_evaluator = self.mock_similarity_evaluator
- # Add all required attributes for Cache creation
- mock_config.embedding_engine = MagicMock()
- mock_config.embedding_metadata_storage = MagicMock()
mock_config.vector_db = MagicMock()
mock_config.eviction_policy = MagicMock()
+ mock_config.embedding_engine = MagicMock()
self.policy = VerifiedDecisionPolicy()
self.policy.setup(mock_config)
diff --git a/tests/unit/VectorDBStrategy/test_vector_db.py b/tests/unit/VectorDBStrategy/test_vector_db.py
index 2ca1de9..d5b3a2c 100644
--- a/tests/unit/VectorDBStrategy/test_vector_db.py
+++ b/tests/unit/VectorDBStrategy/test_vector_db.py
@@ -2,9 +2,8 @@
import pytest
-from vcache.vcache_core.cache.embedding_store.vector_db import (
- ChromaVectorDB,
- FAISSVectorDB,
+from vcache.vcache_core.cache.vector_db import (
+ EmbeddingMetadataObj,
HNSWLibVectorDB,
SimilarityMetricType,
)
@@ -12,10 +11,6 @@
VECTOR_DB_PARAMS = [
(HNSWLibVectorDB, SimilarityMetricType.COSINE),
(HNSWLibVectorDB, SimilarityMetricType.EUCLIDEAN),
- (FAISSVectorDB, SimilarityMetricType.COSINE),
- (FAISSVectorDB, SimilarityMetricType.EUCLIDEAN),
- (ChromaVectorDB, SimilarityMetricType.COSINE),
- (ChromaVectorDB, SimilarityMetricType.EUCLIDEAN),
]
@@ -32,15 +27,19 @@ def test_add_and_get_knn(self, vector_db_class, similarity_metric_type):
# Test with a single embedding
embedding = [0.1, 0.2, 0.3]
- id1 = vector_db.add(embedding=embedding)
+ metadata1 = EmbeddingMetadataObj(response="test response 1")
+ id1 = vector_db.add(embedding=embedding, metadata=metadata1)
+
knn = vector_db.get_knn(embedding=embedding, k=1)
assert len(knn) == 1
assert abs(knn[0][0] - 1.0) < 1e-5 # Should be a perfect match
assert knn[0][1] == id1
# Test with multiple embeddings
- vector_db.add(embedding=[0.2, 0.3, 0.4])
- vector_db.add(embedding=[0.3, 0.4, 0.5])
+ metadata2 = EmbeddingMetadataObj(response="test response 2")
+ metadata3 = EmbeddingMetadataObj(response="test response 3")
+ vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2)
+ vector_db.add(embedding=[0.3, 0.4, 0.5], metadata=metadata3)
# Verify we get all embeddings when k is large enough
knn = vector_db.get_knn(embedding=embedding, k=3)
@@ -59,8 +58,10 @@ def test_remove(self, vector_db_class, similarity_metric_type):
vector_db = vector_db_class(similarity_metric_type=similarity_metric_type)
# Add multiple embeddings
- id1 = vector_db.add(embedding=[0.1, 0.2, 0.3])
- id2 = vector_db.add(embedding=[0.2, 0.3, 0.4])
+ metadata1 = EmbeddingMetadataObj(response="test response 1")
+ metadata2 = EmbeddingMetadataObj(response="test response 2")
+ id1 = vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata1)
+ id2 = vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2)
# Verify both exist
knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=2)
@@ -83,9 +84,12 @@ def test_reset(self, vector_db_class, similarity_metric_type):
vector_db = vector_db_class(similarity_metric_type=similarity_metric_type)
# Add multiple embeddings
- vector_db.add(embedding=[0.1, 0.2, 0.3])
- vector_db.add(embedding=[0.2, 0.3, 0.4])
- vector_db.add(embedding=[0.3, 0.4, 0.5])
+ metadata1 = EmbeddingMetadataObj(response="test response 1")
+ metadata2 = EmbeddingMetadataObj(response="test response 2")
+ metadata3 = EmbeddingMetadataObj(response="test response 3")
+ vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata1)
+ vector_db.add(embedding=[0.2, 0.3, 0.4], metadata=metadata2)
+ vector_db.add(embedding=[0.3, 0.4, 0.5], metadata=metadata3)
# Verify embeddings exist
knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=3)
@@ -98,6 +102,37 @@ def test_reset(self, vector_db_class, similarity_metric_type):
knn = vector_db.get_knn(embedding=[0.1, 0.2, 0.3], k=3)
assert len(knn) == 0
+ @pytest.mark.parametrize(
+ "vector_db_class, similarity_metric_type",
+ VECTOR_DB_PARAMS,
+ )
+ def test_metadata_operations(self, vector_db_class, similarity_metric_type):
+ """Test metadata operations of the vector database."""
+ vector_db = vector_db_class(similarity_metric_type=similarity_metric_type)
+
+ # Add embedding with metadata (embedding_id will be set automatically)
+ metadata = EmbeddingMetadataObj(response="test response")
+ embedding_id = vector_db.add(embedding=[0.1, 0.2, 0.3], metadata=metadata)
+
+ # Test get metadata
+ retrieved_metadata = vector_db.get_metadata(embedding_id)
+ assert retrieved_metadata.response == "test response"
+ assert (
+ retrieved_metadata.embedding_id == embedding_id
+ ) # Should be set automatically
+
+ # Test update metadata
+ updated_metadata = EmbeddingMetadataObj(response="updated response")
+ vector_db.update_metadata(embedding_id, updated_metadata)
+
+ retrieved_metadata = vector_db.get_metadata(embedding_id)
+ assert retrieved_metadata.response == "updated response"
+
+ # Test get all metadata objects
+ all_metadata = vector_db.get_all_embedding_metadata_objects()
+ assert len(all_metadata) == 1
+ assert all_metadata[0].response == "updated response"
+
if __name__ == "__main__":
unittest.main()
diff --git a/vcache/__init__.py b/vcache/__init__.py
index e5140a1..8e989b4 100644
--- a/vcache/__init__.py
+++ b/vcache/__init__.py
@@ -20,27 +20,19 @@
OpenAIEmbeddingEngine,
)
-# Embedding metadata storage
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
- InMemoryEmbeddingMetadataStorage,
+# Eviction policies
+from vcache.vcache_core.cache.eviction_policy import (
+ EvictionPolicy,
+ LRUEvictionPolicy,
)
# Vector databases
-from vcache.vcache_core.cache.embedding_store.vector_db import (
- ChromaVectorDB,
- FAISSVectorDB,
+from vcache.vcache_core.cache.vector_db import (
HNSWLibVectorDB,
SimilarityMetricType,
VectorDB,
)
-# Eviction policies
-from vcache.vcache_core.cache.eviction_policy import (
- EvictionPolicy,
- LRUEvictionPolicy,
-)
-
# Similarity evaluators
from vcache.vcache_core.similarity_evaluator import (
SimilarityEvaluator,
@@ -71,9 +63,7 @@
"LangChainEmbeddingEngine",
# Vector databases
"VectorDB",
- "FAISSVectorDB",
"HNSWLibVectorDB",
- "ChromaVectorDB",
"SimilarityMetricType",
# Similarity evaluators
"SimilarityEvaluator",
@@ -81,9 +71,6 @@
# Eviction policies
"EvictionPolicy",
"LRUEvictionPolicy",
- # Embedding metadata storage
- "EmbeddingMetadataStorage",
- "InMemoryEmbeddingMetadataStorage",
# vCache Policies
"VCachePolicy",
"VerifiedDecisionPolicy",
diff --git a/vcache/config.py b/vcache/config.py
index f337648..d284920 100644
--- a/vcache/config.py
+++ b/vcache/config.py
@@ -4,20 +4,14 @@
from vcache.inference_engine.strategies.open_ai import OpenAIInferenceEngine
from vcache.vcache_core.cache.embedding_engine import OpenAIEmbeddingEngine
from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import (
- InMemoryEmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db import VectorDB
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.hnsw_lib import (
- HNSWLibVectorDB,
-)
from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy
from vcache.vcache_core.cache.eviction_policy.strategies.no_eviction import (
NoEvictionPolicy,
)
+from vcache.vcache_core.cache.vector_db.strategies.hnsw_lib import (
+ HNSWLibVectorDB,
+)
+from vcache.vcache_core.cache.vector_db.vector_db import VectorDB
from vcache.vcache_core.similarity_evaluator.similarity_evaluator import (
SimilarityEvaluator,
)
@@ -36,7 +30,6 @@ def __init__(
inference_engine: InferenceEngine = OpenAIInferenceEngine(),
embedding_engine: EmbeddingEngine = OpenAIEmbeddingEngine(),
vector_db: VectorDB = HNSWLibVectorDB(),
- embedding_metadata_storage: EmbeddingMetadataStorage = InMemoryEmbeddingMetadataStorage(),
eviction_policy: EvictionPolicy = NoEvictionPolicy(),
similarity_evaluator: SimilarityEvaluator = StringComparisonSimilarityEvaluator(),
system_prompt: Optional[str] = None,
@@ -47,8 +40,7 @@ def __init__(
Args:
inference_engine: Engine for generating responses from prompts.
embedding_engine: Engine for generating embeddings from text.
- vector_db: Vector database for storing and retrieving embeddings.
- embedding_metadata_storage: Storage for embedding metadata.
+ vector_db: Vector database for storing embeddings and metadata.
eviction_policy: Policy for removing items from cache when full.
similarity_evaluator: Evaluator for determining similarity between prompts.
system_prompt: Optional system prompt to use for all inferences.
@@ -57,7 +49,6 @@ def __init__(
self.embedding_engine = embedding_engine
self.vector_db = vector_db
self.eviction_policy = eviction_policy
- self.embedding_metadata_storage = embedding_metadata_storage
self.similarity_evaluator = similarity_evaluator
self.similarity_evaluator.set_inference_engine(self.inference_engine)
self.system_prompt = system_prompt
diff --git a/vcache/vcache_core/cache/__init__.py b/vcache/vcache_core/cache/__init__.py
index 861d231..7882f73 100644
--- a/vcache/vcache_core/cache/__init__.py
+++ b/vcache/vcache_core/cache/__init__.py
@@ -1,21 +1,15 @@
from vcache.vcache_core.cache.cache import Cache
from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine
-from vcache.vcache_core.cache.embedding_store import EmbeddingStore
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy
from vcache.vcache_core.cache.eviction_policy.system_monitor import SystemMonitor
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
+ EmbeddingMetadataObj,
+)
__all__ = [
"Cache",
- "EmbeddingStore",
"EmbeddingMetadataObj",
"EmbeddingEngine",
- "EmbeddingMetadataStorage",
"EvictionPolicy",
"SystemMonitor",
]
diff --git a/vcache/vcache_core/cache/cache.py b/vcache/vcache_core/cache/cache.py
index 60adf17..7e29a4e 100644
--- a/vcache/vcache_core/cache/cache.py
+++ b/vcache/vcache_core/cache/cache.py
@@ -1,11 +1,11 @@
from typing import List
from vcache.vcache_core.cache.embedding_engine.embedding_engine import EmbeddingEngine
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
+from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
-from vcache.vcache_core.cache.eviction_policy.eviction_policy import EvictionPolicy
+from vcache.vcache_core.cache.vector_db.vector_db import VectorDB
class Cache:
@@ -15,29 +15,25 @@ class Cache:
def __init__(
self,
- embedding_store: EmbeddingStore,
+ vector_db: VectorDB,
embedding_engine: EmbeddingEngine,
eviction_policy: EvictionPolicy,
):
"""
- Initialize cache with embedding store, engine, and eviction policy.
+ Initialize cache with vector database, engine, and eviction policy.
Args:
- embedding_store: Store for managing embeddings and metadata.
+ vector_db: Vector database for managing embeddings and metadata.
embedding_engine: Engine for generating embeddings from text.
eviction_policy: Policy for removing items when cache is full.
"""
- self.embedding_store = embedding_store
+ self.vector_db = vector_db
self.embedding_engine = embedding_engine
self.eviction_policy = eviction_policy
def add(self, prompt: str, response: str) -> int:
"""
- Compute the embedding for the prompt, add an embedding to the vector database and a new metadata object.
-
- IMPORTANT: The embedding is computed first and then added to the vector database.
- The metadata object is added last.
- Consider this when implementing asynchronous logic to prevent race conditions.
+ Compute the embedding for the prompt, add an embedding to the vector database with metadata.
Args:
prompt: The prompt to add to the cache.
@@ -47,7 +43,9 @@ def add(self, prompt: str, response: str) -> int:
The id of the embedding.
"""
embedding = self.embedding_engine.get_embedding(prompt)
- return self.embedding_store.add_embedding(embedding, response)
+ metadata = EmbeddingMetadataObj(response=response)
+ embedding_id = self.vector_db.add(embedding, metadata)
+ return embedding_id
def remove(self, embedding_id: int) -> int:
"""
@@ -59,7 +57,7 @@ def remove(self, embedding_id: int) -> int:
Returns:
The id of the embedding.
"""
- return self.embedding_store.remove(embedding_id)
+ return self.vector_db.remove(embedding_id)
def get_knn(self, prompt: str, k: int) -> List[tuple[float, int]]:
"""
@@ -73,13 +71,13 @@ def get_knn(self, prompt: str, k: int) -> List[tuple[float, int]]:
A list of tuples, each containing a similarity score and an embedding id.
"""
embedding = self.embedding_engine.get_embedding(prompt)
- return self.embedding_store.get_knn(embedding, k)
+ return self.vector_db.get_knn(embedding, k)
def flush(self) -> None:
"""
Flush all data from the cache.
"""
- self.embedding_store.reset()
+ self.vector_db.reset()
def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
"""
@@ -91,7 +89,7 @@ def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
Returns:
The metadata of the embedding.
"""
- return self.embedding_store.get_metadata(embedding_id)
+ return self.vector_db.get_metadata(embedding_id)
def update_metadata(
self, embedding_id: int, embedding_metadata: EmbeddingMetadataObj
@@ -106,7 +104,7 @@ def update_metadata(
Returns:
The updated metadata of the embedding.
"""
- return self.embedding_store.update_metadata(embedding_id, embedding_metadata)
+ return self.vector_db.update_metadata(embedding_id, embedding_metadata)
def get_current_capacity(self) -> int:
"""
@@ -125,7 +123,7 @@ def is_empty(self) -> bool:
Returns:
True if the cache is empty, False otherwise.
"""
- return self.embedding_store.is_empty()
+ return self.vector_db.is_empty()
def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
"""
@@ -134,4 +132,4 @@ def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
Returns:
A list of all the embedding metadata objects in the cache.
"""
- return self.embedding_store.embedding_metadata_storage.get_all_embedding_metadata_objects()
+ return self.vector_db.get_all_embedding_metadata_objects()
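The `cache.py` changes above replace the two-step add (vector first, metadata last, with the race-condition caveat the old docstring warned about) with a single call that hands both the embedding and a freshly built metadata object to the vector DB. A hedged, self-contained sketch of that delegation (toy classes, assumed names — not the real `Cache`/`VectorDB`):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Meta:
    response: str
    embedding_id: int = -1


class ToyVectorDB:
    def __init__(self) -> None:
        self._store: Dict[int, Tuple[List[float], Meta]] = {}
        self._next_id = 0

    def add(self, embedding: List[float], metadata: Meta) -> int:
        eid = self._next_id
        self._next_id += 1
        metadata.embedding_id = eid
        # Embedding and metadata land in one step, so no ordering constraint
        # between vector insert and metadata insert remains.
        self._store[eid] = (embedding, metadata)
        return eid

    def get_metadata(self, eid: int) -> Meta:
        return self._store[eid][1]

    def get_all_embedding_metadata_objects(self) -> List[Meta]:
        return [m for _, m in self._store.values()]


class ToyCache:
    def __init__(self, vector_db: ToyVectorDB,
                 embed: Callable[[str], List[float]]) -> None:
        self.vector_db = vector_db
        self.embed = embed

    def add(self, prompt: str, response: str) -> int:
        # Mirrors the new Cache.add: build metadata here, delegate in one call
        embedding = self.embed(prompt)
        return self.vector_db.add(embedding, Meta(response=response))
```

This is the shape the diff converges on: every former `embedding_store.*` call in `Cache` becomes a direct `vector_db.*` call.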
diff --git a/vcache/vcache_core/cache/embedding_store/__init__.py b/vcache/vcache_core/cache/embedding_store/__init__.py
deleted file mode 100644
index febd930..0000000
--- a/vcache/vcache_core/cache/embedding_store/__init__.py
+++ /dev/null
@@ -1,11 +0,0 @@
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
-from vcache.vcache_core.cache.embedding_store.vector_db import (
- SimilarityMetricType,
- VectorDB,
-)
-
-__all__ = [
- "VectorDB",
- "EmbeddingStore",
- "SimilarityMetricType",
-]
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py
deleted file mode 100644
index 77692de..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/__init__.py
+++ /dev/null
@@ -1,8 +0,0 @@
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.strategies.in_memory import (
- InMemoryEmbeddingMetadataStorage,
-)
-
-__all__ = ["EmbeddingMetadataStorage", "InMemoryEmbeddingMetadataStorage"]
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py
deleted file mode 100644
index 0c7fcbd..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_storage.py
+++ /dev/null
@@ -1,82 +0,0 @@
-from abc import ABC, abstractmethod
-from typing import List
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-
-
-class EmbeddingMetadataStorage(ABC):
- """
- Abstract base class for embedding metadata storage.
- """
-
- @abstractmethod
- def add_metadata(self, embedding_id: int, metadata: EmbeddingMetadataObj) -> int:
- """
- Add metadata for a specific embedding.
-
- Args:
- embedding_id: The id of the embedding to add the metadata for.
- metadata: The metadata to add to the embedding.
-
- Returns:
- The id of the embedding.
- """
- pass
-
- @abstractmethod
- def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
- """
- Get metadata for a specific embedding.
-
- Args:
- embedding_id: The id of the embedding to get the metadata for.
-
- Returns:
- The metadata of the embedding.
- """
- pass
-
- @abstractmethod
- def update_metadata(
- self, embedding_id: int, metadata: EmbeddingMetadataObj
- ) -> EmbeddingMetadataObj:
- """
- Update metadata for a specific embedding.
-
- Args:
- embedding_id: The id of the embedding to update the metadata for.
- metadata: The metadata to update the embedding with.
-
- Returns:
- The updated metadata of the embedding.
- """
- pass
-
- @abstractmethod
- def remove_metadata(self, embedding_id: int) -> None:
- """
- Remove metadata for a specific embedding.
-
- Args:
- embedding_id: The id of the embedding to remove the metadata for.
- """
- pass
-
- @abstractmethod
- def flush(self) -> None:
- """
- Flush all metadata from storage.
- """
- pass
-
- @abstractmethod
- def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
- """
- Get all embedding metadata objects in storage.
-
- Returns:
- A list of all the embedding metadata objects in the storage.
- """
- pass
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/__init__.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py
deleted file mode 100644
index 3e49c8e..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/in_memory.py
+++ /dev/null
@@ -1,110 +0,0 @@
-from typing import Any, Dict, List, Optional
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
-
-
-class InMemoryEmbeddingMetadataStorage(EmbeddingMetadataStorage):
- """
- In-memory implementation of embedding metadata storage.
- """
-
- def __init__(self):
- """
- Initialize in-memory embedding metadata storage.
- """
- self.metadata_storage: Dict[int, "EmbeddingMetadataObj"] = {}
-
- def add_metadata(
- self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None
- ) -> None:
- """
- Add metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding to add metadata for.
- metadata: The metadata to add.
-
- Returns:
- The embedding ID.
- """
- self.metadata_storage[embedding_id] = metadata
- return embedding_id
-
- def get_metadata(self, embedding_id: int) -> Optional[Dict[str, Any]]:
- """
- Get metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding to get metadata for.
-
- Returns:
- The metadata for the embedding.
-
- Raises:
- ValueError: If embedding metadata is not found.
- """
- if embedding_id not in self.metadata_storage:
- raise ValueError(
- f"Embedding metadata for embedding id {embedding_id} not found"
- )
- else:
- return self.metadata_storage[embedding_id]
-
- def update_metadata(
- self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None
- ) -> bool:
- """
- Update metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding to update metadata for.
- metadata: The new metadata to set.
-
- Returns:
- The updated metadata.
-
- Raises:
- ValueError: If embedding metadata is not found.
- """
- if embedding_id not in self.metadata_storage:
- raise ValueError(
- f"Embedding metadata for embedding id {embedding_id} not found"
- )
- else:
- self.metadata_storage[embedding_id] = metadata
- return metadata
-
- def remove_metadata(self, embedding_id: int) -> bool:
- """
- Remove metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding to remove metadata for.
-
- Returns:
- True if metadata was removed, False if not found.
- """
- if embedding_id in self.metadata_storage:
- del self.metadata_storage[embedding_id]
- return True
- return False
-
- def flush(self) -> None:
- """
- Flush all metadata from storage.
- """
- self.metadata_storage = {}
-
- def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
- """
- Get all embedding metadata objects in storage.
-
- Returns:
- A list of all embedding metadata objects.
- """
- return list(self.metadata_storage.values())
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py b/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py
deleted file mode 100644
index 10025fb..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/strategies/langchain.py
+++ /dev/null
@@ -1,93 +0,0 @@
-from typing import Any, Dict, List, Optional
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
-
-
-class LangchainMetadataStorage(EmbeddingMetadataStorage):
- """
- LangChain-based metadata storage implementation (placeholder).
- """
-
- def __init__(self):
- """
- Initialize LangChain metadata storage.
- """
- # TODO
- pass
-
- def add_metadata(
- self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None
- ) -> None:
- """
- Add metadata for an embedding.
-
- Args:
- embedding_id: The ID of the embedding.
- metadata: The metadata to add.
- """
- # TODO
- pass
-
- def get_metadata(self, embedding_id: int) -> Optional[Dict[str, Any]]:
- """
- Get metadata for an embedding.
-
- Args:
- embedding_id: The ID of the embedding.
-
- Returns:
- The metadata for the embedding, or None if not found.
- """
- # TODO
- pass
-
- def update_metadata(
- self, embedding_id: int, metadata: Optional[Dict[str, Any]] = None
- ) -> bool:
- """
- Update metadata for an embedding.
-
- Args:
- embedding_id: The ID of the embedding.
- metadata: The new metadata.
-
- Returns:
- True if the update was successful, False otherwise.
- """
- # TODO
- pass
-
- def remove_metadata(self, embedding_id: int) -> bool:
- """
- Remove metadata for an embedding.
-
- Args:
- embedding_id: The ID of the embedding.
-
- Returns:
- True if the removal was successful, False otherwise.
- """
- # TODO
- pass
-
- def flush(self) -> None:
- """
- Flush any pending changes to storage.
- """
- # TODO
- pass
-
- def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
- """
- Get all embedding metadata objects.
-
- Returns:
- List of all embedding metadata objects.
- """
- # TODO
- pass
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_store.py b/vcache/vcache_core/cache/embedding_store/embedding_store.py
deleted file mode 100644
index c36d10e..0000000
--- a/vcache/vcache_core/cache/embedding_store/embedding_store.py
+++ /dev/null
@@ -1,139 +0,0 @@
-import threading
-from typing import List
-
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage import (
- EmbeddingMetadataStorage,
-)
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
- EmbeddingMetadataObj,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import VectorDB
-
-
-class EmbeddingStore:
- """
- Store for managing embeddings and their associated metadata.
- """
-
- def __init__(
- self,
- vector_db: VectorDB,
- embedding_metadata_storage: EmbeddingMetadataStorage,
- ):
- """
- Initialize embedding store with vector database and metadata storage.
-
- Args:
- vector_db: Vector database for storing embeddings.
- embedding_metadata_storage: Storage for embedding metadata.
- """
- self.vector_db = vector_db
- self.embedding_metadata_storage = embedding_metadata_storage
- self._add_lock = threading.Lock()
- self._remove_lock = threading.Lock()
-
- def add_embedding(self, embedding: List[float], response: str) -> int:
- """
- Add an embedding to the vector database and a new metadata object.
-
- This operation is thread-safe.
-
- Args:
- embedding: The embedding vector to add.
- response: The response associated with the embedding.
-
- Returns:
- The ID of the added embedding.
- """
- with self._add_lock:
- embedding_id = self.vector_db.add(embedding)
- metadata = EmbeddingMetadataObj(
- embedding_id=embedding_id,
- response=response,
- )
- self.embedding_metadata_storage.add_metadata(
- embedding_id=embedding_id, metadata=metadata
- )
- return embedding_id
-
- def remove(self, embedding_id: int) -> int:
- """
- Remove an embedding and its metadata from the store.
-
- This operation is thread-safe.
-
- Args:
- embedding_id: The ID of the embedding to remove.
-
- Returns:
- The ID of the removed embedding.
- """
- with self._remove_lock:
- self.embedding_metadata_storage.remove_metadata(embedding_id)
- return self.vector_db.remove(embedding_id)
-
- def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]:
- """
- Get k-nearest neighbors for the given embedding.
-
- Args:
- embedding: The embedding to find neighbors for.
- k: The number of neighbors to return.
-
- Returns:
- List of tuples containing similarity scores and embedding IDs.
- """
- return self.vector_db.get_knn(embedding, k)
-
- def reset(self) -> None:
- """
- Reset the embedding store to empty state.
- """
- self.embedding_metadata_storage.flush()
- return self.vector_db.reset()
-
- def calculate_storage_consumption(self) -> int:
- """
- Calculate the storage consumption of the embedding store.
-
- Returns:
- The storage consumption in bytes.
- """
- # TODO: Add metadata logic
- return -1
-
- def get_metadata(self, embedding_id: int) -> "EmbeddingMetadataObj":
- """
- Get metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding.
-
- Returns:
- The metadata object for the embedding.
- """
- return self.embedding_metadata_storage.get_metadata(embedding_id)
-
- def update_metadata(
- self, embedding_id: int, metadata: "EmbeddingMetadataObj"
- ) -> "EmbeddingMetadataObj":
- """
- Update metadata for a specific embedding.
-
- Args:
- embedding_id: The ID of the embedding.
- metadata: The new metadata object.
-
- Returns:
- The updated metadata object.
- """
- return self.embedding_metadata_storage.update_metadata(embedding_id, metadata)
-
- def is_empty(self) -> bool:
- """
- Check if the embedding store is empty.
-
- Returns:
- True if the store is empty, False otherwise.
- """
- return self.vector_db.is_empty()
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py b/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py
deleted file mode 100644
index bb4a6c3..0000000
--- a/vcache/vcache_core/cache/embedding_store/vector_db/__init__.py
+++ /dev/null
@@ -1,21 +0,0 @@
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.chroma import (
- ChromaVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.faiss import (
- FAISSVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.strategies.hnsw_lib import (
- HNSWLibVectorDB,
-)
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
- SimilarityMetricType,
- VectorDB,
-)
-
-__all__ = [
- "VectorDB",
- "SimilarityMetricType",
- "HNSWLibVectorDB",
- "FAISSVectorDB",
- "ChromaVectorDB",
-]
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py b/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py
deleted file mode 100644
index f2cfe47..0000000
--- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/chroma.py
+++ /dev/null
@@ -1,140 +0,0 @@
-from typing import List
-
-import chromadb
-
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
- SimilarityMetricType,
- VectorDB,
-)
-
-
-class ChromaVectorDB(VectorDB):
- """
- ChromaDB-based vector database implementation for efficient similarity search.
- """
-
- def __init__(
- self, similarity_metric_type: SimilarityMetricType = SimilarityMetricType.COSINE
- ):
- """
- Initialize ChromaDB vector database.
-
- Args:
- similarity_metric_type: The similarity metric to use for comparisons.
- """
- self.__next_embedding_id = 0
- self.collection = None
- self.client = None
- self.similarity_metric_type = similarity_metric_type
-
- def add(self, embedding: List[float]) -> int:
- """
- Add an embedding vector to the database.
-
- Args:
- embedding: The embedding vector to add.
-
- Returns:
- The unique ID assigned to the added embedding.
- """
- if self.collection is None:
- self._init_vector_store(len(embedding))
- id = self.__next_embedding_id
- self.collection.add(embeddings=[embedding], ids=[str(id)])
- self.__next_embedding_id += 1
- return id
-
- def remove(self, embedding_id: int) -> int:
- """
- Remove an embedding from the database.
-
- Args:
- embedding_id: The ID of the embedding to remove.
-
- Returns:
- The ID of the removed embedding.
-
- Raises:
- ValueError: If the collection is not initialized.
- """
- if self.collection is None:
- raise ValueError("Collection is not initialized")
- self.collection.delete(ids=[str(embedding_id)])
- return embedding_id
-
- def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]:
- """
- Get k-nearest neighbors for the given embedding.
-
- Args:
- embedding: The query embedding vector.
- k: The number of nearest neighbors to return.
-
- Returns:
- List of tuples containing similarity scores and embedding IDs.
-
- Raises:
- ValueError: If the collection is not initialized.
- """
- if self.collection is None:
- raise ValueError("Collection is not initialized")
- if self.collection.count() == 0:
- return []
- k_ = min(k, self.collection.count())
- results = self.collection.query(
- query_embeddings=[embedding], n_results=k_, include=["distances"]
- )
- distances = results.get("distances", [[]])[0]
- ids = results.get("ids", [[]])[0]
- return [
- (
- self.transform_similarity_score(
- float(dist), self.similarity_metric_type.value
- ),
- int(idx),
- )
- for dist, idx in zip(distances, ids)
- ]
-
- def reset(self) -> None:
- """
- Reset the vector database to empty state.
- """
- if self.collection is not None:
- self.collection.delete(ids=self.collection.get()["ids"])
- self.__next_embedding_id = 0
-
- def _init_vector_store(self, embedding_dim: int):
- """
- Initialize the ChromaDB collection with the given embedding dimension.
-
- Args:
- embedding_dim: The dimension of the embedding vectors.
-
- Raises:
- ValueError: If the similarity metric type is invalid.
- """
- self.client = chromadb.Client()
- collection_name = f"vcache_collection_{id(self)}"
- metric_type = self.similarity_metric_type.value
- match metric_type:
- case "cosine":
- space = "cosine"
- case "euclidean":
- space = "l2"
- case _:
- raise ValueError(f"Invalid similarity metric type: {metric_type}")
- self.collection = self.client.create_collection(
- name=collection_name,
- metadata={"dimension": embedding_dim, "hnsw:space": space},
- get_or_create=True,
- )
-
- def is_empty(self) -> bool:
- """
- Check if the vector database is empty.
-
- Returns:
- True if the database contains no embeddings, False otherwise.
- """
- return self.collection.count() == 0
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py b/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py
deleted file mode 100644
index ad6f883..0000000
--- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/faiss.py
+++ /dev/null
@@ -1,170 +0,0 @@
-from typing import List
-
-import faiss
-import numpy as np
-
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
- SimilarityMetricType,
- VectorDB,
-)
-
-
-class FAISSVectorDB(VectorDB):
- """
- FAISS-based vector database implementation for efficient similarity search.
- """
-
- def __init__(
- self, similarity_metric_type: SimilarityMetricType = SimilarityMetricType.COSINE
- ):
- """
- Initialize FAISS vector database.
-
- Args:
- similarity_metric_type: The similarity metric to use for comparisons.
- """
- self.similarity_metric_type = similarity_metric_type
- self.__next_embedding_id = 0
- self.index = None
-
- def transform_similarity_score(
- self, similarity_score: float, metric_type: str
- ) -> float:
- """
- Transform similarity score based on the metric type.
-
- Args:
- similarity_score: The raw similarity score.
- metric_type: The type of similarity metric used.
-
- Returns:
- The transformed similarity score.
-
- Raises:
- ValueError: If the similarity metric type is invalid.
- """
- match metric_type:
- case "cosine":
- return similarity_score
- case "euclidean":
- return 1 - similarity_score
- case _:
- raise ValueError(f"Invalid similarity metric type: {metric_type}")
-
- def add(self, embedding: List[float]) -> int:
- """
- Add an embedding vector to the database.
-
- Args:
- embedding: The embedding vector to add.
-
- Returns:
- The unique ID assigned to the added embedding.
- """
- if self.index is None:
- self._init_vector_store(len(embedding))
- id = self.__next_embedding_id
- ids = np.array([id], dtype=np.int64)
- embedding_array = np.array([embedding], dtype=np.float32)
- metric_type = self.similarity_metric_type.value
- # Normalize the embedding vector if the metric type is cosine
- if metric_type == "cosine":
- faiss.normalize_L2(embedding_array)
- self.index.add_with_ids(embedding_array, ids)
- self.__next_embedding_id += 1
- return id
-
- def remove(self, embedding_id: int) -> int:
- """
- Remove an embedding from the database.
-
- Args:
- embedding_id: The ID of the embedding to remove.
-
- Returns:
- The ID of the removed embedding.
-
- Raises:
- ValueError: If the index is not initialized.
- """
- if self.index is None:
- raise ValueError("Index is not initialized")
- id_array = np.array([embedding_id], dtype=np.int64)
- self.index.remove_ids(
- faiss.IDSelectorBatch(id_array.size, faiss.swig_ptr(id_array))
- )
- return embedding_id
-
- def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]:
- """
- Get k-nearest neighbors for the given embedding.
-
- Args:
- embedding: The query embedding vector.
- k: The number of nearest neighbors to return.
-
- Returns:
- List of tuples containing similarity scores and embedding IDs.
-
- Raises:
- ValueError: If the index is not initialized.
- """
- if self.index is None:
- raise ValueError("Index is not initialized")
- if self.index.ntotal == 0:
- return []
- k_ = min(k, self.index.ntotal)
- query_vector = np.array([embedding], dtype=np.float32)
- metric_type = self.similarity_metric_type.value
- # Normalize the query vector if the metric type is cosine
- if metric_type == "cosine":
- faiss.normalize_L2(query_vector)
- distances, indices = self.index.search(query_vector, k_)
- # Filter out results where index is -1 (deleted embeddings)
- filtered_results = [
- (distances[0][i], indices[0][i])
- for i in range(len(indices[0]))
- if indices[0][i] != -1
- ]
- return [
- (self.transform_similarity_score(dist, metric_type), int(idx))
- for dist, idx in filtered_results
- ]
-
- def reset(self) -> None:
- """
- Reset the vector database to empty state.
- """
- if self.index is not None:
- dim = self.index.d
- self._init_vector_store(dim)
- self.__next_embedding_id = 0
-
- def _init_vector_store(self, embedding_dim: int):
- """
- Initialize the FAISS index with the given embedding dimension.
-
- Args:
- embedding_dim: The dimension of the embedding vectors.
-
- Raises:
- ValueError: If the similarity metric type is invalid.
- """
- metric_type = self.similarity_metric_type.value
- match metric_type:
- case "cosine":
- faiss_metric = faiss.METRIC_INNER_PRODUCT
- case "euclidean":
- faiss_metric = faiss.METRIC_L2
- case _:
- raise ValueError(f"Invalid similarity metric type: {metric_type}")
- self.index = faiss.index_factory(embedding_dim, "IDMap,Flat", faiss_metric)
-
- def is_empty(self) -> bool:
- """
- Check if the vector database is empty.
-
- Returns:
- True if the database contains no embeddings, False otherwise.
- """
- return self.index.ntotal == 0
diff --git a/vcache/vcache_core/cache/vector_db/__init__.py b/vcache/vcache_core/cache/vector_db/__init__.py
new file mode 100644
index 0000000..fc736f2
--- /dev/null
+++ b/vcache/vcache_core/cache/vector_db/__init__.py
@@ -0,0 +1,15 @@
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
+ EmbeddingMetadataObj,
+)
+from vcache.vcache_core.cache.vector_db.strategies.hnsw_lib import HNSWLibVectorDB
+from vcache.vcache_core.cache.vector_db.vector_db import (
+ SimilarityMetricType,
+ VectorDB,
+)
+
+__all__ = [
+ "EmbeddingMetadataObj",
+ "VectorDB",
+ "HNSWLibVectorDB",
+ "SimilarityMetricType",
+]
diff --git a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py b/vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py
similarity index 99%
rename from vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py
rename to vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py
index 62914ad..2e45b82 100644
--- a/vcache/vcache_core/cache/embedding_store/embedding_metadata_storage/embedding_metadata_obj.py
+++ b/vcache/vcache_core/cache/vector_db/embedding_metadata_obj.py
@@ -11,8 +11,8 @@ class EmbeddingMetadataObj:
def __init__(
self,
- embedding_id: int,
response: str,
+ embedding_id: int = -1,
prior: np.ndarray = None,
posterior: np.ndarray = None,
region_reject: List[str] = None,
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py b/vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py
similarity index 57%
rename from vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py
rename to vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py
index 7bb62c6..f641272 100644
--- a/vcache/vcache_core/cache/embedding_store/vector_db/strategies/hnsw_lib.py
+++ b/vcache/vcache_core/cache/vector_db/strategies/hnsw_lib.py
@@ -1,8 +1,11 @@
-from typing import List
+from typing import Dict, List
import hnswlib
-from vcache.vcache_core.cache.embedding_store.vector_db.vector_db import (
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
+ EmbeddingMetadataObj,
+)
+from vcache.vcache_core.cache.vector_db.vector_db import (
SimilarityMetricType,
VectorDB,
)
@@ -14,7 +17,7 @@
class HNSWLibVectorDB(VectorDB):
"""
- HNSWLib-based vector database implementation for efficient similarity search.
+ HNSWLib-based vector database implementation that stores both embeddings and metadata.
"""
def __init__(
@@ -39,27 +42,36 @@ def __init__(
self.M = None
self.ef = None
self.index = None
+ self.metadata_storage: Dict[int, EmbeddingMetadataObj] = {}
- def add(self, embedding: List[float]) -> int:
+ def add(self, embedding: List[float], metadata: EmbeddingMetadataObj) -> int:
"""
- Add an embedding vector to the database.
+ Add an embedding vector and its metadata to the database.
Args:
embedding: The embedding vector to add.
+ metadata: The metadata object associated with the embedding.
Returns:
The unique ID assigned to the added embedding.
"""
if self.index is None:
self._init_vector_store(len(embedding))
- self.index.add_items(embedding, self.__next_embedding_id)
+
+ embedding_id = self.__next_embedding_id
+ self.index.add_items(embedding, embedding_id)
+
+ # Automatically set the embedding_id in the metadata
+ metadata.embedding_id = embedding_id
+ self.metadata_storage[embedding_id] = metadata
+
self.embedding_count += 1
self.__next_embedding_id += 1
- return self.__next_embedding_id - 1
+ return embedding_id
def remove(self, embedding_id: int) -> int:
"""
- Remove an embedding from the database.
+ Remove an embedding and its metadata from the database.
Args:
embedding_id: The ID of the embedding to remove.
@@ -68,11 +80,15 @@ def remove(self, embedding_id: int) -> int:
The ID of the removed embedding.
Raises:
- ValueError: If the index is not initialized.
+ ValueError: If the index is not initialized or embedding not found.
"""
if self.index is None:
raise ValueError("Index is not initialized")
+ if embedding_id not in self.metadata_storage:
+ raise ValueError(f"Embedding with ID {embedding_id} not found")
+
self.index.mark_deleted(embedding_id)
+ del self.metadata_storage[embedding_id]
self.embedding_count -= 1
return embedding_id
@@ -98,7 +114,57 @@ def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]:
self.transform_similarity_score(sim, metric_type) for sim in similarities[0]
]
id_list = [int(id) for id in ids[0]]
- return list(zip(similarity_scores, id_list))
+
+ # Filter out deleted embeddings (those not in metadata_storage)
+ results = []
+ for score, embedding_id in zip(similarity_scores, id_list):
+ if embedding_id in self.metadata_storage:
+ results.append((score, embedding_id))
+
+ return results
+
+ def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
+ """
+ Get metadata for a specific embedding.
+
+ Args:
+ embedding_id: The ID of the embedding to get metadata for.
+
+        Returns:
+            The metadata object for the embedding.
+
+        Raises:
+            ValueError: If no metadata exists for the given embedding ID.
+        """
+ if embedding_id not in self.metadata_storage:
+ raise ValueError(f"Metadata for embedding ID {embedding_id} not found")
+ return self.metadata_storage[embedding_id]
+
+ def update_metadata(
+ self, embedding_id: int, metadata: EmbeddingMetadataObj
+ ) -> EmbeddingMetadataObj:
+ """
+ Update metadata for a specific embedding.
+
+ Args:
+ embedding_id: The ID of the embedding to update metadata for.
+ metadata: The new metadata object.
+
+        Returns:
+            The updated metadata object.
+
+        Raises:
+            ValueError: If no metadata exists for the given embedding ID.
+        """
+ if embedding_id not in self.metadata_storage:
+ raise ValueError(f"Metadata for embedding ID {embedding_id} not found")
+
+ self.metadata_storage[embedding_id] = metadata
+ self.metadata_storage[embedding_id].embedding_id = embedding_id
+ return metadata
+
+ def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
+ """
+ Get all embedding metadata objects in the database.
+
+ Returns:
+ A list of all embedding metadata objects.
+ """
+ return list(self.metadata_storage.values())
def reset(self) -> None:
"""
@@ -109,6 +175,7 @@ def reset(self) -> None:
self._init_vector_store(self.dim)
self.embedding_count = 0
self.__next_embedding_id = 0
+ self.metadata_storage.clear()
def _init_vector_store(self, embedding_dim: int):
"""
diff --git a/vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py b/vcache/vcache_core/cache/vector_db/vector_db.py
similarity index 61%
rename from vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py
rename to vcache/vcache_core/cache/vector_db/vector_db.py
index f28f8e2..6be68b4 100644
--- a/vcache/vcache_core/cache/embedding_store/vector_db/vector_db.py
+++ b/vcache/vcache_core/cache/vector_db/vector_db.py
@@ -2,6 +2,10 @@
from enum import Enum
from typing import List
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
+ EmbeddingMetadataObj,
+)
+
class SimilarityMetricType(Enum):
"""
@@ -14,7 +18,7 @@ class SimilarityMetricType(Enum):
class VectorDB(ABC):
"""
- Abstract base class for vector databases.
+ Abstract base class for vector databases that store both embeddings and metadata.
"""
def transform_similarity_score(
@@ -39,12 +43,13 @@ def transform_similarity_score(
raise ValueError(f"Invalid similarity metric type: {metric_type}")
@abstractmethod
- def add(self, embedding: List[float]) -> int:
+ def add(self, embedding: List[float], metadata: EmbeddingMetadataObj) -> int:
"""
- Add an embedding to the vector database.
+ Add an embedding and its metadata to the vector database.
Args:
embedding: The embedding to add to the vector db.
+ metadata: The metadata object associated with the embedding.
Returns:
The id of the embedding.
@@ -54,7 +59,7 @@ def add(self, embedding: List[float]) -> int:
@abstractmethod
def remove(self, embedding_id: int) -> int:
"""
- Remove an embedding from the vector database.
+ Remove an embedding and its metadata from the vector database.
Args:
embedding_id: The id of the embedding to remove.
@@ -78,6 +83,45 @@ def get_knn(self, embedding: List[float], k: int) -> List[tuple[float, int]]:
"""
pass
+ @abstractmethod
+ def get_metadata(self, embedding_id: int) -> EmbeddingMetadataObj:
+ """
+ Get metadata for a specific embedding.
+
+ Args:
+ embedding_id: The id of the embedding to get the metadata for.
+
+ Returns:
+ The metadata of the embedding.
+ """
+ pass
+
+ @abstractmethod
+ def update_metadata(
+ self, embedding_id: int, metadata: EmbeddingMetadataObj
+ ) -> EmbeddingMetadataObj:
+ """
+ Update metadata for a specific embedding.
+
+ Args:
+ embedding_id: The id of the embedding to update the metadata for.
+ metadata: The metadata to update the embedding with.
+
+ Returns:
+ The updated metadata of the embedding.
+ """
+ pass
+
+ @abstractmethod
+ def get_all_embedding_metadata_objects(self) -> List[EmbeddingMetadataObj]:
+ """
+ Get all embedding metadata objects in the database.
+
+ Returns:
+ A list of all the embedding metadata objects in the database.
+ """
+ pass
+
@abstractmethod
def reset(self) -> None:
"""
diff --git a/vcache/vcache_policy/strategies/benchmark_iid_verified.py b/vcache/vcache_policy/strategies/benchmark_iid_verified.py
index 69827c1..9924932 100644
--- a/vcache/vcache_policy/strategies/benchmark_iid_verified.py
+++ b/vcache/vcache_policy/strategies/benchmark_iid_verified.py
@@ -7,10 +7,9 @@
from vcache.config import VCacheConfig
from vcache.vcache_core.cache.cache import Cache
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
from vcache.vcache_core.similarity_evaluator import (
SimilarityEvaluator,
StringComparisonSimilarityEvaluator,
@@ -53,10 +52,7 @@ def setup(self, config: VCacheConfig):
self.inference_engine = config.inference_engine
self.cache = Cache(
embedding_engine=config.embedding_engine,
- embedding_store=EmbeddingStore(
- embedding_metadata_storage=config.embedding_metadata_storage,
- vector_db=config.vector_db,
- ),
+ vector_db=config.vector_db,
eviction_policy=config.eviction_policy,
)
diff --git a/vcache/vcache_policy/strategies/benchmark_static.py b/vcache/vcache_policy/strategies/benchmark_static.py
index b192586..83e9876 100644
--- a/vcache/vcache_policy/strategies/benchmark_static.py
+++ b/vcache/vcache_policy/strategies/benchmark_static.py
@@ -4,7 +4,6 @@
from vcache.config import VCacheConfig
from vcache.vcache_core.cache.cache import Cache
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
from vcache.vcache_policy.vcache_policy import VCachePolicy
@@ -40,10 +39,7 @@ def setup(self, config: VCacheConfig):
self.inference_engine = config.inference_engine
self.cache = Cache(
embedding_engine=config.embedding_engine,
- embedding_store=EmbeddingStore(
- embedding_metadata_storage=config.embedding_metadata_storage,
- vector_db=config.vector_db,
- ),
+ vector_db=config.vector_db,
eviction_policy=config.eviction_policy,
)
diff --git a/vcache/vcache_policy/strategies/benchmark_verified_global.py b/vcache/vcache_policy/strategies/benchmark_verified_global.py
index df26f29..7d985ae 100644
--- a/vcache/vcache_policy/strategies/benchmark_verified_global.py
+++ b/vcache/vcache_policy/strategies/benchmark_verified_global.py
@@ -12,10 +12,9 @@
from vcache.config import VCacheConfig
from vcache.inference_engine import InferenceEngine
from vcache.vcache_core.cache.cache import Cache
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator
from vcache.vcache_policy.vcache_policy import VCachePolicy
@@ -54,10 +53,7 @@ def setup(self, config: VCacheConfig):
self.inference_engine = config.inference_engine
self.cache = Cache(
embedding_engine=config.embedding_engine,
- embedding_store=EmbeddingStore(
- embedding_metadata_storage=config.embedding_metadata_storage,
- vector_db=config.vector_db,
- ),
+ vector_db=config.vector_db,
eviction_policy=config.eviction_policy,
)
diff --git a/vcache/vcache_policy/strategies/verified.py b/vcache/vcache_policy/strategies/verified.py
index ffd93e0..ecf540f 100644
--- a/vcache/vcache_policy/strategies/verified.py
+++ b/vcache/vcache_policy/strategies/verified.py
@@ -16,13 +16,10 @@
from vcache.config import VCacheConfig
from vcache.inference_engine import InferenceEngine
from vcache.vcache_core.cache.cache import Cache
-from vcache.vcache_core.cache.embedding_store.embedding_metadata_storage.embedding_metadata_obj import (
+from vcache.vcache_core.cache.vector_db.embedding_metadata_obj import (
EmbeddingMetadataObj,
)
-from vcache.vcache_core.cache.embedding_store.embedding_store import EmbeddingStore
-from vcache.vcache_core.similarity_evaluator import (
- SimilarityEvaluator,
-)
+from vcache.vcache_core.similarity_evaluator import SimilarityEvaluator
from vcache.vcache_policy.vcache_policy import VCachePolicy
# Disable Hugging Face tokenizer parallelism to prevent deadlocks when using
@@ -144,10 +141,7 @@ def setup(self, config: VCacheConfig):
self.inference_engine = config.inference_engine
self.cache = Cache(
embedding_engine=config.embedding_engine,
- embedding_store=EmbeddingStore(
- embedding_metadata_storage=config.embedding_metadata_storage,
- vector_db=config.vector_db,
- ),
+ vector_db=config.vector_db,
eviction_policy=config.eviction_policy,
)
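
To illustrate the interface this refactor converges on — the vector DB now owns both embeddings and their metadata, and assigns `embedding_id` itself — here is a minimal, dependency-free sketch. It is not the vCache implementation (no hnswlib, brute-force cosine search, a stand-in `Metadata` class instead of `EmbeddingMetadataObj`), but it follows the same contract: `add(embedding, metadata)` returns the assigned ID and writes it back into the metadata object, and `get_metadata`/`remove` raise on unknown IDs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Metadata:
    # Stand-in for EmbeddingMetadataObj: the DB assigns embedding_id,
    # so callers construct it with only the response (id defaults to -1).
    response: str
    embedding_id: int = -1


class ToyVectorDB:
    """Brute-force cosine store that, like the refactored VectorDB,
    keeps embeddings and metadata together behind one interface."""

    def __init__(self) -> None:
        self._next_id = 0
        self._vectors: Dict[int, List[float]] = {}
        self._metadata: Dict[int, Metadata] = {}

    def add(self, embedding: List[float], metadata: Metadata) -> int:
        embedding_id = self._next_id
        self._next_id += 1
        self._vectors[embedding_id] = embedding
        metadata.embedding_id = embedding_id  # DB assigns the id
        self._metadata[embedding_id] = metadata
        return embedding_id

    def remove(self, embedding_id: int) -> int:
        if embedding_id not in self._metadata:
            raise ValueError(f"Embedding with ID {embedding_id} not found")
        del self._vectors[embedding_id]
        del self._metadata[embedding_id]
        return embedding_id

    def get_metadata(self, embedding_id: int) -> Metadata:
        if embedding_id not in self._metadata:
            raise ValueError(f"Metadata for embedding ID {embedding_id} not found")
        return self._metadata[embedding_id]

    def get_all_embedding_metadata_objects(self) -> List[Metadata]:
        return list(self._metadata.values())

    def get_knn(self, embedding: List[float], k: int) -> List[Tuple[float, int]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm

        scored = sorted(
            ((cosine(embedding, v), i) for i, v in self._vectors.items()),
            reverse=True,
        )
        return scored[:k]


db = ToyVectorDB()
db.add([1.0, 0.0], Metadata(response="cached answer A"))
db.add([0.0, 1.0], Metadata(response="cached answer B"))
score, nearest_id = db.get_knn([0.9, 0.1], k=1)[0]
print(db.get_metadata(nearest_id).response)  # -> cached answer A
```

Compared to the old design, callers no longer coordinate a separate `EmbeddingStore` and `EmbeddingMetadataStorage`; one `add` call is atomic for both the vector and its metadata, which is what lets the HNSWLib strategy filter deleted IDs out of `get_knn` results by a single membership check.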