Hey there! Welcome to EmbedDB! This is a super cool vector-based tag system written in TypeScript. It makes similarity searching as easy as having an AI assistant helping you find stuff!
- Powerful vector-based similarity search
- Weighted tags with confidence scores (You say it's important? It's important!)
- Category weights for fine-tuned search (Control which categories matter more!)
- Batch operations (Handle lots of data at once, super efficient!)
- Built-in query caching (Repeated queries? Lightning fast!)
- Full TypeScript support (Type-safe, developer-friendly!)
- Memory-efficient sparse vector implementation (Your RAM will thank you!)
- Import/Export functionality (Save and restore your indexes!)
- Pagination support with filter-first approach (Get filtered results in chunks!)
- Advanced filtering system (Filter first, sort by similarity!)
First, install the package:
npm install embeddb
Let's see it in action:
import { TagVectorSystem, Tag, IndexTag } from 'embeddb';
// Create a new system
const system = new TagVectorSystem();
// Define our tag universe
const tags: IndexTag[] = [
  { category: 'color', value: 'red' }, // Red is rad!
  { category: 'color', value: 'blue' }, // Blue is cool!
  { category: 'size', value: 'large' } // Size matters!
];
// Build the tag index (important step!)
system.buildIndex(tags);
// Add an item with its tags and confidence scores
const item = {
  id: 'cool-item-1',
  tags: [
    { category: 'color', value: 'red', confidence: 1.0 }, // 100% sure it's red!
    { category: 'size', value: 'large', confidence: 0.8 } // Pretty sure it's large
  ]
};
system.addItem(item);
// Set category weights to prioritize color matches
system.setCategoryWeight('color', 2.0); // Color matches are twice as important
// Let's find similar items
const query = {
  tags: [
    { category: 'color', value: 'red', confidence: 0.9 }
  ]
};
// Query with pagination
const results = system.query(query.tags, { page: 1, size: 10 }); // Get first 10 results
// Export the index for later use
const exportedData = system.exportIndex();
// Import the index in another instance
const newSystem = new TagVectorSystem();
newSystem.importIndex(exportedData);
TagVectorSystem is our superhero! It handles all the operations.
- buildIndex(tags: IndexTag[]): Build your tag universe
    // Define your tag world!
    system.buildIndex([
      { category: 'color', value: 'red' },
      { category: 'style', value: 'modern' }
    ]);
- addItem(item: ItemTags): Add a single item
    // Add something awesome
    system.addItem({
      id: 'awesome-item',
      tags: [
        { category: 'color', value: 'red', confidence: 1.0 }
      ]
    });
- addItemBatch(items: ItemTags[], batchSize?: number): Batch add items
    // Add multiple items at once for better performance!
    system.addItemBatch([item1, item2, item3], 10);
- query(tags: Tag[], options?: QueryOptions): Search for similar items
    // Find similar stuff
    const results = system.query([
      { category: 'style', value: 'modern', confidence: 0.9 }
    ], { page: 1, size: 20 });
- queryFirst(tags: Tag[]): Get the most similar item
    // Just get the best match
    const bestMatch = system.queryFirst([
      { category: 'color', value: 'red', confidence: 1.0 }
    ]);
- getStats(): Get system statistics
    // Check out the system stats
    const stats = system.getStats();
    console.log(`Total items: ${stats.totalItems}`);
- exportIndex() & importIndex(): Export/Import index data
    // Save your data for later
    const data = system.exportIndex();
    // ... later ...
    system.importIndex(data);
- setCategoryWeight(category: string, weight: number): Set category weight
    // Make color matches twice as important
    system.setCategoryWeight('color', 2.0);
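Changing a category weight changes how every item is scored, so any cached query result becomes stale at that moment. Here is a minimal sketch of how a query cache with that invalidation rule could look. The `CachedScorer` class, its `computeCount` counter, and the JSON-based cache key are assumptions made for illustration; they are not EmbedDB's actual internals.

```typescript
// Hypothetical wrapper for illustration only; not EmbedDB's real cache.
type Tag = { category: string; value: string; confidence: number };
type Compute<R> = (tags: Tag[], weights: Map<string, number>) => R;

class CachedScorer<R> {
  private cache = new Map<string, R>();
  private weights = new Map<string, number>();
  private compute: Compute<R>;
  public computeCount = 0; // how many times the expensive path actually ran

  constructor(compute: Compute<R>) {
    this.compute = compute;
  }

  query(tags: Tag[]): R {
    // Assumption: the serialized tag list identifies the query.
    const key = JSON.stringify(tags);
    if (this.cache.has(key)) {
      return this.cache.get(key) as R; // repeated query, served from cache
    }
    this.computeCount++;
    const result = this.compute(tags, this.weights);
    this.cache.set(key, result);
    return result;
  }

  setCategoryWeight(category: string, weight: number): void {
    this.weights.set(category, weight);
    this.cache.clear(); // cached results were computed with the old weights
  }
}

// Toy scoring function: sum of confidence * category weight (default weight 1).
const scorer = new CachedScorer((tags, weights) =>
  tags.reduce((sum, t) => sum + t.confidence * (weights.get(t.category) ?? 1), 0),
);

const cachedQuery: Tag[] = [{ category: 'color', value: 'red', confidence: 0.9 }];
const first = scorer.query(cachedQuery); // computed
const second = scorer.query(cachedQuery); // cache hit, no recomputation
scorer.setCategoryWeight('color', 2.0); // clears the cache
const third = scorer.query(cachedQuery); // computed again with the new weight
```

Clearing the whole cache on every weight change is the simplest rule that stays correct; a finer-grained scheme would have to track which cached queries touch which categories.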
Want to contribute? Awesome! Here are some handy commands:
# Install dependencies
npm install
# Build the project
npm run build
# Run tests (we love testing!)
npm test
# Check code style
npm run lint
# Make the code pretty
npm run format
EmbedDB uses vector magic to make similarity search possible:
- Tag Indexing:
  - Each category-value pair gets mapped to a unique vector position
  - This lets us transform tags into numerical vectors
- Vector Transformation:
  - Item tags are converted into sparse vectors
  - Confidence scores are used as vector weights
- Similarity Calculation:
  - Uses cosine similarity to measure vector relationships
  - This helps us find the most similar items
- Performance Optimizations:
  - Sparse vectors for memory efficiency
  - Query caching for speed
  - Batch operations for better throughput
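To make the steps above concrete, here is a small self-contained sketch of the same pipeline: index the tags, turn tag lists into sparse vectors, then compare them with cosine similarity. The helper names (`buildTagIndex`, `toSparseVector`, `cosineSimilarity`) and the `Map`-based vector layout are assumptions for illustration; EmbedDB's real implementation may be organized differently.

```typescript
// Hypothetical helpers for illustration only; not EmbedDB's actual internals.
type IndexTag = { category: string; value: string };
type Tag = { category: string; value: string; confidence: number };
type SparseVector = Map<number, number>; // position -> weight, zeros are not stored

// Tag indexing: each category-value pair gets a unique vector position.
function buildTagIndex(tags: IndexTag[]): Map<string, number> {
  const index = new Map<string, number>();
  tags.forEach((tag) => {
    const key = `${tag.category}:${tag.value}`;
    if (!index.has(key)) index.set(key, index.size);
  });
  return index;
}

// Vector transformation: confidence scores become the sparse vector's weights.
function toSparseVector(tags: Tag[], index: Map<string, number>): SparseVector {
  const vector: SparseVector = new Map();
  tags.forEach((tag) => {
    const position = index.get(`${tag.category}:${tag.value}`);
    // Tags missing from the index and zero confidences are simply skipped.
    if (position !== undefined && tag.confidence !== 0) {
      vector.set(position, tag.confidence);
    }
  });
  return vector;
}

// Similarity calculation: cosine similarity over the stored entries only.
function cosineSimilarity(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, position) => {
    dot += weight * (b.get(position) ?? 0);
    normA += weight * weight;
  });
  b.forEach((weight) => {
    normB += weight * weight;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

const tagIndex = buildTagIndex([
  { category: 'color', value: 'red' },
  { category: 'color', value: 'blue' },
  { category: 'size', value: 'large' },
]);
const itemVector = toSparseVector(
  [
    { category: 'color', value: 'red', confidence: 1.0 },
    { category: 'size', value: 'large', confidence: 0.8 },
  ],
  tagIndex,
);
const queryVector = toSparseVector(
  [{ category: 'color', value: 'red', confidence: 0.9 }],
  tagIndex,
);
const similarity = cosineSimilarity(itemVector, queryVector); // about 0.78
```

Because the item also carries a `size` tag that the query does not mention, the score lands below 1 even though the color matches.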
Under the hood, EmbedDB uses several clever techniques:
- Sparse Vector Implementation
  - Only stores non-zero values
  - Reduces memory footprint
  - Perfect for tag-based systems where most values are zero
- Cosine Similarity
  - Measures angle between vectors
  - Range: -1 to 1 (we normalize to 0 to 1)
  - Used only for sorting, not filtering
  - Ideal for high-dimensional sparse spaces
- Filter-First Architecture
  - Filters are applied before similarity calculation
  - The number of results is determined by the filters alone
  - Similarity scores are used purely for sorting
  - Efficient for large datasets
- Category Weight Management
  - Fine-grained control over category importance
  - Individual and batch weight updates
  - Default weights for unknown categories
  - Automatic cache invalidation on weight changes
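The filter-first idea and category weights fit together roughly as in the sketch below: filter, score only the survivors, sort, then paginate. Everything here is an assumption for illustration, including the `filterFirstQuery` and `weightedCosine` names, the predicate-style filter, and the default weight of 1.0. It also assumes non-negative confidences and weights, in which case the cosine score already falls between 0 and 1; how EmbedDB itself normalizes is not shown here.

```typescript
// Hypothetical shapes and names for illustration; EmbedDB's real internals may differ.
type Tag = { category: string; value: string; confidence: number };
type Item = { id: string; tags: Tag[] };
type Scored = { id: string; similarity: number };

const DEFAULT_WEIGHT = 1.0; // assumed default for categories without an explicit weight

// Cosine similarity where each tag's confidence is scaled by its category weight.
function weightedCosine(a: Tag[], b: Tag[], weights: Map<string, number>): number {
  const toVector = (tags: Tag[]): Map<string, number> => {
    const vector = new Map<string, number>();
    tags.forEach((t) => {
      const weight = weights.get(t.category) ?? DEFAULT_WEIGHT;
      vector.set(`${t.category}:${t.value}`, t.confidence * weight);
    });
    return vector;
  };
  const va = toVector(a);
  const vb = toVector(b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  va.forEach((w, key) => {
    dot += w * (vb.get(key) ?? 0);
    normA += w * w;
  });
  vb.forEach((w) => {
    normB += w * w;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

function filterFirstQuery(
  items: Item[],
  queryTags: Tag[],
  filter: (item: Item) => boolean,
  weights: Map<string, number>,
  page: number, // 1-based, so page 1 is the first chunk
  size: number,
): Scored[] {
  // 1. Filter first: this alone decides how many results exist.
  const candidates = items.filter(filter);
  // 2. Score only the survivors and sort by similarity, highest first.
  const scored = candidates
    .map((item) => ({ id: item.id, similarity: weightedCosine(item.tags, queryTags, weights) }))
    .sort((x, y) => y.similarity - x.similarity);
  // 3. Paginate the sorted list.
  const start = (page - 1) * size;
  return scored.slice(start, start + size);
}

const items: Item[] = [
  { id: 'a', tags: [
    { category: 'color', value: 'red', confidence: 1.0 },
    { category: 'size', value: 'large', confidence: 0.8 },
  ] },
  { id: 'b', tags: [
    { category: 'color', value: 'blue', confidence: 1.0 },
    { category: 'size', value: 'large', confidence: 1.0 },
  ] },
  { id: 'c', tags: [{ category: 'color', value: 'red', confidence: 0.5 }] },
];
const weights = new Map<string, number>([['color', 2.0]]);
const queryTags: Tag[] = [
  { category: 'color', value: 'red', confidence: 0.9 },
  { category: 'size', value: 'large', confidence: 0.5 },
];
const isLarge = (item: Item) =>
  item.tags.some((t) => t.category === 'size' && t.value === 'large');

const pageOne = filterFirstQuery(items, queryTags, isLarge, weights, 1, 10);
// 'c' is red but fails the filter, so only 'a' and 'b' come back, in that order.
const pageTwo = filterFirstQuery(items, queryTags, isLarge, weights, 2, 1);
```

Note that item 'b' stays in the results despite its low score: similarity only decides the order, never whether an item is included.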
MIT License - Go wild, build awesome stuff!
Got questions or suggestions? We'd love to hear from you:
- Open an Issue
- Submit a PR
Let's make EmbedDB even more awesome!
If you find EmbedDB useful, give us a star! It helps others discover this project and motivates us to keep improving it!