community[minor]: Create CouchbaseQueryVectorStore #8333

Draft
wants to merge 6 commits into
base: main
387 changes: 387 additions & 0 deletions docs/core_docs/docs/integrations/vectorstores/couchbase_query.mdx
@@ -0,0 +1,387 @@
---
hide_table_of_contents: true
sidebar_class_name: node-only
---

import CodeBlock from "@theme/CodeBlock";

# Couchbase Query Vector Store

[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications.

The `CouchbaseQueryVectorStore` is an implementation that uses Couchbase's Query service (SQL++) for vector similarity search instead of the Search service. This provides an alternative approach for vector operations using SQL++ queries with vector functions.

## Key Differences from CouchbaseVectorStore

- **Query Service**: Uses Couchbase's Query service with SQL++ instead of the Search service
- **No Index Required**: Does not require a pre-configured search index for basic operations
- **SQL++ Syntax**: Supports WHERE clauses and SQL++ query syntax for filtering
- **Vector Functions**: Uses the `APPROX_VECTOR_DISTANCE` function for similarity calculations
- **Distance Strategies**: Supports multiple distance strategies (Euclidean, Cosine, Dot Product)

## Installation

```bash npm2yarn
npm install couchbase @langchain/openai @langchain/community @langchain/core
```

## Create Couchbase Connection Object

First, create a connection to the Couchbase cluster, then pass the cluster object to the vector store. The example below connects with a username and password, but you can connect to your cluster in any other supported way.

For more information on connecting to the Couchbase cluster, please check the [Node SDK documentation](https://docs.couchbase.com/nodejs-sdk/current/hello-world/start-using-sdk.html#connect).

```typescript
import { Cluster } from "couchbase";

const connectionString = "couchbase://localhost"; // or couchbases://localhost if you are using TLS
const dbUsername = "Administrator"; // valid database user with read access to the bucket being queried
const dbPassword = "Password"; // password for the database user

const couchbaseClient = await Cluster.connect(connectionString, {
  username: dbUsername,
  password: dbPassword,
  configProfile: "wanDevelopment",
});
```

## Basic Setup

```typescript
import { CouchbaseQueryVectorStore, DistanceStrategy } from "@langchain/community/vectorstores/couchbase_query";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Cluster } from "couchbase";

// Connect to Couchbase
const cluster = await Cluster.connect("couchbase://localhost", {
  username: "Administrator",
  password: "password",
});

// Initialize embeddings
const embeddings = new OpenAIEmbeddings();

// Configure the vector store
const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "text", // optional, defaults to "text"
  embeddingKey: "embedding", // optional, defaults to "embedding"
  distanceStrategy: DistanceStrategy.COSINE, // optional, defaults to DOT
});
```

## Creating Vector Indexes

The Query vector store supports creating vector indexes to improve search performance. There are two types of indexes available:

### BHIVE Index
A specialized vector index optimized for vector operations using Couchbase's vector indexing capabilities:

```typescript
import { IndexType } from "@langchain/community/vectorstores/couchbase_query";

await vectorStore.createIndex({
  indexType: IndexType.BHIVE,
  indexDescription: "IVF,SQ8",
  indexName: "my_vector_index", // optional
  vectorDimension: 1536, // optional, auto-detected from embeddings
  distanceMetric: DistanceStrategy.COSINE, // optional, uses store default
  fields: ["text", "metadata"], // optional, defaults to text field
  whereClause: "type = 'document'", // optional filter
  indexScanNprobes: 10, // optional tuning parameter
  indexTrainlist: 1000, // optional tuning parameter
});
```

**Generated SQL++:**
```sql
CREATE VECTOR INDEX `my_vector_index` ON `bucket`.`scope`.`collection`
(`embedding` VECTOR) INCLUDE (`text`, `metadata`)
WHERE type = 'document' USING GSI WITH {'dimension': 1536, 'similarity': 'cosine', 'description': 'IVF,SQ8'}
```

### Composite Index
A general-purpose GSI index that includes vector fields alongside scalar fields:

```typescript
await vectorStore.createIndex({
  indexType: IndexType.COMPOSITE,
  indexDescription: "IVF1024,SQ8",
  indexName: "my_composite_index",
  vectorDimension: 1536,
  fields: ["text", "metadata.category"],
  whereClause: "created_date > '2023-01-01'",
  indexScanNprobes: 3,
  indexTrainlist: 10000,
});
```

**Generated SQL++:**
```sql
CREATE INDEX `my_composite_index` ON `bucket`.`scope`.`collection`
(`text`, `metadata.category`, `embedding` VECTOR)
WHERE created_date > '2023-01-01' USING GSI
WITH {'dimension': 1536, 'similarity': 'dot', 'description': 'IVF1024,SQ8', 'scan_nprobes': 3, 'trainlist': 10000}
```
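
To make the mapping from `createIndex()` options to SQL++ more concrete, here is a minimal sketch that rebuilds statements of the shape shown above from a plain options object. It is illustrative only: `buildCreateIndexStatement` and `IndexSketchOptions` are hypothetical names, not part of `@langchain/community`, and the library's actual statement builder may differ (for example in how it quotes nested field names).

```typescript
// Illustrative only: these names are hypothetical and not part of @langchain/community.
interface IndexSketchOptions {
  kind: "BHIVE" | "COMPOSITE";
  indexName: string;
  bucket: string;
  scope: string;
  collection: string;
  embeddingKey: string;
  fields: string[]; // scalar fields: INCLUDE list (BHIVE) or leading index keys (COMPOSITE)
  dimension: number;
  similarity: string;
  description: string;
  whereClause?: string;
  scanNprobes?: number;
  trainlist?: number;
}

// Wrap an identifier in backticks, as in the statements above.
const quote = (name: string): string => "`" + name + "`";

function buildCreateIndexStatement(o: IndexSketchOptions): string {
  const keyspace = [o.bucket, o.scope, o.collection].map(quote).join(".");
  const vectorKey = quote(o.embeddingKey) + " VECTOR";
  const scalarKeys = o.fields.map(quote);

  let head: string;
  if (o.kind === "BHIVE") {
    // BHIVE: the vector is the only index key; other fields go into INCLUDE.
    const include =
      scalarKeys.length > 0 ? ` INCLUDE (${scalarKeys.join(", ")})` : "";
    head = `CREATE VECTOR INDEX ${quote(o.indexName)} ON ${keyspace} (${vectorKey})${include}`;
  } else {
    // COMPOSITE: scalar keys come first, followed by the vector key.
    const keys = [...scalarKeys, vectorKey].join(", ");
    head = `CREATE INDEX ${quote(o.indexName)} ON ${keyspace} (${keys})`;
  }

  // String values are single-quoted; numbers are emitted as-is.
  const withParts = [
    `'dimension': ${o.dimension}`,
    `'similarity': '${o.similarity}'`,
    `'description': '${o.description}'`,
  ];
  if (o.scanNprobes !== undefined) withParts.push(`'scan_nprobes': ${o.scanNprobes}`);
  if (o.trainlist !== undefined) withParts.push(`'trainlist': ${o.trainlist}`);

  const where = o.whereClause ? ` WHERE ${o.whereClause}` : "";
  return `${head}${where} USING GSI WITH {${withParts.join(", ")}}`;
}

// Rebuild the composite statement shown above (emitted on a single line).
const compositeSql = buildCreateIndexStatement({
  kind: "COMPOSITE",
  indexName: "my_composite_index",
  bucket: "bucket",
  scope: "scope",
  collection: "collection",
  embeddingKey: "embedding",
  fields: ["text", "metadata.category"],
  dimension: 1536,
  similarity: "dot",
  description: "IVF1024,SQ8",
  whereClause: "created_date > '2023-01-01'",
  scanNprobes: 3,
  trainlist: 10000,
});
console.log(compositeSql);
```

The only structural difference between the two branches is where the scalar fields end up: in an `INCLUDE` clause for BHIVE, or as leading index keys for COMPOSITE.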

### Key Differences

| Aspect | BHIVE Index | COMPOSITE Index |
|--------|-------------|-----------------|
| **SQL++ Syntax** | `CREATE VECTOR INDEX` | `CREATE INDEX` |
| **Vector Field** | `(field VECTOR)` with `INCLUDE` clause | `(field1, field2, vector_field VECTOR)` |
| **Vector Parameters** | Supports all vector parameters | Supports all vector parameters |
| **Optimization** | Specialized for vector operations | General-purpose GSI with vector support |
| **Use Case** | Pure vector similarity search | Mixed vector and scalar queries |
| **Performance** | Optimized for vector distance calculations | Good for hybrid queries |
| **Tuning Parameters** | Supports `indexScanNprobes`, `indexTrainlist` | Supports `indexScanNprobes`, `indexTrainlist` |
| **Limitations** | Only one vector field, uses INCLUDE for other fields | One vector field among multiple index keys |

## Basic Vector Search Example

The following example connects to a cluster, adds two documents to the vector store, and runs a similarity search with and without scores.

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  CouchbaseQueryVectorStore,
  DistanceStrategy,
} from "@langchain/community/vectorstores/couchbase_query";
import { Cluster } from "couchbase";
import { Document } from "@langchain/core/documents";

const connectionString = process.env.COUCHBASE_DB_CONN_STR ?? "couchbase://localhost";
const databaseUsername = process.env.COUCHBASE_DB_USERNAME ?? "Administrator";
const databasePassword = process.env.COUCHBASE_DB_PASSWORD ?? "Password";

const couchbaseClient = await Cluster.connect(connectionString, {
  username: databaseUsername,
  password: databasePassword,
  configProfile: "wanDevelopment",
});

// OpenAI API Key is required to use OpenAIEmbeddings
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster: couchbaseClient,
  bucketName: "testing",
  scopeName: "_default",
  collectionName: "_default",
  textKey: "text",
  embeddingKey: "embedding",
  distanceStrategy: DistanceStrategy.COSINE,
});

// Add documents
const documents = [
  new Document({
    pageContent: "Couchbase is a NoSQL database",
    metadata: { category: "database", type: "document" },
  }),
  new Document({
    pageContent: "Vector search enables semantic similarity",
    metadata: { category: "ai", type: "document" },
  }),
];

await vectorStore.addDocuments(documents);

// Perform similarity search
const query = "What is a NoSQL database?";
const results = await vectorStore.similaritySearch(query, 4);
console.log("Search results:", results[0]);

// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 4);
console.log("Document:", resultsWithScores[0][0]);
console.log("Score:", resultsWithScores[0][1]);
```

## Searching Documents

### Basic Similarity Search

```typescript
// Basic similarity search
const results = await vectorStore.similaritySearch(
  "What is a NoSQL database?",
  4
);
```

### Search with Filters

```typescript
// Search with filters
const filteredResults = await vectorStore.similaritySearch(
  "database technology",
  4,
  {
    where: "metadata.category = 'database'",
    fields: ["text", "metadata.category"],
  }
);
```

### Search with Scores

```typescript
// Search with scores
const resultsWithScores = await vectorStore.similaritySearchWithScore(
  "vector search capabilities",
  4
);
```

### Complex Filtering

```typescript
const results = await vectorStore.similaritySearch(
  "search query",
  10,
  {
    where: "metadata.category IN ['tech', 'science'] AND metadata.rating >= 4",
    fields: ["content", "metadata.title", "metadata.rating"],
  }
);
```
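
When filter values come from user input, concatenating them into the `where` string by hand is error-prone. The sketch below shows one possible way to assemble such a clause. The helper names (`literal`, `eq`, `gte`, `inList`, `and`) are hypothetical and not part of `@langchain/community`. It assumes that SQL++ accepts JSON-style double-quoted string literals, so `JSON.stringify` is used for quoting and escaping, and it assumes field names are trusted and only values need escaping.

```typescript
// Hypothetical helpers for illustration; they are not part of @langchain/community.
type Scalar = string | number | boolean;

// JSON.stringify produces a double-quoted, escaped literal for strings
// and a plain literal for numbers and booleans.
const literal = (value: Scalar): string => JSON.stringify(value);

const eq = (field: string, value: Scalar): string =>
  `${field} = ${literal(value)}`;

const gte = (field: string, value: number): string =>
  `${field} >= ${literal(value)}`;

const inList = (field: string, values: Scalar[]): string =>
  `${field} IN [${values.map(literal).join(", ")}]`;

const and = (...clauses: string[]): string => clauses.join(" AND ");

// Builds the same condition as the complex filter above, with double-quoted strings.
const where = and(
  inList("metadata.category", ["tech", "science"]),
  gte("metadata.rating", 4)
);
console.log(where);
```

The resulting string can be passed as the `where` option of `similaritySearch`, in the same position as the hand-written clause above.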

## Configuration Options

### Distance Strategies

- `DistanceStrategy.DOT` - Dot product (default)
- `DistanceStrategy.L2` - L2 (Euclidean) distance
- `DistanceStrategy.EUCLIDEAN` - Euclidean distance (the same metric as `L2`)
- `DistanceStrategy.COSINE` - Cosine distance
- `DistanceStrategy.L2_SQUARED` - Squared L2 distance
- `DistanceStrategy.EUCLIDEAN_SQUARED` - Squared Euclidean distance (the same metric as `L2_SQUARED`)
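
For intuition about what these metrics measure, the following sketch computes the textbook definitions on two small vectors. It is plain TypeScript with hypothetical helper names and does not call Couchbase. The values that `APPROX_VECTOR_DISTANCE` returns may follow different conventions (for example in sign or scaling), so treat this as a reference for the math only.

```typescript
// Textbook definitions for illustration; helper names are hypothetical.
function assertSameDimension(a: number[], b: number[]): void {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
}

// Dot product: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  assertSameDimension(a, b);
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Euclidean length of a vector.
const norm = (a: number[]): number => Math.sqrt(dot(a, a));

// Cosine distance: 1 minus the cosine of the angle between the vectors.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (norm(a) * norm(b));
}

// Squared L2 (squared Euclidean) distance: sum of squared differences.
function squaredL2(a: number[], b: number[]): number {
  assertSameDimension(a, b);
  return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

// L2 (Euclidean) distance: square root of the squared L2 distance.
const l2 = (a: number[], b: number[]): number => Math.sqrt(squaredL2(a, b));

const a = [3, 4];
const b = [3, 0];
console.log({
  dot: dot(a, b),
  cosine: cosineDistance(a, b),
  l2Squared: squaredL2(a, b),
  l2: l2(a, b),
});
```

Squared L2 preserves the ranking produced by L2 while skipping the square root, which is why both variants are commonly offered.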

### Index Types

- `IndexType.BHIVE` - Specialized vector index for optimal vector search performance
- `IndexType.COMPOSITE` - General-purpose index that can include vector and scalar fields

## Advanced Usage

### Custom Vector Fields

```typescript
const vectorStore = await CouchbaseQueryVectorStore.initialize(embeddings, {
  cluster,
  bucketName: "my-bucket",
  scopeName: "my-scope",
  collectionName: "my-collection",
  textKey: "content",
  embeddingKey: "vector_embedding",
  distanceStrategy: DistanceStrategy.L2,
});
```

### Creating from Texts

```typescript
const texts = [
  "Couchbase is a NoSQL database",
  "Vector search enables semantic similarity",
];

const metadatas = [
  { category: "database" },
  { category: "ai" },
];

const vectorStore = await CouchbaseQueryVectorStore.fromTexts(
  texts,
  metadatas,
  embeddings,
  {
    cluster,
    bucketName: "my-bucket",
    scopeName: "my-scope",
    collectionName: "my-collection",
  }
);
```

### Deleting Documents

```typescript
const documentIds = ["doc1", "doc2", "doc3"];
await vectorStore.delete({ ids: documentIds });
```

## Performance Considerations

1. **Create Indexes**: Use `createIndex()` to create appropriate vector indexes for better performance
2. **Choose Index Type**:
   - Use **BHIVE indexes** for pure vector search workloads where you primarily perform similarity searches
   - Use **COMPOSITE indexes** for mixed queries that combine vector similarity with scalar field filtering
3. **Tune Parameters**: Adjust `indexScanNprobes` and `indexTrainlist` based on your data size and performance requirements
4. **Filter Early**: Use WHERE clauses to reduce the search space before vector calculations
5. **Index Strategy**:
   - **BHIVE**: Better for high-performance vector similarity search with minimal scalar filtering
   - **COMPOSITE**: Better when you frequently filter by both vector similarity and scalar fields in the same query

## Error Handling

```typescript
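try {
  await vectorStore.createIndex({
    indexType: IndexType.BHIVE,
    indexDescription: "IVF,SQ8",
  });
} catch (error) {
  // In strict TypeScript the caught value is `unknown`, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  console.error("Index creation failed:", message);
}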
```

### Common Errors

#### Insufficient Training Data
If index creation fails with an error about insufficient training data, you may need to:
- Increase the `indexTrainlist` parameter (recommended: about 50 vectors per centroid)
- Ensure you have enough documents with vector embeddings in your collection
- For collections with fewer than 1 million vectors, use `number_of_vectors / 1000` for centroids
- For larger collections, use `sqrt(number_of_vectors)` for centroids
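
The sizing rules above can be turned into a small calculation. The sketch below is a hypothetical helper (`suggestIvfSizing` is not part of `@langchain/community`) that applies the two centroid rules and the ~50 vectors per centroid guideline. The rounding, the cap at the collection size, and the `SQ8` quantization in the description string are assumptions borrowed from the examples on this page.

```typescript
// Hypothetical sizing helper; it only applies the rules of thumb listed above.
interface IvfSizing {
  centroids: number;
  trainlist: number;
  indexDescription: string;
}

function suggestIvfSizing(vectorCount: number): IvfSizing {
  if (!Number.isInteger(vectorCount) || vectorCount <= 0) {
    throw new Error("vectorCount must be a positive integer");
  }

  // Fewer than 1 million vectors: n / 1000; otherwise: sqrt(n).
  const rawCentroids =
    vectorCount < 1000000 ? vectorCount / 1000 : Math.sqrt(vectorCount);
  const centroids = Math.max(1, Math.round(rawCentroids)); // rounding is an assumption

  // About 50 training vectors per centroid, never more than the collection holds.
  const trainlist = Math.min(vectorCount, centroids * 50);

  return {
    centroids,
    trainlist,
    indexDescription: `IVF${centroids},SQ8`, // SQ8 is assumed, as in the examples above
  };
}

const small = suggestIvfSizing(100000);
const large = suggestIvfSizing(4000000);
console.log(small, large);
```

The returned `indexDescription` and `trainlist` values correspond to the `indexDescription` and `indexTrainlist` options of `createIndex()`.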

## Comparison with CouchbaseVectorStore

| Feature | CouchbaseQueryVectorStore | CouchbaseVectorStore |
|---------|---------------------------|----------------------|
| Service | Query (SQL++) | Search (FTS) |
| Index Required | Optional (for performance) | Required |
| Query Language | SQL++ WHERE clauses | Search query syntax |
| Vector Functions | APPROX_VECTOR_DISTANCE | VectorQuery API |
| Setup Complexity | Lower | Higher |
| Performance | Good with indexes | Optimized for search |

<br />
<br />

# Frequently Asked Questions

## Question: Do I need to create an index before using CouchbaseQueryVectorStore?

No, unlike the Search-based CouchbaseVectorStore, the Query-based implementation can work without pre-created indexes. However, creating appropriate vector indexes (BHIVE or COMPOSITE) will significantly improve query performance.

## Question: When should I use BHIVE vs COMPOSITE indexes?

- Use **BHIVE indexes** when you primarily perform vector similarity searches with minimal filtering on other fields
- Use **COMPOSITE indexes** when you frequently combine vector similarity with filtering on scalar fields in the same query

## Question: Can I use both CouchbaseVectorStore and CouchbaseQueryVectorStore on the same data?

Yes, both can work on the same document structure. However, they use different services (Search vs Query) and have different indexing requirements.

## Related

- Vector store [conceptual guide](/docs/concepts/#vectorstores)
- Vector store [how-to guides](/docs/how_to/#vectorstores)