couchbaselabs/vector-search-cookbook
Semantic Search with Couchbase Vector Store and LLM Integration

This repository demonstrates how to build a powerful semantic search engine using Couchbase as the backend database, combined with various AI-powered embedding and language model providers such as OpenAI, Azure OpenAI, Anthropic (Claude), Cohere, Hugging Face, Jina AI, Mistral AI, and Voyage AI.

Each example provides two distinct approaches:

  • FTS (Full Text Search): Uses Couchbase's vector search capabilities with pre-created search indices
  • GSI (Global Secondary Index): Leverages Couchbase's native SQL++ queries with vector similarity functions

Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it essential for applications that require intelligent information retrieval.
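To make the GSI approach concrete, here is a minimal sketch of building a SQL++ vector similarity query in Python. The `APPROX_VECTOR_DISTANCE` function name, the keyspace, and the `embedding`/`text` field names are assumptions for illustration; the exact function and syntax depend on your Couchbase Server version, so check its documentation before use.

```python
# Sketch: build a parameterized SQL++ statement for GSI-based vector search.
# APPROX_VECTOR_DISTANCE and the schema here are illustrative assumptions.
def build_vector_query(keyspace: str, top_k: int = 5) -> str:
    """Return a SQL++ statement that ranks documents by vector distance."""
    return (
        f"SELECT t.text, APPROX_VECTOR_DISTANCE(t.embedding, $qvec) AS score "
        f"FROM {keyspace} AS t "
        f"ORDER BY score ASC "
        f"LIMIT {top_k}"
    )

query = build_vector_query("`vector-search-testing`.shared.azure", top_k=3)
print(query)
```

With the Couchbase Python SDK, the query embedding would then be bound as the named parameter, e.g. `cluster.query(query, qvec=query_embedding)`.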

Features

  • Multiple Embedding Models: Support for embeddings from OpenAI, Azure OpenAI, Anthropic (Claude), Cohere, Hugging Face, Jina AI, Mistral AI, and Voyage AI.
  • Couchbase Vector Store: Utilizes Couchbase's vector storage capabilities for efficient similarity search.
  • Retrieval-Augmented Generation (RAG): Integrates with advanced language models like GPT-4 for generating contextually relevant responses.
  • Scalable and Flexible: Easy to switch between different embedding models and adjust the index structure accordingly.
  • Caching Mechanism: Implements CouchbaseCache for improved performance on repeated queries.

Prerequisites

  • Python 3.8+

  • Couchbase Cluster (Self Managed or Capella) version 7.6+ with Search Service

  • API keys for the respective AI providers (e.g., OpenAI, Azure OpenAI, Anthropic, Cohere, etc.)

Setup

1. Clone the repository:

git clone https://github.com/couchbaselabs/vector-search-cookbook.git
cd vector-search-cookbook

2. Choose Your Approach:

For FTS (Full Text Search) Examples:

Use the provided {model}_index.json index definition file in each model's fts/ directory to create a new vector search index in your Couchbase cluster.
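Besides the Couchbase UI, the index definition can be created programmatically through the Search REST API (port 8094 on self-managed clusters). A hedged sketch, assuming a local cluster and the Azure OpenAI definition file; the file path and host are placeholders:

```python
# Sketch: target URL for creating an FTS index via the Search REST API.
# On self-managed clusters the Search service listens on port 8094, and a
# PUT to /api/index/{name} with the JSON definition creates the index.
def index_url(host: str, index_name: str) -> str:
    return f"http://{host}:8094/api/index/{index_name}"

url = index_url("localhost", "vector_search_azure")
print(url)
# The actual request (credentials and file path are placeholders):
#   requests.put(url, auth=(user, password),
#                headers={"Content-Type": "application/json"},
#                data=open("azure/fts/azure_index.json").read())
```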

For GSI (Global Secondary Index) Examples:

No additional setup is required; the GSI index is created within each model's example.

3. Run the notebook file

You can run the notebook on Google Colab, or locally after setting up a Python environment on your system.
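For a local run, a typical environment setup looks like the following. The package list is a baseline assumption; each notebook lists its own provider-specific dependencies.

```shell
# Create an isolated environment and install the common baseline packages.
python3 -m venv .venv
source .venv/bin/activate
pip install --quiet couchbase jupyter
# Then launch Jupyter and open the notebook for your chosen model:
#   jupyter notebook
```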

Components

1. Multiple Embedding Models

The system supports embeddings from various AI providers:

  • OpenAI
  • Azure OpenAI
  • Anthropic (Claude)
  • Cohere
  • Hugging Face
  • Jina AI
  • Mistral AI
  • Voyage AI

2. Couchbase Vector Store

Couchbase is used to store document embeddings and metadata. The index structure allows for efficient retrieval across different embedding types.
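The documents the examples persist pair the raw text with its embedding plus a type field used for index routing. A minimal sketch of that shape, with dummy values (the field names follow the Azure OpenAI index definition later in this README; the `type` value is illustrative):

```python
# Sketch of the stored document shape: "text" and "embedding" match the
# FTS index mapping, and "type" is the field named by the index's
# doc_config. All values here are dummies.
doc = {
    "type": "azure",            # illustrative; routing depends on doc_config mode
    "text": "Couchbase is a distributed NoSQL database.",
    "embedding": [0.0] * 1536,  # dimensionality matches the index's "dims"
}

# With the Couchbase SDK this would be written to the target collection:
#   collection.upsert("doc::1", doc)
```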

3. Retrieval-Augmented Generation (RAG)

The RAG pipeline integrates with language models like GPT-4 to generate contextually relevant answers based on retrieved documents.
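The core of the RAG step is stuffing the retrieved passages into the prompt sent to the language model. A minimal sketch of that assembly; the template and passages are illustrative, not the notebooks' exact code:

```python
# Sketch: combine retrieved passages and the user question into one prompt.
def build_rag_prompt(question: str, passages: list) -> str:
    context = "\n\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is Couchbase?",
    ["Couchbase is a distributed NoSQL database.",
     "It supports SQL++ queries and full-text search."],
)
print(prompt)
```

The resulting string is what gets passed to the chosen language model (e.g. GPT-4) for answer generation.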

4. Semantic Search

Each notebook implements a semantic search function that performs a similarity search using the appropriate embedding type and retrieves the top-k most similar documents.
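Conceptually, the search ranks stored vectors by similarity to the query vector and keeps the top-k. In the notebooks this ranking happens inside Couchbase; the pure-Python sketch below only illustrates the idea:

```python
import math

# Illustration of top-k similarity search: score every stored vector
# against the query vector by cosine similarity, then keep the k best.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, docs, k=2):
    # docs: list of (doc_id, vector) pairs
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

hits = top_k(
    [1.0, 0.0],
    [("a", [1.0, 0.1]), ("b", [0.0, 1.0]), ("c", [0.9, 0.2])],
    k=2,
)
# "a" and "c" point in nearly the same direction as the query; "b" is orthogonal.
```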

5. Caching

The system implements caching functionality using CouchbaseCache to improve performance for repeated queries.
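The idea behind the cache is to key on the query string so a repeated question skips the embedding and LLM round trip. A sketch with an in-memory dict standing in for the Couchbase-backed `CouchbaseCache`:

```python
# Sketch of query-level caching: repeated queries return the stored
# result instead of recomputing it. A dict stands in for Couchbase here.
class QueryCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, query, compute):
        if query in self._store:
            self.hits += 1          # repeated query: serve from cache
        else:
            self._store[query] = compute(query)
        return self._store[query]

cache = QueryCache()
cache.get_or_compute("what is couchbase?", lambda q: f"answer to {q!r}")
cache.get_or_compute("what is couchbase?", lambda q: f"answer to {q!r}")
```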

Couchbase Vector Search Index (FTS Approach Only)

For FTS examples, you'll need to create a vector search index using the provided JSON configuration files. For more information on creating a vector search index, follow the instructions in the Couchbase vector search documentation. The following is an example index definition for the Azure OpenAI model.

{
    "type": "fulltext-index",
    "name": "vector_search_azure",
    "uuid": "",
    "sourceType": "gocbcore",
    "sourceName": "vector-search-testing",
    "planParams": {
      "maxPartitionsPerPIndex": 64,
      "indexPartitions": 16
    },
    "params": {
      "doc_config": {
        "docid_prefix_delim": "",
        "docid_regexp": "",
        "mode": "scope.collection.type_field",
        "type_field": "type"
      },
      "mapping": {
        "analysis": {},
        "default_analyzer": "standard",
        "default_datetime_parser": "dateTimeOptional",
        "default_field": "_all",
        "default_mapping": {
          "dynamic": true,
          "enabled": false
        },
        "default_type": "_default",
        "docvalues_dynamic": false,
        "index_dynamic": true,
        "store_dynamic": false,
        "type_field": "_type",
        "types": {
          "shared.azure": {
            "dynamic": true,
            "enabled": true,
            "properties": {
              "embedding": {
                "dynamic": false,
                "enabled": true,
                "fields": [
                  {
                    "dims": 1536,
                    "index": true,
                    "name": "embedding",
                    "similarity": "dot_product",
                    "type": "vector",
                    "vector_index_optimized_for": "recall"
                  }
                ]
              },
              "text": {
                "dynamic": false,
                "enabled": true,
                "fields": [
                  {
                    "index": true,
                    "name": "text",
                    "store": true,
                    "type": "text"
                  }
                ]
              }
            }
          }
        }
      },
      "store": {
        "indexType": "scorch",
        "segmentVersion": 16
      }
    },
    "sourceParams": {}
  }
