Skip to content

Commit

Permalink
Merge pull request #957 from seesharprun/cosmos-ai-freshness-3
Browse files Browse the repository at this point in the history
Cosmos DB | Refresh AI articles (#3)
  • Loading branch information
AnnaMHuff authored Dec 3, 2024
2 parents e68e196 + d7c4d0a commit e253c01
Show file tree
Hide file tree
Showing 8 changed files with 275 additions and 266 deletions.
77 changes: 40 additions & 37 deletions articles/cosmos-db/nosql/how-to-dotnet-vector-index-query.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,27 +7,29 @@ ms.author: jacodel
ms.service: azure-cosmos-db
ms.subservice: nosql
ms.topic: how-to
ms.date: 08/01/2023
ms.date: 12/03/2024
ms.custom: query-reference, devx-track-dotnet, build-2024, ignite-2024
ms.collection:
- ce-skilling-ai-copilot
appliesto:
- ✅ NoSQL
---

# Index and query vectors in Azure Cosmos DB for NoSQL in .NET

[!INCLUDE[NoSQL](../includes/appliesto-nosql.md)]
Before you use vector indexing and search, you must first enable the feature. This article covers the following steps:

1. Enabling the Vector Search in Azure Cosmos DB for NoSQL feature.
2. Setting up the Azure Cosmos DB container for vector search
3. Authoring vector embedding policy
4. Adding vector indexes to the container indexing policy
5. Creating a container with vector indexes and vector embedding policy
6. Performing a vector search on the stored data
1. Setting up the Azure Cosmos DB container for vector search
1. Authoring vector embedding policy
1. Adding vector indexes to the container indexing policy
1. Creating a container with vector indexes and vector embedding policy
1. Performing a vector search on the stored data

This guide walks through the process of creating vector data, indexing the data, and then querying the data in a container.

## Prerequisites

- An existing Azure Cosmos DB for NoSQL account.
- If you don't have an Azure subscription, [Try Azure Cosmos DB for NoSQL free](https://cosmos.azure.com/try/).
- If you have an existing Azure subscription, [create a new Azure Cosmos DB for NoSQL account](how-to-create-account.md).
Expand All @@ -38,10 +40,10 @@ This guide walks through the process of creating vector data, indexing the data,
Vector search for Azure Cosmos DB for NoSQL requires enabling the feature by completing the following steps:

1. Navigate to your Azure Cosmos DB for NoSQL resource page.
2. Select the "Features" pane under the "Settings" menu item.
3. Select for "Vector Search in Azure Cosmos DB for NoSQL."
4. Read the description of the feature to confirm you want to enable it.
5. Select "Enable" to turn on vector search in Azure Cosmos DB for NoSQL.
1. Select the "Features" pane under the "Settings" menu item.
1. Select for "Vector Search in Azure Cosmos DB for NoSQL."
1. Read the description of the feature to confirm you want to enable it.
1. Select "Enable" to turn on vector search in Azure Cosmos DB for NoSQL.

> [!TIP]
> Alternatively, use the Azure CLI to update the capabilities of your account to support NoSQL vector search.
Expand All @@ -53,20 +55,18 @@ Vector search for Azure Cosmos DB for NoSQL requires enabling the feature by com
> --capabilities EnableNoSQLVectorSearch
> ```
> [!NOTE]
> The registration request will be autoapproved; however, it may take 15 minutes to take effect.
## Understanding the steps involved in vector search
## Understanding the steps involved in vector search
Let’s take an example of creating a database for an internet-based bookstore and you're storing Title, Author, ISBN, and Description for each book. We also define two properties to contain vector embeddings. The first is the “contentVector” property, which contains [text embeddings](/azure/ai-services/openai/concepts/models#embeddings ) generated from the text content of the book (for example, concatenating the “title” “author” “isbn” and “description” properties before creating the embedding). The second is “coverImageVector”, which is generated from [images of the book’s cover](/azure/ai-services/computer-vision/concept-image-retrieval).
Let’s take an example of creating a database for an internet-based bookstore and you're storing Title, Author, ISBN, and Description for each book. We also define two properties to contain vector embeddings. The first is the “contentVector” property, which contains [text embeddings](/azure/ai-services/openai/concepts/models#embeddings ) generated from the text content of the book (for example, concatenating the “title” “author” “isbn” and “description” properties before creating the embedding). The second is “coverImageVector”, which is generated from [images of the book’s cover](/azure/ai-services/computer-vision/concept-image-retrieval).
1. Create and store vector embeddings for the fields on which you want to perform vector search.
2. Specify the vector embedding paths in the vector embedding policy.
3. Include any desired vector indexes in the indexing policy for the container.
1. Create and store vector embeddings for the fields on which you want to perform vector search.
1. Specify the vector embedding paths in the vector embedding policy.
1. Include any desired vector indexes in the indexing policy for the container.
For subsequent sections of this article, we consider the below structure for the items stored in our container:
For subsequent sections of this article, we consider the below structure for the items stored in our container:
```json
{
Expand All @@ -79,18 +79,19 @@ For subsequent sections of this article, we consider the below structure for the
}
```
## Creating a vector embedding policy for your container.
## Creating a vector embedding policy for your container

Next, you need to define a container vector policy. This policy provides information that is used to inform the Azure Cosmos DB query engine how to handle vector properties in the VectorDistance system functions. This policy also informs the vector indexing policy of necessary information, should you choose to specify one.
The following information is included in the contained vector policy:

* “path”: The property path that contains vectors 
* “datatype”: The type of the elements of the vector (default Float32) 
* “dimensions”: The length of each vector in the path (default 1536) 
* “distanceFunction”: The metric used to compute distance/similarity (default Cosine) 
- “path”: The property path that contains vectors 
- “datatype”: The type of the elements of the vector (default Float32) 
- “dimensions”: The length of each vector in the path (default 1536) 
- “distanceFunction”: The metric used to compute distance/similarity (default Cosine) 

For our example with book details, the vector policy can look like the example JSON below:
For our example with book details, the vector policy can look like the example JSON below:

```csharp
```csharp
Database db = await client.CreateDatabaseIfNotExistsAsync("vector-benchmarking");
List<Embedding> embeddings = new List<Embedding>()
{
Expand All @@ -109,13 +110,13 @@ For our example with book details, the vector policy can look like the example J
Dimensions = 10,
}
};
```
```

## Creating a vector index in the indexing policy

## Creating a vector index in the indexing policy
Once the vector embedding paths are decided, vector indexes need to be added to the indexing policy. Currently, the vector search feature for Azure Cosmos DB for NoSQL is supported only on new containers so you need to apply the vector policy during the time of container creation and it can’t be modified later. For this example, the indexing policy would look something like this:
Once the vector embedding paths are decided, vector indexes need to be added to the indexing policy. Currently, the vector search feature for Azure Cosmos DB for NoSQL is supported only on new containers so you need to apply the vector policy during the time of container creation and it can’t be modified later. For this example, the indexing policy would look something like this:

```csharp
```csharp
Collection<Embedding> collection = new Collection<Embedding>(embeddings);
ContainerProperties properties = new ContainerProperties(id: "vector-container", partitionKeyPath: "/id")
{
Expand All @@ -134,13 +135,14 @@ Once the vector embedding paths are decided, vector indexes need to be added to
};
properties.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
properties.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/vector/*" });
```
>[!IMPORTANT]
```

> [!IMPORTANT]
> The vector path added to the "excludedPaths" section of the indexing policy to ensure optimized performance for insertion. Not adding the vector path to "excludedPaths" will result in higher RU charge and latency for vector insertions.
## Running vector similarity search query
## Running vector similarity search query

Once you create a container with the desired vector policy, and insert vector data into the container, you can conduct a vector search using the [Vector Distance](query/vectordistance.md) system function in a query. Suppose you want to search for books about food recipes by looking at the description, you first need to get the embeddings for your query text. In this case, you might want to generate embeddings for the query text – “food recipe”. Once you have the embedding for your search query, you can use it in the VectorDistance function in the vector search query and get all the items that are similar to your query as shown here:
Once you create a container with the desired vector policy, and insert vector data into the container, you can conduct a vector search using the [Vector Distance](query/vectordistance.md) system function in a query. Suppose you want to search for books about food recipes by looking at the description, you first need to get the embeddings for your query text. In this case, you might want to generate embeddings for the query text – “food recipe”. Once you have the embedding for your search query, you can use it in the VectorDistance function in the vector search query and get all the items that are similar to your query as shown here:

```sql
SELECT TOP 10 c.title, VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10]) AS SimilarityScore  
Expand All @@ -150,7 +152,7 @@ ORDER BY VectorDistance(c.contentVector, [1,2,3,4,5,6,7,8,9,10])  

This query retrieves the book titles along with similarity scores with respect to your query. Here's an example in .NET:

```csharp
```csharp
float[] embedding = {1f,2f,3f,4f,5f,6f,7f,8f,9f,10f};
var queryDef = new QueryDefinition(
query: $"SELECT c.title, VectorDistance(c.contentVector,@embedding) AS SimilarityScore FROM c ORDER BY VectorDistance(c.contentVector,@embedding)"
Expand All @@ -166,9 +168,10 @@ This query retrieves the book titles along with similarity scores with respect t
Console.WriteLine($"Found item:\t{item}");
}
}
```
```

## Related content

## Next steps
- [VectorDistance system function](query/vectordistance.md)
- [Vector indexing](../index-policy.md)
- [Setup Azure Cosmos DB for NoSQL for vector search](../vector-search.md).
Loading

0 comments on commit e253c01

Please sign in to comment.