Vector similarity searching is a technique used to find and retrieve data points that are similar to a given query, based on their vector representations. The similarity between two vectors is usually measured using a distance metric, such as cosine similarity or Euclidean distance. These metrics quantify the similarity between two vectors by calculating the angle between them or the distance between their coordinates in the vector space.
Vector similarity searching has numerous applications, such as recommendation systems, search engines, image and video retrieval, and natural language processing tasks. It allows for efficient and accurate retrieval of similar items, enabling users to find relevant information or discover related items quickly and effectively.
The VECTOR_DISTANCE function is a new feature of the Azure SQL Database that can calculate the distance between two vectors enabling similarity searching right in the database.
The syntax is as follows:
VECTOR_DISTANCE ( distance_metric, vector1, vector2 )
You will be using this function in some upcoming samples as well as in the RAG chat application; both utilizing the vectors you just created for the Products table.
-
The first query will pose the question "I am looking for a red bike and I dont want to spend a lot". The key words that should help with our similarity search are red, bike, and dont want to spend a lot. Run the following SQL in a blank query editor in VS Code:
declare @search_text nvarchar(max) = 'I am looking for a red bike and I dont want to spend a lot' declare @search_vector vector(1536) exec dbo.create_embeddings @search_text, @search_vector output; SELECT TOP(4) p.ProductID, p.Name , p.chunk, vector_distance('cosine', @search_vector, p.embeddings) AS distance FROM [SalesLT].[Product] p ORDER BY distance
And you can see from the results, the search found exactly that, an affordable red bike. The distance column shows us how similar it found the results to be using VECTOR_DISTANCE, with a lower score being a better match.
Name chunk distance Road-650 Red, 48 Road-650 Red, 48 Red Road Bikes Road-650 Value-priced bike with many features of our top-of-the-line models. Has the same light, stiff frame, and the quick acceleration we're famous for. 0.16352240013483477 Road-650 Red, 60 Road-650 Red, 60 Red Road Bikes Road-650 Value-priced bike with many features of our top-of-the-line models. Has the same light, stiff frame, and the quick acceleration we're famous for. 0.16361482158949225 Road-650 Red, 58 Road-650 Red, 58 Red Road Bikes Road-650 Value-priced bike with many features of our top-of-the-line models. Has the same light, stiff frame, and the quick acceleration we're famous for. 0.16432339626539993 Road-650 Red, 44 Road-650 Red, 44 Red Road Bikes Road-650 Value-priced bike with many features of our top-of-the-line models. Has the same light, stiff frame, and the quick acceleration we're famous for. 0.1652894865541471 -
The next search will be looking for a safe lightweight helmet. Run the following SQL in a blank query editor in VS Code:
declare @search_text nvarchar(max) = 'I am looking for a safe helmet that does not weigh much' declare @search_vector vector(1536) exec dbo.create_embeddings @search_text, @search_vector output; SELECT TOP(4) p.ProductID, p.Name , p.chunk, vector_distance('cosine', @search_vector, p.embeddings) AS distance FROM [SalesLT].[Product] p ORDER BY distance
With the results returning lightweight helmets. There is one result that is not a helmet but a vest but as you can see, the distance score is higher for this result than the 3 helmet scores.
Name chunk distance Sport-100 Helmet, Black Sport-100 Helmet, Black Black Helmets Sport-100 Universal fit, well-vented, lightweight , snap-on visor. 0.1641856735683479 Sport-100 Helmet, Red Sport-100 Helmet, Red Red Helmets Sport-100 Universal fit, well-vented, lightweight , snap-on visor. 0.16508593401632166 Sport-100 Helmet, Blue Sport-100 Helmet, Blue Blue Helmets Sport-100 Universal fit, well-vented, lightweight , snap-on visor. 0.16592580751312624 Classic Vest, S Classic Vest, S Blue Vests Classic Vest Light-weight, wind-resistant, packs to fit into a pocket. 0.19888204151269384 -
In the previous 2 examples, we were clear on what we were looking for; cheap red bike, light helmet. In this next example, we are going to have it flex its AI muscles a bit by saying we want a bike seat that needs to be good on trails. This will require the search to look for adjacent values that have something in common with trails. Run the following SQL in a blank query editor in VS Code to see the results.
declare @search_text nvarchar(max) = 'Do you sell any padded seats that are good on trails?' declare @search_vector vector(1536) exec dbo.create_embeddings @search_text, @search_vector output; SELECT TOP(4) p.ProductID, p.Name , p.chunk, vector_distance('cosine', @search_vector, p.embeddings) AS distance FROM [SalesLT].[Product] p ORDER BY distance
These results are very interesting for it found products based on word meanings such as absorb shocks and bumps and foam-padded. It was able to make connections to riding conditions on trails and find products that would fit that need.
Name chunk distance ML Mountain Seat/Saddle ML Mountain Seat/Saddle No Color Saddles ML Mountain Seat/Saddle 2 Designed to absorb shock. 0.17265341238606102 LL Road Seat/Saddle LL Road Seat/Saddle No Color Saddles LL Road Seat/Saddle 1 Lightweight foam-padded saddle. 0.17667274723850412 ML Road Seat/Saddle ML Road Seat/Saddle No Color Saddles ML Road Seat/Saddle 2 Rubber bumpers absorb bumps. 0.18802953111711573 HL Mountain Seat/Saddle HL Mountain Seat/Saddle No Color Saddles HL Mountain Seat/Saddle 2 Anatomic design for a full-day of riding in comfort. Durable leather. 0.18931317298732764