Commit 818616f

Merge pull request #1959 from redis/DOC-5534-js-vec-set-examples
DOC-5534 JS vector sets examples
2 parents df12352 + 1cac420 commit 818616f


3 files changed: +346 -1 lines changed


content/develop/clients/nodejs/transpipe.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ categories:
 description: Learn how to use Redis pipelines and transactions
 linkTitle: Pipelines/transactions
 title: Pipelines and transactions
-weight: 4
+weight: 5
 ---

 Redis lets you send a sequence of commands to the server together in a batch.
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Index and query embeddings with Redis vector sets
linkTitle: Vector set embeddings
title: Vector set embeddings
weight: 4
bannerText: Vector set is a new data type that is currently in preview and may be subject to change.
bannerChildren: true
---

A Redis [vector set]({{< relref "/develop/data-types/vector-sets" >}}) lets
you store a set of unique keys, each with its own associated vector.
You can then retrieve keys from the set according to the similarity between
their stored vectors and a query vector that you specify.

You can use vector sets to store any type of numeric vector but they are
particularly optimized to work with text embedding vectors (see
[Redis for AI]({{< relref "/develop/ai" >}}) to learn more about text
embeddings). The example below shows how to use the
[`@xenova/transformers`](https://www.npmjs.com/package/@xenova/transformers)
library to generate vector embeddings and then
store and retrieve them using a vector set with `node-redis`.

## Initialize

Start by [installing]({{< relref "/develop/clients/nodejs#install" >}}) `node-redis`
if you haven't already done so. Also, install `@xenova/transformers`:

```bash
npm install @xenova/transformers
```

In your JavaScript source file, import the required classes:

{{< clients-example set="home_vecsets" step="import" lang_filter="Node.js" >}}
{{< /clients-example >}}

The first of these imports is the
`@xenova/transformers` module, which generates an embedding from a section of text.
This example uses `transformers.pipeline` with the
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
model for the embeddings. This model generates vectors with 384 dimensions, regardless
of the length of the input text, but note that the input is truncated to 256
tokens (see
[Word piece tokenization](https://huggingface.co/learn/nlp-course/en/chapter6/6)
at the [Hugging Face](https://huggingface.co/) docs to learn more about the way tokens
are related to the original text).

The output from `transformers.pipeline` is a function (called `pipe` in the examples)
that you can call to generate embeddings. The `pipeOptions` object is a parameter for
`pipe` that specifies how to generate sentence embeddings from token embeddings (see the
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
documentation for details).

{{< clients-example set="home_vecsets" step="model" lang_filter="Node.js" >}}
{{< /clients-example >}}
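
As a rough illustration of what the `pooling: 'mean'` and `normalize: true` options
ask for, the sketch below averages a few token vectors and then scales the result to
unit length. It uses plain arrays and two hypothetical helpers (`meanPool` and
`l2Normalize`) that are not part of `@xenova/transformers`; the library's own
implementation works on tensors and is likely to differ in detail.

```javascript
// Hypothetical helpers, for illustration only (not library functions).

// Average the token vectors element by element to get one sentence vector.
function meanPool(tokenVectors) {
  const dims = tokenVectors[0].length;
  const sums = new Array(dims).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dims; i++) {
      sums[i] += vec[i];
    }
  }
  return sums.map((sum) => sum / tokenVectors.length);
}

// Scale a vector so that its Euclidean length is 1.
function l2Normalize(vec) {
  const norm = Math.hypot(...vec);
  return vec.map((value) => value / norm);
}

// Two made-up 2D "token embeddings" (real ones have 384 dimensions).
const tokenVectors = [[2, 0], [4, 8]];
const sentenceVector = l2Normalize(meanPool(tokenVectors));

console.log(sentenceVector); // approximately [0.6, 0.8]
```

Normalized vectors all have the same length, so comparing them depends only on
their direction.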

## Create the data

The example data is contained in an object with some brief
descriptions of famous people:

{{< clients-example set="home_vecsets" step="data" lang_filter="Node.js" >}}
{{< /clients-example >}}

## Add the data to a vector set

The next step is to connect to Redis and add the data to a new vector set.

The code below iterates through all the key-value pairs in the `peopleData` object
and adds corresponding elements to a vector set called `famousPeople`.

Use the `pipe()` function created above to generate the
embedding and then use `Array.from()` to convert the embedding data
(a typed array of `float32` values) to a standard array that you can pass to the
[`vAdd()`]({{< relref "/commands/vadd" >}}) command to set the embedding.

After the call to `vAdd()`, the code calls `vSetAttr()` to add the `born` and
`died` values from the `peopleData` object as attribute data. You can access this
during a query or by using the [`vGetAttr()`]({{< relref "/commands/vgetattr" >}}) method.

{{< clients-example set="home_vecsets" step="add_data" lang_filter="Node.js" >}}
{{< /clients-example >}}
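
You can try the conversion and attribute steps without a model or a server.
The sketch below uses a made-up three-element `Float32Array` in place of a real
embedding. The helper `toElement` is hypothetical and only shows the shape of the
values that the loop prepares for each person.

```javascript
// Stand-in for `embedding.data`; a real embedding from this model has 384 values.
const fakeEmbeddingData = new Float32Array([0.5, -0.25, 0.125]);

// Hypothetical helper: collects the values prepared for one element.
function toElement(name, details, data) {
  return {
    name,
    // Typed array -> standard array of numbers.
    vector: Array.from(data),
    // Attribute data is stored as a JSON string.
    attributes: JSON.stringify({ born: details.born, died: details.died }),
  };
}

const element = toElement(
  'Marie Curie',
  { born: 1867, died: 1934 },
  fakeEmbeddingData
);

console.log(Array.isArray(element.vector)); // true
console.log(element.attributes); // {"born":1867,"died":1934}
```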

## Query the vector set

You can now query the data in the set. The basic approach is to use the
`pipe()` function to generate another embedding vector for the query text.
(This is the same method used to add the elements to the set.) Then, pass
the query vector to [`vSim()`]({{< relref "/commands/vsim" >}}) to return elements
of the set, ranked in order of similarity to the query.

Start with a simple query for "actors":

{{< clients-example set="home_vecsets" step="basic_query" lang_filter="Node.js" >}}
{{< /clients-example >}}

This returns the following list of elements (formatted slightly for clarity):

```
'actors': ["Masako Natsume","Chaim Topol","Linus Pauling",
"Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury",
"Marie Curie","Paul Erdos"]
```

The first two people in the list are the two actors, as expected, but none of the
people from Linus Pauling onward was especially well-known for acting (and there certainly
isn't any information about that in the short description text).
As it stands, the search attempts to rank all the elements in the set, based
on the information contained in the embedding model.
You can use the `COUNT` parameter of `vSim()` to limit the list of elements
to just the most relevant few items:

{{< clients-example set="home_vecsets" step="limited_query" lang_filter="Node.js" >}}
{{< /clients-example >}}
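
To build some intuition for similarity ranking and a count limit, the sketch below
ranks a few hand-written 2D vectors against a query vector using cosine similarity.
The vectors, the function names `cosine` and `rankBySimilarity`, and the brute-force
scan are all assumptions made for illustration. A vector set uses its own index and
scoring on the server, so treat this only as a model of the idea.

```javascript
// Cosine similarity: 1 means the same direction, -1 means the opposite direction.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every element, sort by descending score,
// and keep at most `count` names.
function rankBySimilarity(elements, query, count = Infinity) {
  return Object.entries(elements)
    .map(([name, vector]) => ({ name, score: cosine(vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, count)
    .map((entry) => entry.name);
}

// Made-up elements that point in different directions.
const toyElements = {
  north: [0, 1],
  northEast: [1, 1],
  east: [1, 0],
  south: [0, -1],
};

const query = [0.1, 1]; // Points almost due north.

const allRanked = rankBySimilarity(toyElements, query);
const topTwo = rankBySimilarity(toyElements, query, 2);

console.log(allRanked); // [ 'north', 'northEast', 'east', 'south' ]
console.log(topTwo); // [ 'north', 'northEast' ]
```

As with the "actors" query, the unrestricted ranking includes every element, even
those that point away from the query, while the limited version keeps only the
closest matches.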

The reason for using text embeddings rather than simple text search
is that the embeddings represent semantic information. This allows a query
to find elements with a similar meaning even if the text is
different. For example, the word "entertainer" doesn't appear in any of the
descriptions but if you use it as a query, the actors and musicians are ranked
highest in the results list:

{{< clients-example set="home_vecsets" step="entertainer_query" lang_filter="Node.js" >}}
{{< /clients-example >}}

Similarly, if you use "science" as a query, you get the following results:

```
'science': ["Linus Pauling","Marie Curie","Maryam Mirzakhani","Paul Erdos",
"Marie Fredriksson","Masako Natsume","Freddie Mercury","Chaim Topol"]
```

The scientists are ranked highest but they are then followed by the
mathematicians. This seems reasonable given the connection between mathematics
and science.

You can also use
[filter expressions]({{< relref "/develop/data-types/vector-sets/filtered-search" >}})
with `vSim()` to restrict the search further. For example,
repeat the "science" query, but this time limit the results to people
who died before the year 2000:

{{< clients-example set="home_vecsets" step="filtered_query" lang_filter="Node.js" >}}
{{< /clients-example >}}

Note that the boolean filter expression is applied to items in the list
before the vector distance calculation is performed. Items that don't
pass the filter test are removed from the results completely, rather
than just reduced in rank. This can help to improve the performance of the
search because there is no need to calculate the vector distance for
elements that have already been filtered out of the search.
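
You can sketch the effect of the filter on the result set using only the `born` and
`died` values from the example data. The helper `passesFilter` is hypothetical and
hard-codes the meaning of `'.died < 2000'`; it does not parse filter expressions
the way the server does.

```javascript
// Attribute data copied from the example `peopleData` object.
const attributes = {
  'Marie Curie': { born: 1867, died: 1934 },
  'Linus Pauling': { born: 1901, died: 1994 },
  'Freddie Mercury': { born: 1946, died: 1991 },
  'Marie Fredriksson': { born: 1958, died: 2019 },
  'Paul Erdos': { born: 1913, died: 1996 },
  'Maryam Mirzakhani': { born: 1977, died: 2017 },
  'Masako Natsume': { born: 1957, died: 1985 },
  'Chaim Topol': { born: 1935, died: 2023 },
};

// Hypothetical stand-in for the filter expression '.died < 2000'.
const passesFilter = (attrs) => attrs.died < 2000;

// Only these candidates would need a similarity score at all.
const candidates = Object.keys(attributes).filter((name) =>
  passesFilter(attributes[name])
);

console.log(candidates.length); // 5
console.log(candidates.includes('Chaim Topol')); // false
```

The five remaining names are the same people that appear in the filtered "science"
results, although here they are in insertion order rather than ranked by similarity.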

## More information

See the [vector sets]({{< relref "/develop/data-types/vector-sets" >}})
docs for more information and code examples. See the
[Redis for AI]({{< relref "/develop/ai" >}}) section for more details
about text embeddings and other AI techniques you can use with Redis.

You may also be interested in
[vector search]({{< relref "/develop/clients/nodejs/vecsearch" >}}).
This is a feature of the
[Redis query engine]({{< relref "/develop/ai/search-and-query" >}})
that lets you retrieve
[JSON]({{< relref "/develop/data-types/json" >}}) and
[hash]({{< relref "/develop/data-types/hashes" >}}) documents based on
vector data stored in their fields.
Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
// EXAMPLE: home_vecsets
// STEP_START import
import * as transformers from '@xenova/transformers';
import { createClient } from 'redis';
// STEP_END

// STEP_START model
const pipe = await transformers.pipeline(
    'feature-extraction', 'Xenova/all-MiniLM-L6-v2'
);

const pipeOptions = {
    pooling: 'mean',
    normalize: true,
};
// STEP_END

// STEP_START data
const peopleData = {
    "Marie Curie": {
        "born": 1867, "died": 1934,
        "description": `
        Polish-French chemist and physicist. The only person ever to win
        two Nobel prizes for two different sciences.
        `
    },
    "Linus Pauling": {
        "born": 1901, "died": 1994,
        "description": `
        American chemist and peace activist. One of only two people to win two
        Nobel prizes in different fields (chemistry and peace).
        `
    },
    "Freddie Mercury": {
        "born": 1946, "died": 1991,
        "description": `
        British musician, best known as the lead singer of the rock band
        Queen.
        `
    },
    "Marie Fredriksson": {
        "born": 1958, "died": 2019,
        "description": `
        Swedish multi-instrumentalist, mainly known as the lead singer and
        keyboardist of the band Roxette.
        `
    },
    "Paul Erdos": {
        "born": 1913, "died": 1996,
        "description": `
        Hungarian mathematician, known for his eccentric personality almost
        as much as his contributions to many different fields of mathematics.
        `
    },
    "Maryam Mirzakhani": {
        "born": 1977, "died": 2017,
        "description": `
        Iranian mathematician. The first woman ever to win the Fields medal
        for her contributions to mathematics.
        `
    },
    "Masako Natsume": {
        "born": 1957, "died": 1985,
        "description": `
        Japanese actress. She was very famous in Japan but was primarily
        known elsewhere in the world for her portrayal of Tripitaka in the
        TV series Monkey.
        `
    },
    "Chaim Topol": {
        "born": 1935, "died": 2023,
        "description": `
        Israeli actor and singer, usually credited simply as 'Topol'. He was
        best known for his many appearances as Tevye in the musical Fiddler
        on the Roof.
        `
    }
};
// STEP_END

// STEP_START add_data
const client = createClient({ url: 'redis://localhost:6379' });

client.on('error', err => console.log('Redis Client Error', err));
await client.connect();

for (const [name, details] of Object.entries(peopleData)) {
    const embedding = await pipe(details.description, pipeOptions);
    const embeddingArray = Array.from(embedding.data);

    await client.vAdd('famousPeople', embeddingArray, name);
    await client.vSetAttr('famousPeople', name, JSON.stringify({
        born: details.born,
        died: details.died
    }));
}
// STEP_END

// STEP_START basic_query
const queryValue = "actors";

const queryEmbedding = await pipe(queryValue, pipeOptions);
const queryArray = Array.from(queryEmbedding.data);

const actorsResults = await client.vSim('famousPeople', queryArray);

console.log(`'actors': ${JSON.stringify(actorsResults)}`);
// >>> 'actors': ["Masako Natsume","Chaim Topol","Linus Pauling",
// "Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury",
// "Marie Curie","Paul Erdos"]
// STEP_END

// STEP_START limited_query
const queryValue2 = "actors";

const queryEmbedding2 = await pipe(queryValue2, pipeOptions);
const queryArray2 = Array.from(queryEmbedding2.data);

const twoActorsResults = await client.vSim('famousPeople', queryArray2, {
    COUNT: 2
});

console.log(`'actors (2)': ${JSON.stringify(twoActorsResults)}`);
// >>> 'actors (2)': ["Masako Natsume","Chaim Topol"]
// STEP_END

// STEP_START entertainer_query
const queryValue3 = "entertainer";

const queryEmbedding3 = await pipe(queryValue3, pipeOptions);
const queryArray3 = Array.from(queryEmbedding3.data);

const entertainerResults = await client.vSim('famousPeople', queryArray3);

console.log(`'entertainer': ${JSON.stringify(entertainerResults)}`);
// >>> 'entertainer': [...]
// (The actors and musicians are ranked highest in this list.)
// STEP_END

const queryValue4 = "science";

const queryEmbedding4 = await pipe(queryValue4, pipeOptions);
const queryArray4 = Array.from(queryEmbedding4.data);

const scienceResults = await client.vSim('famousPeople', queryArray4);

console.log(`'science': ${JSON.stringify(scienceResults)}`);
// >>> 'science': ["Linus Pauling","Marie Curie","Maryam Mirzakhani",
// "Paul Erdos","Marie Fredriksson","Masako Natsume","Freddie Mercury",
// "Chaim Topol"]

// STEP_START filtered_query
154+
const queryValue5 = "science";
155+
156+
const queryEmbedding5 = await pipe(queryValue5, pipeOptions);
157+
const queryArray5 = Array.from(queryEmbedding5.data);
158+
159+
const science2000Results = await client.vSim('famousPeople', queryArray5, {
160+
FILTER: '.died < 2000'
161+
});
162+
163+
console.log(`'science2000': ${JSON.stringify(science2000Results)}`);
164+
// >>> 'science2000': ["Linus Pauling","Marie Curie","Paul Erdos",
165+
// "Masako Natsume","Freddie Mercury"]
166+
// STEP_END
167+
168+
await client.quit();
