feat: add redisearch vectorstore #1307

xinqiu · 2023-02-26T15:14:45Z

Description

Add RediSearch vectorstore for LangChain

How to use

from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings, redisearch_url="redis://localhost:6379")

hwchase17 · 2023-02-26T21:01:00Z

a jupyter notebook example would be very helpful!

tylerhutcherson · 2023-02-27T12:11:19Z

Thanks for opening this 💪🏼

My team at Redis is also taking a quick look. We might have some suggestions/optimizations but could always include those in a later PR!

xinqiu · 2023-02-27T16:56:33Z

I have added a Jupyter notebook example. Please help review it.

xinqiu · 2023-02-28T13:03:09Z

I fixed the lint problem.

Spartee

Love to see this. Thanks for doing this, as it was definitely on @tylerhutcherson's and my list. I have a couple of high-level comments.

I think the name Redis would be better since people might assume RediSearch is a different thing. The first part of the connection could use the INFO MODULES command to ensure that RediSearch is loaded. Another point is that we allow vectors to be indexed in RedisJSON, which would be confusing if people used the RediSearch container expecting both modules without knowing they had to use the Redis-Stack container for this.
Relatedly, I think it would be nice to also have the ability to store as hash or JSON. This I can do in a follow-up PR, though.
I left some performance pieces in there, but I'm happy to also do these as a follow-up.
Lots of points on connection arguments. It would be ideal to match the semantics of the redis-py library so that this is a familiar drop-in for our community.
I didn't see the addition of an optional package like pip install langchain[redis], which I think would be really nice. Pinning a version to ensure these features are present is a must, though.

Let me know what you think, we are happy to contribute some of these changes if you get stumped. Please reach out for clarification.

Awesome!

Spartee · 2023-03-01T03:03:30Z

langchain/vectorstores/redisearch.py

+        # Prepare the Query
+        return_fields = ["metadata", "content", "vector_score"]
+        vector_field = "content_vector"
+        hybrid_fields = "*"


Does the user have the option to change this here? I might be missing some logic, but our users love this.

These fields are currently fixed, similar to ElasticVectorSearch.

/langchain/vectorstores/elastic_vector_search.py#L189

request = {
    "_op_type": "index",
    "_index": index_name,
    "vector": embeddings[i],
    "text": text,
    "metadata": metadata,
}

Yea which is ok, but that's fixed for when metadata is a key. Redis allows for arbitrarily named keys to be passed in this argument.

Assuming the user created and used the index with langchain, this is totally fine; I'm just thinking about use cases other than the typical one. Again, this is something we can contribute after the initial version too.
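
For illustration only, here is a minimal sketch of how the fixed query pieces discussed above could become optional arguments. The helper name `build_knn_query` and its parameters are hypothetical and not part of this PR; the defaults mirror the values in the diff, and the string follows the RediSearch `=>[KNN ...]` vector query syntax.

```python
def build_knn_query(
    k: int = 4,
    vector_field: str = "content_vector",
    hybrid_fields: str = "*",
    score_alias: str = "vector_score",
    param_name: str = "vector",
) -> str:
    """Build a RediSearch KNN query string with an optional pre-filter.

    Hypothetical helper: the defaults match the fixed values in the diff,
    but callers may override them.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A non-trivial filter is wrapped in parentheses so it is treated as a
    # single expression in front of the KNN clause.
    prefilter = hybrid_fields if hybrid_fields == "*" else f"({hybrid_fields})"
    return f"{prefilter}=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"


# Default behaviour: no filter, fixed field names.
print(build_knn_query())
# Custom vector field plus a tag filter on an assumed "source" field.
print(build_knn_query(k=2, vector_field="embedding", hybrid_fields="@source:{wiki}"))
```

The resulting string would then be handed to the search client together with the query vector as a parameter; the list of returned fields could be exposed the same way.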

Spartee · 2023-03-01T03:04:08Z

langchain/vectorstores/redisearch.py

+        except ImportError:
+            raise ValueError(
+                "Could not import redis python package. "
+                "Please install it with `pip install redis`."


show pinned version? might be difficult unless it's a set constant. Slight maintenance burden.

I followed the other code under 'vectorstores', where the error message does not give a specific version either.
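
As a rough sketch of the "set constant" idea raised above, the minimum version could live in one place and be reused in the error message. The constant `MIN_REDIS_VERSION`, its value, and the helper `import_or_explain` are assumptions made for illustration; the thread does not name a version.

```python
import importlib
from types import ModuleType

# Hypothetical minimum version; the thread does not specify one.
MIN_REDIS_VERSION = "4.1.0"


def import_or_explain(
    package: str = "redis", min_version: str = MIN_REDIS_VERSION
) -> ModuleType:
    """Import `package`, or raise an error that names the pinned version."""
    try:
        return importlib.import_module(package)
    except ImportError as exc:
        # ValueError is kept to match the exception type used in the diff.
        raise ValueError(
            f"Could not import {package} python package. "
            f"Please install it with `pip install '{package}>={min_version}'`."
        ) from exc
```

Keeping the version in a single constant limits the maintenance burden to one line when the requirement changes.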

Spartee · 2023-03-01T03:06:59Z

langchain/vectorstores/redisearch.py

+            index_name = uuid.uuid4().hex
+        prefix = "doc"  # prefix for the document keys
+        distance_metric = (
+            "COSINE"  # distance metric for the vectors (ex. COSINE, IP, L2)


Allow this to be changed?

I think default values can be used here. The purpose of this code is to create the corresponding data structure in Redis.

This will work for now, but for many production use cases - users will need the ability to define and specify index name, prefix, and distance metrics. But that can come later as optional arguments.

@tylerhutcherson expressed the main point behind what I meant. users will want to use HNSW for larger indices and change distance metrics based on process and normalization. but for now, this is fine and awesome to see.
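
A minimal sketch of what such optional arguments might look like, assuming a small options object that keeps the PR's defaults (random index name, "doc" prefix, COSINE, FLAT). The class `IndexOptions` is hypothetical; the attribute keys `TYPE`, `DIM`, and `DISTANCE_METRIC` are the ones a RediSearch vector field expects, and `FLOAT32` is an assumed element type.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Union

ALLOWED_METRICS = {"COSINE", "IP", "L2"}
ALLOWED_ALGORITHMS = {"FLAT", "HNSW"}


@dataclass
class IndexOptions:
    """Hypothetical container for user-tunable index settings."""

    dim: int
    index_name: Optional[str] = None
    prefix: str = "doc"
    distance_metric: str = "COSINE"
    algorithm: str = "FLAT"

    def __post_init__(self) -> None:
        # Normalise case so "l2" and "hnsw" are accepted.
        self.distance_metric = self.distance_metric.upper()
        self.algorithm = self.algorithm.upper()
        if self.distance_metric not in ALLOWED_METRICS:
            raise ValueError(f"Unsupported distance metric: {self.distance_metric}")
        if self.algorithm not in ALLOWED_ALGORITHMS:
            raise ValueError(f"Unsupported index algorithm: {self.algorithm}")
        if self.index_name is None:
            # Same fallback as the diff: a random hex name.
            self.index_name = uuid.uuid4().hex

    def vector_attributes(self) -> Dict[str, Union[str, int]]:
        """Attributes to pass when declaring the vector field."""
        return {
            "TYPE": "FLOAT32",
            "DIM": self.dim,
            "DISTANCE_METRIC": self.distance_metric,
        }


opts = IndexOptions(dim=1536, distance_metric="l2", algorithm="hnsw")
print(opts.algorithm, opts.vector_attributes())
```

With something like this, the existing defaults stay untouched for typical use, while larger deployments could opt into HNSW or a different metric.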

Spartee · 2023-03-01T03:07:12Z

langchain/vectorstores/redisearch.py

+        metadata = TextField(name="metadata")
+        content_embedding = VectorField(
+            "content_vector",
+            "FLAT",


allow this to be changed?

I think default values can be used here. The purpose of this code is to create the corresponding data structure in Redis.

xinqiu · 2023-03-01T15:38:25Z

Thank you for the several comments you made. I have made fixes for the majority of them.

Regarding the first point, I initially thought the same way as you and did not want to use 'RediSearch'. However, a user who points this vectorstore at a plain Redis server without the module will hit an error, which is why it was named 'RediSearch'. Checking with 'INFO MODULES', as you mentioned, is indeed a good idea.

So do you think the following approach is more appropriate? If so, I will change the file name and class name to 'Redis'. Looking forward to your re-review, thank you very much.

# Fall back to an empty list so a server that reports no modules does not break the check.
if "search" not in [m["name"] for m in client.info().get("modules", [])]:
    raise ValueError(
        "Could not use redis directly, you need to add the search module. "
        "Please refer to [RediSearch](https://redis.io/docs/stack/search/quick_start/)"
    )

tylerhutcherson · 2023-03-01T20:53:39Z

@xinqiu Thanks for this! I just pulled and tested all of the code. Beyond what @Spartee shared for future improvements and the comment I made above, we think adding another integration test would be helpful here:

def test_redisearch_new_vector() -> None:
    """Test adding a new document"""
    texts = ["foo", "bar", "baz"]
    docsearch = RediSearch.from_texts(
        texts, FakeEmbeddings(), redisearch_url="redis://localhost:6379"
    )
    docsearch.add_texts(["foo"])
    output = docsearch.similarity_search("foo", k=2)
    assert output == [Document(page_content="foo"), Document(page_content="foo")]

Your suggestion about using the client.info().get("modules") cmd above is also good!

tylerhutcherson

Nice -- Overall this is a solid start! :) Our recommendation would still be to use Redis as the vector store name and enforce the module check when loading. @hwchase17 Anything else you need to see here? We're happy to collaborate on other extensions, iterations, examples, and docs.

xinqiu · 2023-03-03T15:22:12Z

OK, I will refactor this.

hwchase17 · 2023-03-04T15:54:42Z

this looks pretty good to me! will defer to you on naming @tylerhutcherson. what do you mean regarding enforcing the module check? isn't there a check already?

tylerhutcherson · 2023-03-06T15:18:55Z

There is a check for the installation of the Redis Python client - that looks great! But we (cc @xinqiu, @Spartee) discussed also adding an additional check on class init to confirm the presence of the RediSearch "module" installed on the Redis instance.

xinqiu · 2023-03-06T16:03:07Z

@tylerhutcherson @Spartee Another issue with redis-py has been discovered. When trying to query based on index_name in client.ft(self.index_name).search(redis_query, params_dict), there is no restriction and it appears that Redis is currently returning full results of a match.

Same problem like this

tylerhutcherson · 2023-03-06T18:33:02Z

Interesting. This is not the redis Python library though, it's redisearch which is older. All search functionality has been ported to the official Redis library here. Possible that has something to do with it.

Also - worth noting that Redis associates keys to an index using the prefix argument. So any Redis HASH that is prefixed by prefix: will be included in the index. It's more flexible that way. So if you have docs that need to be in different indices, you use a different prefix for those keys.
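
To make the prefix rule concrete, here is a small pure-Python sketch of how keys map to indices under that scheme. Both helper names are hypothetical, and the matching is modelled as a plain string prefix on `prefix:` as described above; no Redis connection is involved.

```python
import uuid
from typing import Iterable, List


def make_key(prefix: str) -> str:
    """Create a document key that lands in the index defined on `prefix:`."""
    return f"{prefix}:{uuid.uuid4().hex}"


def keys_for_index(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the keys an index defined on `prefix:` would cover."""
    return [key for key in keys if key.startswith(f"{prefix}:")]


keys = ["doc:1", "doc:2", "faq:1", "document"]
print(keys_for_index(keys, "doc"))  # ['doc:1', 'doc:2']
print(keys_for_index(keys, "faq"))  # ['faq:1']
```

Under this model, querying one index name while all documents share the same prefix would return matches from every document with that prefix, which is one possible explanation for the unrestricted results reported above.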

xinqiu · 2023-03-08T15:55:40Z

I have renamed it as Redis.

hwchase17 · 2023-03-09T04:10:20Z

@tylerhutcherson any updated thoughts? am happy to help get this into tomorrow's release if it's close

xinqiu · 2023-03-12T09:14:38Z

@hwchase17 All checks have passed. What else do I need to do?

tylerhutcherson · 2023-03-13T22:42:49Z

No other requirements from our end at the moment. But we can happily contribute another PR later with more examples, docs, and expanded functionality. Thanks @hwchase17 and @xinqiu !!!

hwchase17 · 2023-03-14T06:10:41Z

woo awesome! gonna slot this for wednesday release! thanks for the work @xinqiu and the review @tylerhutcherson !

@xinqiu do you have a twitter for a shoutout?

xinqiu · 2023-03-14T14:09:53Z

Great, my twitter is @xinqiu_bot

xinqiu added 2 commits February 26, 2023 22:46

feat: add redisearch vectorstore

e523f7f

feat: add_texts function

b520efb

feat: add redisearch demo jupyter notebook

d18957e

xinqiu changed the title ~~WIP: feat: add redisearch vectorstore~~ feat: add redisearch vectorstore Feb 27, 2023

feat: add unit test

7169eb1

xinqiu force-pushed the feat/add_vectorstore_redisearch branch from 96b4d34 to 7169eb1 Compare February 27, 2023 16:04

chore: format

5e224e9

Spartee reviewed Mar 1, 2023

chore: made fixs based on review feedback

f54ff1d

tylerhutcherson reviewed Mar 1, 2023

chore: made fixs based on review feedback

cfdcca8

xinqiu requested review from tylerhutcherson and Spartee and removed request for tylerhutcherson and Spartee March 2, 2023 01:42

tylerhutcherson reviewed Mar 2, 2023

chore: rename to redis

851446a

chore: format

f8fbdec

hwchase17 merged commit 4e13cef into langchain-ai:master Mar 15, 2023
