
Fix RedisVectorStoreDriver bugs #782

Merged: dylanholmes merged 3 commits into dev from fix/redis-vector-store-driver on May 15, 2024

@dylanholmes (Contributor) commented on May 15, 2024:

Fixes: #778

Changes:

  • Edit RedisVectorStoreDriver.query to correctly set the meta, score, and vector fields of query results. This fixes #778 (VectorQueryEngine issue with loading artifacts from query).
  • Edit RedisVectorStoreDriver.query to honor the namespace option.
  • Update the RedisVectorStoreDriver docs with a new suggested index definition. To use namespaces, the index must include a namespace attribute of type TAG.
  • Add a few more unit tests.
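The first change concerns turning the raw Redis hash fields of a search hit into typed result fields. Below is a minimal sketch of that decoding. It assumes the vector is stored as FLOAT32 bytes (matching TYPE FLOAT32 in the index) and the metadata as a JSON string. The field names (metadata, vector, namespace, score) and the QueryResultSketch class are hypothetical stand-ins, not the driver's actual code.

```python
import json
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryResultSketch:
    # Hypothetical stand-in for the driver's query-result type.
    id: str
    score: float
    namespace: Optional[str] = None
    meta: Optional[dict] = None
    vector: Optional[list] = None


def decode_float32_vector(raw: bytes) -> list:
    # Each FLOAT32 component takes 4 bytes; "<" assumes little-endian storage.
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def to_query_result(doc: dict, include_vectors: bool = False) -> QueryResultSketch:
    # doc is assumed to map field names to the raw values returned by Redis.
    raw_meta = doc.get("metadata")
    raw_vector = doc.get("vector")
    return QueryResultSketch(
        id=doc["id"],
        score=float(doc["score"]),  # Redis returns numbers as strings
        namespace=doc.get("namespace"),
        meta=json.loads(raw_meta) if raw_meta else None,
        vector=decode_float32_vector(raw_vector) if include_vectors and raw_vector else None,
    )
```

Leaving vector as None unless include_vectors is set matches the manual test output in this PR, where vector is NoneType by default and a list only with include_vectors=True.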

Additional Manual Testing:

  1. Started Redis locally: docker run -p 6379:6379 redislabs/redisearch:latest
  2. Created the index suggested in the updated docs: redis-cli FT.CREATE idx:griptape ON hash PREFIX 1 "griptape:" SCHEMA namespace TAG vector VECTOR FLAT 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
  3. Ran the following test program:
```python
from dotenv import load_dotenv
from griptape.structures import Agent
from griptape.tools import VectorStoreClient, TaskMemoryClient
from griptape.loaders import WebLoader
from griptape.engines import VectorQueryEngine
from griptape.drivers import RedisVectorStoreDriver, OpenAiEmbeddingDriver, OpenAiChatPromptDriver

# Load environment variables (e.g. the OpenAI API key) from a local .env file.
load_dotenv()

# Query engine backed by the local Redis index created in step 2.
engine = VectorQueryEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"),
    vector_store_driver=RedisVectorStoreDriver(
        host="localhost",
        port=6379,
        password="",
        index="idx:griptape",
        embedding_driver=OpenAiEmbeddingDriver(),
    ),
)

# Load the Griptape homepage and upsert the resulting text artifacts under the "griptape" namespace.
engine.upsert_text_artifacts(
    WebLoader().load("https://www.griptape.ai"),
    namespace="griptape"
)

# End-to-end check for #778: an agent whose tool queries the namespaced store.
vector_db = VectorStoreClient(
    description="This DB has information about the Griptape Python framework",
    query_engine=engine,
    namespace="griptape"
)

agent = Agent(
    tools=[vector_db, TaskMemoryClient(off_prompt=False)]
)

agent.run(
    "what is Griptape?"
)

# Query the driver directly to check score, meta, vector, and namespace handling.
driver = RedisVectorStoreDriver(
    host="localhost",
    port=6379,
    password="",
    index="idx:griptape",
    embedding_driver=OpenAiEmbeddingDriver(),
)

def inspect_results(results):
    # Print the fields this PR fixes for the first result, if there is one.
    print(f"Result count: {len(results)}")
    if len(results) == 0:
        return
    print(f"{results[0].id}")
    print(f"{results[0].namespace}")
    print(f"{results[0].score=}")
    print(f"{type(results[0].meta)=}")
    print(f"{type(results[0].vector)=}")

print("\nWe should get 1 result:")
inspect_results(driver.query("What is griptape?"))
print("\nWe should get 1 result")
inspect_results(driver.query("What is griptape?", namespace="griptape"))
print("\nWe should get 0 results")
inspect_results(driver.query("What is griptape?", namespace="griptapez"))
print("\nThe result vector should be a list now:")
inspect_results(driver.query("What is griptape?", include_vectors=True))
```

Output:

````text
[05/15/24 13:24:13] INFO     ToolkitTask f43e7105318e4cb1a0eb1176731c3ee3
                             Input: what is Griptape?
[05/15/24 13:24:16] INFO     Subtask 8218e336fbb14f89a4ae0c1c55876ba8
                             Thought: To answer the question about what Griptape is, I will search the vector database that
                             contains information about the Griptape Python framework.

                             Actions:
                             ```json
                             [
                                 {
                                     "name": "VectorStoreClient",
                                     "path": "search",
                                     "input": {
                                         "values": {
                                             "query": "What is Griptape?"
                                         }
                                     },
                                     "tag": "search_griptape"
                                 }
                             ]
                             ```
[05/15/24 13:24:20] INFO     Subtask 8218e336fbb14f89a4ae0c1c55876ba8
                             Response: Output of "VectorStoreClient.search" was stored in memory with memory_name
                             "TaskMemory" and artifact_namespace "da3b7f794e3840d1afe453f10ef695da"
[05/15/24 13:24:22] INFO     Subtask 9696727238ae469b820436eca1da00ef
                             Thought: The output of the search has been stored in memory. I will now query the memory to
                             retrieve the information about Griptape.
                             Actions: [{"name": "TaskMemoryClient", "path": "query", "input": {"values": {"memory_name":
                             "TaskMemory", "artifact_namespace": "da3b7f794e3840d1afe453f10ef695da", "query": "What is
                             Griptape?"}}, "tag": "retrieve_griptape_info"}]
[05/15/24 13:24:24] INFO     Subtask 9696727238ae469b820436eca1da00ef
                             Response: Griptape is a framework that enables developers to build, deploy, and scale
                             retrieval-driven AI-powered applications in the cloud. It provides a comprehensive suite of
                             tools, including a development framework and execution runtime, to facilitate the creation of
                             these applications. Griptape allows developers to use predictable, programmable Python for
                             building business logic, enhancing security, performance, and cost-efficiency through
                             off-prompt functionality. It also simplifies the deployment and management of ETL, RAG, and
                             other structures by offering simple API abstractions and eliminating the need for
                             infrastructure management, while allowing for seamless scaling to meet growing workload
                             demands.
[05/15/24 13:24:27] INFO     ToolkitTask f43e7105318e4cb1a0eb1176731c3ee3
                             Output: Griptape is a framework that enables developers to build, deploy, and scale
                             retrieval-driven AI-powered applications in the cloud. It provides a comprehensive suite of
                             tools, including a development framework and execution runtime, to facilitate the creation of
                             these applications. Griptape allows developers to use predictable, programmable Python for
                             building business logic, enhancing security, performance, and cost-efficiency through
                             off-prompt functionality. It also simplifies the deployment and management of ETL, RAG, and
                             other structures by offering simple API abstractions and eliminating the need for
                             infrastructure management, while allowing for seamless scaling to meet growing workload
                             demands.

We should get 1 result:
Result count: 1
7e01516532af4d91b885a13ddecf9aa6
griptape
results[0].score=0.495465040207
type(results[0].meta)=<class 'dict'>
type(results[0].vector)=<class 'NoneType'>

We should get 1 result
Result count: 1
7e01516532af4d91b885a13ddecf9aa6
griptape
results[0].score=0.495521783829
type(results[0].meta)=<class 'dict'>
type(results[0].vector)=<class 'NoneType'>

We should get 0 results
Result count: 0

The result vector should be a list now:
Result count: 1
7e01516532af4d91b885a13ddecf9aa6
griptape
results[0].score=0.495521783829
type(results[0].meta)=<class 'dict'>
type(results[0].vector)=<class 'list'>
````
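The index from step 2 of the manual testing can also be created from Python. The sketch below only assembles the FT.CREATE arguments; the helper name and its defaults are illustrative, not part of Griptape. It also makes explicit that the 6 after VECTOR FLAT is the count of attribute arguments that follow (three key/value pairs).

```python
from typing import List


def ft_create_args(
    index: str = "idx:griptape",
    prefix: str = "griptape:",
    dim: int = 1536,
    metric: str = "COSINE",
) -> List[str]:
    # Hypothetical helper: builds the same FT.CREATE command as step 2.
    vector_attrs = ["TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", metric]
    return [
        "FT.CREATE", index,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        # TAG field that the namespace filter (@namespace:{...}) matches against.
        "namespace", "TAG",
        # FLAT vector field; the number before the attributes is how many follow.
        "vector", "VECTOR", "FLAT", str(len(vector_attrs)), *vector_attrs,
    ]
```

With redis-py, the list could be sent as-is, e.g. redis.Redis(host="localhost", port=6379).execute_command(*ft_create_args()).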

📚 Documentation preview 📚: https://griptape--782.org.readthedocs.build//782/

@@ -120,8 +123,9 @@ def query(

```python
vector = self.embedding_driver.embed_string(query)

filter_expression = f"(@namespace:{{{namespace}}})" if namespace else "*"
```
Member:

What happens if the namespace field doesn't exist?

Contributor Author:

If the namespace field does not exist in the index schema, then passing a namespace to query will never return any results, including when the namespace comes from a tool.

So in my example above, driver.query("What is griptape?", namespace="griptape") would return 0 results instead of 1, and the VectorStoreClient tool (configured with a namespace) would never find any results.

I think we could inspect the index schema, once on initialization rather than on every query, and either change this behavior or emit a warning to keep people from shooting themselves in the foot. I'll see what I can do. Let me know if you have other ideas.
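One way the schema check suggested above could look, sketched under assumptions: read the attributes entry of FT.INFO once at initialization and warn if no TAG attribute named namespace exists. The sketch assumes the RESP2 layout, where each attribute is a flat [key, value, ...] list starting with identifier, attribute, and type. The helper names are hypothetical, and this is not what the PR merged.

```python
import warnings
from typing import Any, Sequence


def _as_str(value: Any) -> str:
    # redis-py returns bytes unless the client was created with decode_responses=True.
    return value.decode() if isinstance(value, bytes) else str(value)


def has_tag_field(attributes: Sequence[Sequence[Any]], name: str = "namespace") -> bool:
    """Return True if the FT.INFO attributes declare `name` as a TAG field."""
    for attribute in attributes:
        items = [_as_str(item) for item in attribute]
        # Pair items as key/value. Bare flags such as SORTABLE may pair up oddly,
        # but they come after the attribute and type keys, so those stay correct.
        props = dict(zip(items[0::2], items[1::2]))
        if props.get("attribute") == name and props.get("type", "").upper() == "TAG":
            return True
    return False


def warn_if_namespace_unsupported(attributes: Sequence[Sequence[Any]], index: str) -> bool:
    # Intended to run once, e.g. when the driver is created, not on every query.
    if has_tag_field(attributes, "namespace"):
        return True
    warnings.warn(
        f"Index {index!r} has no 'namespace' TAG field; queries that pass a "
        "namespace will return no results.",
        stacklevel=2,
    )
    return False
```

With redis-py, the attributes would presumably come from something like client.ft("idx:griptape").info()["attributes"]; the exact shape depends on the redis-py and RediSearch versions, so it is worth checking before relying on it.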

Member:

Gotcha. So, when no namespace parameter is specified and we add the @namespace:* filter, will it return empty results as well? If so, we should probably not include that filter, right?

Contributor Author:

If no namespace is provided, filter_expression evaluates to '*' rather than @namespace:* (it's a ternary), so a query without a namespace searches all vectors in the index, just as before.

The example program illustrates this: driver.query("What is griptape?") returns a result.
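To make the ternary concrete, here is a small sketch that reproduces it and shows the strings it produces. The KNN wrapper is an assumption based on RediSearch's documented hybrid-query form (filter=>[KNN k @field $param AS alias]); the field name, parameter name, and score alias are illustrative, and the driver's real query string may differ.

```python
from typing import Optional


def build_filter_expression(namespace: Optional[str]) -> str:
    # Same ternary as the line under review. Inside the f-string, "{{" and "}}"
    # are literal braces, so "{{{namespace}}}" renders as "{griptape}", the
    # RediSearch syntax for matching a TAG value.
    return f"(@namespace:{{{namespace}}})" if namespace else "*"


def build_knn_query(namespace: Optional[str], k: int = 10) -> str:
    # Hypothetical helper: combines the pre-filter with a KNN clause over the
    # "vector" field; the $vector parameter and "score" alias are assumptions.
    return f"{build_filter_expression(namespace)}=>[KNN {k} @vector $vector AS score]"
```

With no namespace (None or an empty string) the pre-filter is "*", so the KNN search runs over every document in the index; only a non-empty namespace adds the TAG filter.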

Member:

Oops, I misread the code, sorry! Let's ship it!

@dylanholmes merged commit 3d4c581 into dev on May 15, 2024
7 checks passed
@dylanholmes deleted the fix/redis-vector-store-driver branch on May 15, 2024 18:37
hkhajgiwale pushed a commit to hkhajgiwale/griptape that referenced this pull request on May 25, 2024
Development

Successfully merging this pull request may close these issues.

VectorQueryEngine issue with loading artifacts from query
2 participants