Improving chromadb Vector Database class #1146

Closed
aamirahmed2004 wants to merge 2 commits

aamirahmed2004
Contributor

  • create() now calls get_collection() when the specified collection already exists (which happens when using a persistent client). Previously, when the AssistantKnowledge class called vector_db.create() and self.exists() returned True, self._collection was never assigned.
  • defined upsert_available(), which returns True because Chroma supports upsert.
  • updated logger messages to be more consistent with the messages in pgvector2.py
  • implemented clear() using Chroma's delete_collection() method
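
The behavior described in the bullets above can be sketched roughly as follows. This is not the actual diff: `ChromaDbSketch`, `FakeClient`, and `FakeCollection` are hypothetical names, and the fake client is an in-memory stand-in so the example runs without chromadb installed. Only the method names `get_collection()`, `create_collection()`, and `delete_collection()` mirror Chroma's client API; returning a bool from `clear()` is an assumption of this sketch.

```python
from typing import Dict, List, Optional


class FakeCollection:
    """Hypothetical stand-in for a Chroma collection (not the real class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, str] = {}

    def upsert(self, ids: List[str], documents: List[str]) -> None:
        # Insert new ids and overwrite existing ones.
        for doc_id, doc in zip(ids, documents):
            self.docs[doc_id] = doc


class FakeClient:
    """Hypothetical in-memory stand-in for a persistent Chroma client."""

    def __init__(self) -> None:
        self._collections: Dict[str, FakeCollection] = {}

    def create_collection(self, name: str) -> FakeCollection:
        if name in self._collections:
            raise ValueError(f"Collection {name} already exists")
        self._collections[name] = FakeCollection(name)
        return self._collections[name]

    def get_collection(self, name: str) -> FakeCollection:
        if name not in self._collections:
            raise ValueError(f"Collection {name} does not exist")
        return self._collections[name]

    def delete_collection(self, name: str) -> None:
        if name not in self._collections:
            raise ValueError(f"Collection {name} does not exist")
        del self._collections[name]


class ChromaDbSketch:
    """Assumed shape of the vector DB wrapper after the changes."""

    def __init__(self, collection: str, client: FakeClient) -> None:
        self.collection = collection
        self.client = client
        self._collection: Optional[FakeCollection] = None

    def exists(self) -> bool:
        try:
            self.client.get_collection(self.collection)
            return True
        except ValueError:
            return False

    def create(self) -> None:
        if self.exists():
            # Persistent client: the collection survived a restart, so reuse it.
            self._collection = self.client.get_collection(self.collection)
        else:
            self._collection = self.client.create_collection(self.collection)

    def upsert_available(self) -> bool:
        return True

    def clear(self) -> bool:
        if not self.exists():
            return False
        self.client.delete_collection(self.collection)
        self._collection = None
        return True


client = FakeClient()  # stands in for on-disk state that outlives the app
first = ChromaDbSketch("docs", client)
first.create()
first._collection.upsert(["a"], ["first upload"])

second = ChromaDbSketch("docs", client)  # simulated rerun of the app
second.create()  # collection exists, so it is fetched rather than skipped
second._collection.upsert(["b"], ["second upload"])
stored = sorted(second._collection.docs)
cleared = second.clear()
```

The point of the `create()` branch is that `self._collection` ends up assigned on both paths, so a second run against existing data can still write to the collection.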

@jacobweiss2305 jacobweiss2305 self-requested a review September 22, 2024 12:40
@aamirahmed2004
Contributor Author

aamirahmed2004 commented Sep 23, 2024

@jacobweiss2305 here are the steps to reproduce the issue that the change in my first bullet point fixes. The other three bullet points are quality-of-life/logical changes that I noticed were needed while using the class.

NOTE: when trying out these steps, you may or may not run into an issue like "No module named phi.vectordb.chroma"; in case you do, I have included my workaround at the end of this comment.

I did all my testing with the app in cookbook/llms/ollama/rag.

Change assistant.py in cookbook/llms/ollama/rag as shown in the screenshot, and add the import for the Chroma vectordb:
[image: modified assistant.py]

Then run the docker container (for PgAssistantStorage), and run the app with streamlit run app.py. Upload a file, and it should work as intended. However, when you stop the app (Ctrl+C) and rerun it, uploading a file results in the following error in the logs:
[image: error in the logs]
The vectorstore also doesn't include any chunks from this second uploaded file, as can be seen from the response below (the retrieved chunks were irrelevant, so I didn't include them):
[image: assistant response]

I suspect the cause is in the load() method of phi/knowledge/base.py: when it calls self.vectordb.create(), the vector db's collection is not assigned, because the vector db's create() method only does anything when exists() returns False.
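
To illustrate the suspected cause, here is a stripped-down sketch of the pre-fix flow. All names (`OldStyleDb`, `load`, the `store` dict standing in for persisted data) are hypothetical, and the `AttributeError` is only the symptom this reduction produces; the exact error shown in the screenshot is not reproduced here.

```python
from typing import Dict, List, Optional


class OldStyleDb:
    """Hypothetical reduction of the pre-fix create() behavior."""

    def __init__(self, name: str, store: Dict[str, List[str]]) -> None:
        self.name = name
        self.store = store  # dict standing in for data persisted on disk
        self._collection: Optional[List[str]] = None

    def exists(self) -> bool:
        return self.name in self.store

    def create(self) -> None:
        if not self.exists():
            self.store[self.name] = []
            self._collection = self.store[self.name]
        # If the collection already exists, nothing is assigned here.

    def insert(self, chunks: List[str]) -> None:
        # Fails when _collection is still None.
        self._collection.extend(chunks)


def load(db: OldStyleDb, chunks: List[str]) -> None:
    # Mirrors the order described above: create first, then write.
    db.create()
    db.insert(chunks)


store: Dict[str, List[str]] = {}

# First run: the collection does not exist yet, so create() assigns it.
load(OldStyleDb("docs", store), ["chunk-1"])

# Second run (simulated restart): the collection exists, create() is a no-op.
second_run_error = None
try:
    load(OldStyleDb("docs", store), ["chunk-2"])
except AttributeError as exc:
    second_run_error = exc
```

In this reduction the second upload never reaches the store, which matches the observation that the vectorstore contains no chunks from the second file.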

So my first bullet point above is intended to fix this issue with persistent Chroma clients. Please let me know if I am missing something and some parts need to be changed.

Problem I faced when trying this myself again (you may or may not have this issue, I don't know what causes it):

When working with the ChromaDb class, I could not get the import to work. I tried manually adding the folder to the system path using the sys library, but that did not help. In the end, I renamed the directory phi/vectordb/chroma to chroma_db, renamed the file to chroma_db.py, and modified __init__.py accordingly (VS Code took care of the import statements in assistant.py), which seemed to resolve the issue. I don't know if this only affects me or if the directory should be renamed in the repository as well.

Contributor

@jacobweiss2305 jacobweiss2305 left a comment

thank you. Good PR!

@jacobweiss2305
Contributor

#1164
