Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neo4j vector context retriever bug fix #14461

Conversation

theoneamendez
Copy link
Contributor

Description

Previously, when using the VectorContextRetriever with the Neo4jPropertyGraphStore, the VectorContextRetrieverwould call avector_query to get the similarity_top_k nodes and return the list of nodes with the first node being ranked with the highest similarity score, and the last being least similar. However, when we pass the nodes found from avector_query to aget_rel_map, the nodes returned could be in any order of similarity. So we need to fix this issue. For example, if I have a query: nodes = retriever.retrieve("Andres Mendez"), then then resulting node ids from avector_query could look like the following: ['Andres Mendez', 'Andres', 'Miguel Mendez', 'Joe Mamma']. But then the results from aget_rel_map could provide only the top 30 results from the relationships of Joe Mamma. This is not expected behaviour, as we are querying for Andres Mendez, not Joe Mamma. The issue here is that we don't take similarity score into account when when calling aget_rel_map, but we need to.

Fixes # (issue)
I modified the cypher query in aget_rel_map to return results based on the order of the node ids passed into this function. This will preserve the scores of the input nodes in the response of this function.

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

I also tested this by running different cases to make sure this works as expected.

  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • My changes generate no new warnings

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 28, 2024
if limit <= 0:
break
# Needs some optimization
response = self.structured_query(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we are making a db query per id -- this is extremely inefficient

Can't we use the "power of cypher" to force some ordering of the nodes to match the ordering of the IDs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I am not a cypher expert, but it seems like a single query should be possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the code to only make one call to the DB and force ordering of the IDs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the new query

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 2, 2024
Copy link
Collaborator

@logan-markewich logan-markewich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 2, 2024
@logan-markewich logan-markewich merged commit 722cb67 into run-llama:main Jul 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants