
Added util functions to get nodes of different depth from a hierarchical nodes list #10535

Merged — 4 commits merged into run-llama:main on Feb 17, 2024

Conversation

qiisziilbash
Contributor

Description

Added util functions to get nodes of different depth from a hierarchical nodes list.

When one creates hierarchical chunks, especially with a DB-backed vector store, one cannot easily create an index from the nodes of a given chunk size. These utils let users get nodes at different depths (which correspond to different chunk sizes), add the chunk size as metadata to the nodes, and later filter nodes based on this metadata.

This could also be achieved by adding the node_parser's chunk_size to the nodes as metadata when the nodes are created; however, it seems more reasonable to provide utils that let users do this themselves than to make nodes bulkier for every user who might not need this metadata.
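The depth walk the description refers to can be sketched with plain dataclasses. This is an illustrative sketch, not the PR's actual implementation: the `Node` shape, its field names, and the tree-building here are assumptions; only the `get_deeper_nodes` name and the "depth corresponds to chunk size" idea come from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    """Minimal stand-in for a hierarchical chunk node (hypothetical shape)."""
    node_id: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)
    metadata: Dict[str, int] = field(default_factory=dict)

def get_root_nodes(nodes: List[Node]) -> List[Node]:
    # Roots are the nodes with no parent (the largest chunks).
    return [n for n in nodes if n.parent_id is None]

def get_deeper_nodes(nodes: List[Node], depth: int = 1) -> List[Node]:
    # Walk `depth` levels down from the roots; each level corresponds
    # to one entry in the hierarchical parser's list of chunk sizes.
    by_id = {n.node_id: n for n in nodes}
    level = get_root_nodes(nodes)
    for _ in range(depth):
        level = [by_id[cid] for n in level for cid in n.child_ids]
    return level

# Build a two-level hierarchy: one root chunk split into two children.
root = Node("root", child_ids=["a", "b"])
children = [Node("a", parent_id="root"), Node("b", parent_id="root")]
all_nodes = [root] + children

# Tag each depth-1 node with its (assumed) chunk size so nodes can be
# filtered on this metadata later, as the description suggests.
for n in get_deeper_nodes(all_nodes, depth=1):
    n.metadata["chunk_size"] = 512

print([n.node_id for n in get_deeper_nodes(all_nodes, depth=0)])  # ['root']
print([n.node_id for n in get_deeper_nodes(all_nodes, depth=1)])  # ['a', 'b']
```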

Type of Change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Added new unit/integration tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 8, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 9, 2024
@logan-markewich
Collaborator

@qiisziilbash if you can fix up the CI errors, we can get this merged

Seems you may need to update the __init__.py or update the import path in the tests

ImportError: cannot import name 'get_deeper_nodes' from 'llama_index.node_parser' (/home/runner/work/llama_index/llama_index/llama_index/node_parser/__init__.py)
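The fix being asked for is a re-export. A minimal sketch (the package path comes from the traceback above; the exact submodule that defines `get_deeper_nodes` is an assumption):

```
# llama_index/node_parser/__init__.py
# Re-export the new helper so that
# `from llama_index.node_parser import get_deeper_nodes` works.
from llama_index.node_parser.relational.hierarchical import get_deeper_nodes  # assumed module path

__all__ += ["get_deeper_nodes"]
```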

@qiisziilbash
Contributor Author

> @qiisziilbash if you can fix up the CI errors, we can get this merged
>
> Seems you may need to update the __init__.py or update the import path in the tests
>
> ImportError: cannot import name 'get_deeper_nodes' from 'llama_index.node_parser' (/home/runner/work/llama_index/llama_index/llama_index/node_parser/__init__.py)

Good catch; done!

@qiisziilbash
Contributor Author

@logan-markewich feel free to merge this; it does not seem like I have permission to do so

@logan-markewich
Collaborator

@qiisziilbash apologies, with v0.10.0 the project structure has changed slightly. Was focused on getting that landed.

I can take a stab at porting this PR in a bit, if you don't get to it first :)

@qiisziilbash
Contributor Author

@logan-markewich this should be good to go in

@logan-markewich
Collaborator

(ugh, merging main locally brought in extra changes, all good though, they will get merged anyways lol)

@logan-markewich logan-markewich enabled auto-merge (squash) February 17, 2024 07:05
@logan-markewich logan-markewich merged commit b91ee5c into run-llama:main Feb 17, 2024
8 checks passed
Dominastorm pushed a commit to uptrain-ai/llama_index that referenced this pull request Feb 28, 2024
anoopshrma pushed a commit to anoopshrma/llama_index that referenced this pull request Mar 2, 2024
Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024