
[Feature request]: Handle Splitting? #1055

Open

rehandaphedar opened this issue Oct 19, 2024 · 2 comments

Labels: enhancement (New feature or request)

@rehandaphedar

What do you need?

It would be great if fabric could automatically handle splitting/chunking for text that is too large for a given model.

From what I understand, this would need the following (a rough sketch of the first two items follows the list):

  • Information about the token limit of the model being used
  • A way to count the tokens in a specific request
  • A range of splitting options (character, word, sentence, recursive, semantic, etc.) to choose from
  • Possibly a way to select the LLM used for semantic splitting
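
As a rough illustration of the first two items, a naive version could look something like the sketch below (Go, since that is what fabric itself is written in). Everything here is an assumption for illustration: `approxTokens` uses the crude ~4-characters-per-token heuristic rather than a real tokenizer, and `splitByTokens` is a hypothetical helper, not an existing fabric function.

```go
package main

import (
	"fmt"
	"strings"
)

// approxTokens estimates the token count of s with the crude
// ~4-characters-per-token heuristic. A real implementation would use the
// target model's own tokenizer, since counts vary per model.
// (Hypothetical helper, not fabric's API.)
func approxTokens(s string) int {
	return (len(s) + 3) / 4
}

// splitByTokens is a naive word-level splitter: it packs whitespace-
// separated words into chunks whose estimated token count stays at or
// below maxTokens.
func splitByTokens(text string, maxTokens int) []string {
	var chunks []string
	var b strings.Builder
	for _, word := range strings.Fields(text) {
		if b.Len() > 0 && approxTokens(b.String())+approxTokens(word) > maxTokens {
			chunks = append(chunks, b.String())
			b.Reset()
		}
		if b.Len() > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(word)
	}
	if b.Len() > 0 {
		chunks = append(chunks, b.String())
	}
	return chunks
}

func main() {
	for i, c := range splitByTokens(strings.Repeat("lorem ipsum dolor ", 100), 50) {
		fmt.Printf("chunk %d: ~%d tokens\n", i, approxTokens(c))
	}
}
```

The word/sentence/recursive/semantic options from the list would then slot in as alternative splitters behind the same interface.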
@mattjoyce
Contributor

I understand the concept of chunking, but how would it work in this environment?
Say I have a large file and pipe it to `fabric -p summarize`: it splits the file in various ways, summarizes each chunk, and joins the results?

I'm sceptical about the efficacy and utility, but interested to hear your thoughts.

@rehandaphedar
Author

> it splits the file in various ways, summarizes each chunk, and joins the results?

Yes, though not just for summarising. I was thinking that the patterns could be modified to inform the LLM that the given input is part of a larger input, and possibly to include the outputs of previous chunks (in the case of, e.g., sliding-window approaches).

I'm not sure how difficult the advanced splitting options would be to implement, for example those that feed previous chunks' output as input to the next chunk. However, simple splitting should hopefully be easy to implement, and it would be very helpful even with its inefficiencies.
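
To make the sliding-window case concrete, a minimal sketch of the driver loop might look like this (again Go, with everything hypothetical: `runPattern` stands in for however fabric would actually execute a pattern against the model, and the prompt wording is just an example):

```go
package main

import (
	"fmt"
	"strings"
)

// processSequentially is an illustrative sliding-window driver: every
// chunk's prompt tells the model it is seeing one part of a larger input
// and carries the previous chunk's output as context. The per-chunk
// outputs are simply concatenated at the end. runPattern is a
// hypothetical stand-in for whatever fabric would use to run a pattern.
func processSequentially(chunks []string, runPattern func(prompt string) (string, error)) (string, error) {
	var outputs []string
	prev := ""
	for i, chunk := range chunks {
		var p strings.Builder
		fmt.Fprintf(&p, "This is part %d of %d of a larger input.\n", i+1, len(chunks))
		if prev != "" {
			p.WriteString("Output for the previous part:\n" + prev + "\n")
		}
		p.WriteString("Input:\n" + chunk)

		out, err := runPattern(p.String())
		if err != nil {
			return "", err
		}
		prev = out
		outputs = append(outputs, out)
	}
	return strings.Join(outputs, "\n\n"), nil
}

func main() {
	// Dummy runPattern that only echoes, to show the control flow.
	echo := func(prompt string) (string, error) {
		return fmt.Sprintf("[output for a %d-char prompt]", len(prompt)), nil
	}
	result, err := processSequentially([]string{"first chunk", "second chunk"}, echo)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

Simple splitting would just be the same loop with the `prev` context dropped.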

Regarding combining the outputs, I think even just concatenating them would still be very helpful. I'm not aware of how other programs handle it, though.

Regarding utility, handling chunking would be extremely useful: one could run, for example, `pdftotext book.pdf - | fabric -p extract_wisdom` and similar commands without worrying about the token limit.
