[FEA]: Support arbitrary python functions to determine document split points #194

Open
randerzander opened this issue Oct 25, 2024 · 0 comments
Labels
feature request (New feature or request)

Comments

@randerzander
Collaborator

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Currently preventing usage

Please provide a clear description of problem this feature solves

Ideally, extraction would produce clean section-header metadata for each document, including each header's location.

We could then split documents at those locations.

However, some separators are arbitrary text that an out-of-the-box model is unlikely to ever identify. Users would therefore like to supply an arbitrary Python function that returns the split locations.
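To make the request concrete, here is a minimal sketch of what such a hook might look like. Everything in it is an assumption for illustration: `SplitFn`, `split_document`, `split_on_exhibit_markers`, and the `*** EXHIBIT` separator are hypothetical names, not part of any existing API in this project, and split locations are assumed to be character offsets into the extracted text.

```python
import re
from typing import Callable, List

# Hypothetical type: a user-supplied function that receives the document text
# and returns the character offsets at which the document should be split.
SplitFn = Callable[[str], List[int]]


def split_document(text: str, split_fn: SplitFn) -> List[str]:
    """Split `text` at the offsets returned by `split_fn` (hypothetical helper)."""
    # Keep only offsets strictly inside the text, deduplicated and sorted.
    offsets = sorted({o for o in split_fn(text) if 0 < o < len(text)})
    bounds = [0, *offsets, len(text)]
    # Slice between consecutive boundaries; joining the chunks restores `text`.
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


def split_on_exhibit_markers(text: str) -> List[int]:
    """Example user function: split before each line starting with '*** EXHIBIT'."""
    pattern = r"^\*\*\* EXHIBIT"
    return [m.start() for m in re.finditer(pattern, text, flags=re.MULTILINE)]


doc = "Intro text\n*** EXHIBIT A\nfirst\n*** EXHIBIT B\nsecond\n"
chunks = split_document(doc, split_on_exhibit_markers)
for chunk in chunks:
    print(repr(chunk))
```

Returning offsets rather than pre-split strings keeps the user function small and lets the caller validate, sort, and deduplicate the locations before slicing. Other location types (page numbers, element indices) would need a different contract.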

Describe the feature, and optionally a solution or implementation and any alternatives

See above

Additional context

No response

@randerzander added the "feature request" label on Oct 25, 2024