Add Paragraph Roles to the Data Ingestion Process #74

Open
placerda opened this issue May 22, 2024 · 0 comments

placerda commented May 22, 2024

Goal:
Document Intelligence provides paragraph role information, such as headings. We will use this information to create better chunks.

How it will work:
[ X ] Convert Document Intelligence results to HTML format according to paragraph roles.
[ X ] Combine the results with tables before chunking.
[ ] Update the chunking logic according to file format.
[ ] Generate an embedding for each chunk part.
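
As a rough illustration of the first and third steps, here is a minimal sketch. It is not the repository's implementation. It assumes each paragraph is a plain dict with a `content` string and an optional `role`, mirroring the `paragraphs` array in a Document Intelligence layout result. The role-to-tag mapping, the set of skipped roles, and the helper names `paragraphs_to_html` and `chunk_by_heading` are hypothetical choices made for this example.

```python
from html import escape

# Assumed mapping from paragraph roles to HTML tags (illustrative only).
ROLE_TO_TAG = {
    "title": "h1",
    "sectionHeading": "h2",
    "footnote": "small",
}
# Roles assumed to be layout noise and dropped before chunking.
SKIPPED_ROLES = {"pageHeader", "pageFooter", "pageNumber"}


def paragraphs_to_html(paragraphs):
    """Render paragraphs (dicts with 'content' and optional 'role') as HTML lines."""
    lines = []
    for paragraph in paragraphs:
        role = paragraph.get("role")
        if role in SKIPPED_ROLES:
            continue
        # Paragraphs without a known role fall back to a plain <p>.
        tag = ROLE_TO_TAG.get(role, "p")
        lines.append(f"<{tag}>{escape(paragraph['content'])}</{tag}>")
    return lines


def chunk_by_heading(html_lines):
    """Group HTML lines into chunks, starting a new chunk at each <h1> or <h2>."""
    chunks, current = [], []
    for line in html_lines:
        if line.startswith(("<h1>", "<h2>")) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = [
        {"content": "Report", "role": "title"},
        {"content": "Intro text"},
        {"content": "Page 1", "role": "pageNumber"},
        {"content": "Methods", "role": "sectionHeading"},
        {"content": "A & B"},
    ]
    for chunk in chunk_by_heading(paragraphs_to_html(sample)):
        print(chunk)
        print("---")
```

Splitting at headings keeps each heading together with the body text that follows it, which is the kind of "more meaningful" chunk the goal describes. Table merging and embedding generation are left out of this sketch.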

Impact:
Chunks generated from paragraphs in the Document Intelligence response become more meaningful.

placerda changed the title from "Data Ingestion Improvement (Paragraph Roles)" to "Add Paragraph Roles to the Data Ingestion Process" on May 22, 2024