PDF Parse Workflow with Docling and Elastic Search #1137

Merged
merged 13 commits into from
Jan 2, 2025

@stangirala stangirala commented Dec 23, 2024

Context

Create a new workflow example that uses Docling for PDF parsing and Elastic Search as the embedding query engine. This example is similar to the one we already have and offers an OSS alternative.

What

  • Create a new workflow that uses Docling.
  • Create steps to write generated embeddings to ES.
  • Modify docker compose to start a local ES container to demonstrate the example.
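
The step that writes embeddings to ES could look roughly like the sketch below. This is not the code from this PR: the index name `pdf_chunks`, the field names `text` and `embedding`, the 384-dimension default, and both helper functions are assumptions made for illustration. It only builds the index definition and the NDJSON body expected by the Elasticsearch `_bulk` endpoint, so it runs without a cluster.

```python
import json
from typing import Iterable

# Hypothetical names for illustration; the workflow's real index and fields may differ.
INDEX_NAME = "pdf_chunks"
EMBEDDING_DIMS = 384


def build_index_mapping(dims: int = EMBEDDING_DIMS) -> dict:
    """Return an index definition whose dense_vector field is indexed for kNN search."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_bulk_body(index: str, chunks: Iterable[tuple[str, str, list[float]]]) -> str:
    """Serialize (doc_id, text, embedding) triples as NDJSON for the _bulk endpoint."""
    lines = []
    for doc_id, text, embedding in chunks:
        # Each document is an action line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"text": text, "embedding": embedding}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The mapping would be sent once when creating the index, and the bulk body posted for each batch of parsed chunks. Building the payloads as plain data keeps the step easy to test before pointing it at the local ES container.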

Next Step

  • FYI: the image object in image.py and the requirements file need to be brought in sync once the image builder setup is working.

Testing

  • Ran the workflow locally to test the write.
  • Ran the workflow with indexify-server.
  • Verified that the newly created ES indexes are queryable with a KNN search over the embedding vectors.
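
A KNN check like the last item above can be sketched as follows. The helper name, the `embedding` and `text` field names, and the default values are assumptions for illustration, not taken from this PR. The function only builds the request body for the Elasticsearch `_search` endpoint, so it can be run without a live cluster.

```python
from typing import Sequence


def build_knn_query(
    query_vector: Sequence[float],
    k: int = 5,
    num_candidates: int = 50,
    field: str = "embedding",  # hypothetical field name
) -> dict:
    """Build a _search request body that runs approximate kNN on a dense_vector field."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if num_candidates < k:
        # Elasticsearch rejects requests where num_candidates is smaller than k.
        raise ValueError("num_candidates must be >= k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        # Return only the chunk text, not the stored vectors.
        "_source": ["text"],
    }
```

The resulting dictionary would be posted to the index's `_search` endpoint with a query vector produced by the same embedding model used at write time; a mismatch in vector dimension is rejected by Elasticsearch.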

Contribution Checklist

  • If the python-sdk was changed, please run make fmt in python-sdk/.
  • If the server was changed, please run make fmt in server/.
  • Make sure all PR Checks are passing.

@stangirala stangirala changed the title Pdf parse docling PDF Parse Workflow with Docling Dec 31, 2024
@stangirala stangirala changed the title PDF Parse Workflow with Docling PDF Parse Workflow with Docling and Elastic Search Dec 31, 2024
@stangirala stangirala merged commit 498dc94 into main Jan 2, 2025
@stangirala stangirala deleted the pdf-parse-docling branch January 2, 2025 16:47