This repo contains a Google Colab notebook for using Docling to extract data such as text, images, and tables.
https://github.com/DS4SD/docling
https://ds4sd.github.io/docling/examples/
The Colab notebook shows how to use Docling to extract content from popular document formats (PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown) and export it to HTML, Markdown, and JSON (with embedded and referenced images).
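As a rough illustration of the convert-and-export flow described above, here is a minimal sketch. It is not taken from the notebook: the helper names (output_paths, convert_and_export) and the "out" directory are hypothetical, and it assumes a Docling 2.x install where DocumentConverter and the export_to_* methods are available. Image handling (embedded vs. referenced) is left out.

```python
import json
from pathlib import Path


def output_paths(source: str, out_dir: str = "out") -> dict:
    """Hypothetical helper: one output path per export format."""
    stem = Path(source).stem
    return {ext: Path(out_dir) / f"{stem}.{ext}" for ext in ("md", "html", "json")}


def convert_and_export(source: str, out_dir: str = "out") -> dict:
    """Convert one document with Docling and write Markdown, HTML, and JSON."""
    # Imported lazily so output_paths() works even without Docling installed.
    # Assumption: Docling 2.x API.
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document
    paths = output_paths(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    paths["md"].write_text(doc.export_to_markdown(), encoding="utf-8")
    paths["html"].write_text(doc.export_to_html(), encoding="utf-8")
    paths["json"].write_text(json.dumps(doc.export_to_dict()), encoding="utf-8")
    return paths
```

Calling convert_and_export("report.pdf") would write out/report.md, out/report.html, and out/report.json, assuming report.pdf exists locally.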
It also shows hybrid chunking using transformers, embeddings, and a vector database.
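The chunk, embed, and search steps might look roughly like the sketch below. Several things here are assumptions rather than what the notebook does: the embedding model name, the use of sentence-transformers, and above all the in-memory top_k search, which is only a stand-in for a real vector database. The function names are hypothetical, and the HybridChunker call assumes a recent Docling 2.x release.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, vectors, k=3):
    """Indices of the k stored vectors most similar to query_vec.

    Stand-in for a vector database query; a real store would replace this.
    """
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query_vec, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def chunk_and_embed(source, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Chunk a document with Docling's HybridChunker and embed each chunk."""
    # Lazy imports: these packages are assumptions and may differ from the notebook.
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter
    from sentence_transformers import SentenceTransformer

    doc = DocumentConverter().convert(source).document
    texts = [chunk.text for chunk in HybridChunker().chunk(dl_doc=doc)]
    vectors = SentenceTransformer(model_name).encode(texts).tolist()
    return texts, vectors
```

To answer a query, one would embed the query string with the same model and pass the result to top_k together with the stored chunk vectors.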