Docling_Colab

This repo contains a Google Colab notebook for using Docling to extract data such as text, images, and tables.


Docling

https://github.com/DS4SD/docling

https://ds4sd.github.io/docling/examples/


The Colaboratory notebook shows how to use Docling to extract content from popular document formats (PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown) and export it to HTML, Markdown, and JSON (with embedded and referenced images).
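As a rough illustration of the convert-then-export flow, here is a minimal sketch. It is not taken from the notebook. It assumes Docling is installed (`pip install docling`), and the helper names `export_paths` and `convert_and_export`, the `output` directory, and the file name in the usage note are hypothetical. It covers only the Markdown and JSON exports.

```python
from pathlib import Path


def export_paths(source: str, out_dir: str = "output") -> dict[str, Path]:
    """Map each export format to an output file named after the source.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    stem = Path(source).stem
    base = Path(out_dir)
    return {
        "markdown": base / f"{stem}.md",
        "json": base / f"{stem}.json",
    }


def convert_and_export(source: str, out_dir: str = "output") -> dict[str, Path]:
    """Convert one document with Docling and write Markdown and JSON exports."""
    # Imported lazily so export_paths() can be used without Docling installed.
    import json
    from docling.document_converter import DocumentConverter

    # convert() accepts a local path or a URL and returns a conversion result.
    result = DocumentConverter().convert(source)
    doc = result.document

    paths = export_paths(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths["markdown"].write_text(doc.export_to_markdown(), encoding="utf-8")
    paths["json"].write_text(json.dumps(doc.export_to_dict()), encoding="utf-8")
    return paths
```

Calling `convert_and_export("report.pdf")` on a local file would, under these assumptions, write `output/report.md` and `output/report.json`.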

It also shows hybrid chunking using transformers, embeddings, and a vector database.
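The retrieval idea behind that pipeline can be sketched with the standard library alone. This is a toy stand-in, not the notebook's code: `embed` is a hypothetical bag-of-words function used in place of a transformer embedding model, `TinyVectorStore` is a hypothetical in-memory store used in place of a real vector database, and the sample chunks are made up rather than produced by Docling's chunker.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a transformer model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Made-up chunks; in practice these would come from a document chunker.
chunks = [
    "Docling converts PDF files to Markdown",
    "Tables are exported as structured data",
    "Images can be embedded or referenced",
]
store = TinyVectorStore()
for chunk in chunks:
    store.add(chunk)

top = store.search("how are tables exported")
```

The same three steps (chunk, embed, search by similarity) apply when the toy pieces are swapped for a real chunker, embedding model, and vector database.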