
JBBalling/pdf-extraction


PDF Processing - German Dissertations

Several approaches to extracting text from PDF files were tried. YOLOv5, trained on the DocLayNet dataset for document layout detection, gave the best results.
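The README does not say how the detected layout boxes are turned into text. Below is a minimal sketch of one common approach. It assumes the detector returns labeled boxes using DocLayNet's class names, and that words with coordinates are available from the PDF's text layer in the same coordinate space. `Box`, `Word`, `extract_text` and the `KEEP` set are hypothetical names, not this repository's code.

```python
from dataclasses import dataclass

# DocLayNet's 11 layout classes; a detector trained on it predicts one per box.
DOCLAYNET_CLASSES = {
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
}

# Hypothetical choice: classes treated as running text worth keeping.
# Headers/footers, pictures, tables and formulas are dropped.
KEEP = {"Title", "Section-header", "Text", "List-item", "Caption", "Footnote"}


@dataclass
class Box:
    """A detected layout region (hypothetical structure)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def __post_init__(self):
        if self.label not in DOCLAYNET_CLASSES:
            raise ValueError(f"unknown DocLayNet class: {self.label}")

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass
class Word:
    """A word from the PDF text layer with its bounding box (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def extract_text(boxes: list[Box], words: list[Word]) -> str:
    """Assign each word to the first box containing its center, then join
    the kept boxes in top-to-bottom, left-to-right order."""
    assigned: list[list[Word]] = [[] for _ in boxes]
    for w in words:
        cx, cy = (w.x0 + w.x1) / 2, (w.y0 + w.y1) / 2
        for i, b in enumerate(boxes):
            if b.contains(cx, cy):
                assigned[i].append(w)
                break

    order = sorted(range(len(boxes)), key=lambda i: (boxes[i].y0, boxes[i].x0))
    blocks = []
    for i in order:
        if boxes[i].label not in KEEP or not assigned[i]:
            continue
        # Crude reading order inside a box: line (rounded top y), then x.
        ordered = sorted(assigned[i], key=lambda w: (round(w.y0), w.x0))
        blocks.append(" ".join(w.text for w in ordered))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    boxes = [
        Box("Page-header", 0, 0, 600, 40),
        Box("Title", 50, 60, 550, 100),
        Box("Text", 50, 120, 550, 300),
    ]
    words = [
        Word("Dissertation", 200, 10, 300, 30),  # falls in the page header
        Word("Einleitung", 60, 70, 160, 90),
        Word("Dies", 60, 130, 100, 145),
        Word("ist", 105, 130, 130, 145),
        Word("ein", 60, 150, 90, 165),
        Word("Test.", 95, 150, 140, 165),
    ]
    print(extract_text(boxes, words))
```

Running it prints `Einleitung`, a blank line, then `Dies ist ein Test.`; the page-header word is discarded. Filtering by layout class is the main benefit of a layout detector over raw text extraction, since running headers, footers and figure residue never reach the output.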

Processing 25 (minified) PDFs with 1068 pages in total took 926.54 seconds (about 15.4 minutes), i.e. roughly 0.87 seconds per page.
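The per-page figure follows directly from the reported totals. The snippet below reproduces it; the 200-page projection is a hypothetical example and assumes throughput scales linearly with page count.

```python
# Totals reported above for the 25-PDF test run.
total_seconds = 926.5365602970123
num_pdfs = 25
num_pages = 1068

seconds_per_page = total_seconds / num_pages
pages_per_pdf = num_pages / num_pdfs

print(f"{seconds_per_page:.2f} s/page")            # 0.87 s/page
print(f"{total_seconds / 60:.1f} min in total")     # 15.4 min in total
print(f"{pages_per_pdf:.1f} pages per PDF")         # 42.7 pages per PDF

# Hypothetical: estimated time for a 200-page dissertation,
# assuming the same average throughput.
print(f"{200 * seconds_per_page:.1f} s for 200 pages")  # 173.5 s for 200 pages
```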

pip install -r requirements.txt
python src/main.py
