Tried several approaches to extracting text from PDF files. YOLOv5 trained on the DocLayNet dataset gave the best results.
Processing 25 (minified) PDFs with 1068 pages in total took 926.54 seconds, i.e. about 0.87 seconds per page.
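The per-page figure can be reproduced from the numbers above. This is only a small sketch of the arithmetic; the variable and function names are illustrative and are not taken from src/main.py.

```python
# Timing figures as reported above; names here are illustrative only.
total_seconds = 926.5365602970123
pdf_count = 25
page_count = 1068


def seconds_per_unit(total: float, units: int) -> float:
    """Return the average time per unit, rejecting non-positive counts."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


seconds_per_page = seconds_per_unit(total_seconds, page_count)
seconds_per_pdf = seconds_per_unit(total_seconds, pdf_count)

print(f"{seconds_per_page:.2f} s/page")  # 0.87 s/page
print(f"{seconds_per_pdf:.2f} s/PDF")    # 37.06 s/PDF
```

The per-PDF average (about 37 seconds) depends on how many pages each file has, so the per-page value is likely the more useful one for estimating other runs.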
pip install -r requirements.txt
python src/main.py