[Bug]: PDF Parser misses text after OCR #4640

fengjac · 2025-01-25T02:39:06Z

Windows 11 Pro
Python 3.10.16
pytorch 12.4

I used this file (layout1.pdf) to test pdf_parser.py.

I then found that the word "rr" is missing after OCR.

As you can see, my PDF file contains the word "rr":

After running the self._image_ function, the boxes look like this:

The word "rr" does not appear in any of the boxes after OCR.
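A minimal sketch of how the missing word could be confirmed programmatically rather than by eye. It assumes each box is a dict with a "text" key holding the recognized string; that structure, the helper name `missing_words`, and the sample data are assumptions for illustration, not the real output from layout1.pdf.

```python
def missing_words(boxes, expected_words):
    """Return the expected words that match no whitespace-separated OCR token."""
    # Assumption: each box is a dict whose "text" key holds the recognized string.
    recognized = {
        token
        for box in boxes
        for token in box.get("text", "").split()
    }
    # Exact token match, so "rr" is not counted as found inside a longer word.
    return [word for word in expected_words if word not in recognized]


# Hypothetical boxes for illustration only.
sample_boxes = [
    {"text": "first line"},
    {"text": "second line"},
]
print(missing_words(sample_boxes, ["first", "rr"]))
```

Passing the boxes produced by the OCR step together with the words visible in the PDF would list the ones that were dropped.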


Debug pdf_parser.py in VS Code with the layout1.pdf file (attached under Actual behavior).



fengjac added the bug Something isn't working label Jan 25, 2025
