[Bug]: PDF Parser miss text after OCR #4640

Open
1 task done
fengjac opened this issue Jan 25, 2025 · 0 comments
Labels
bug Something isn't working

fengjac commented Jan 25, 2025

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

3c2c894

RAGFlow image version

3c2c894

Other environment information

Windows 11 Pro
Python 3.10.16
pytorch 12.4

Actual behavior

I used this file:

layout1.pdf

to test pdf_parser.py.

After OCR, the word "rr" is missing from the parsed output.

As you can see, my PDF file contains "rr":

Image

After running the self._image_ function, the boxes look like this:

Image

There is no box for the word "rr" after OCR.

Expected behavior

No response

Steps to reproduce

Debug pdf_parser.py with the layout1.pdf file (attached under "Actual behavior") in VS Code.
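
To make the check repeatable instead of inspecting the boxes by eye in the debugger, something like the sketch below could be run against the boxes once OCR has finished. It assumes the boxes are a list of dicts that each carry a "text" key. That shape, the helper name `missing_tokens`, and the sample data are illustrative assumptions, not taken from pdf_parser.py.

```python
def missing_tokens(boxes, expected):
    """Return the expected tokens that do not occur in any box's text.

    `boxes` is assumed to be a list of dicts with a "text" key
    (hypothetical shape, used here only for illustration).
    The check is a plain substring match, so a short token such as "rr"
    would also count as found if it appears inside a longer word.
    """
    texts = [str(box.get("text", "")) for box in boxes]
    return [tok for tok in expected if not any(tok in t for t in texts)]


# Hypothetical sample boxes standing in for the OCR output.
sample_boxes = [
    {"text": "Hello", "x0": 10, "x1": 50, "top": 5, "bottom": 15},
    {"text": "world", "x0": 60, "x1": 100, "top": 5, "bottom": 15},
]

print(missing_tokens(sample_boxes, ["Hello", "rr"]))
```

In a debug session, the real list of boxes could be passed in place of `sample_boxes`, with the words visible in layout1.pdf as `expected`.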

Additional information

No response

fengjac added the bug label on Jan 25, 2025