INVOICE PDF #1

Open
Anbu8968 opened this issue Mar 15, 2024 · 1 comment
Comments

@Anbu8968

I'm new to using AI, and I'm looking for guidance on how to extract invoice details from PDF files, similar to how it's done for images. Can you provide some suggestions or steps to achieve this?
Thanks in advance.

@whoatharva

PyPDF2 is one way to get text from a PDF without OCR: it reads the embedded text on each page of a text-based (non-scanned) PDF. For an image-based PDF, where there is no text layer to extract, you can use OCR (Optical Character Recognition) instead: pdf2image converts each PDF page to an image, and pytesseract extracts the text from those images. Together, the two approaches cover both text-based and scanned PDFs.
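The two paths described above can be combined into one function, as in the minimal sketch below. It is only an illustration: the function names, the `min_chars` threshold, and the "fall back to OCR when almost no text is found" rule are assumptions, not anything prescribed by the libraries. It also assumes a PyPDF2 version that provides `PdfReader`, and that the Poppler and Tesseract binaries are installed for pdf2image and pytesseract.

```python
from typing import Callable, List


def text_layer_pages(pdf_path: str) -> List[str]:
    """Read the embedded text of each page with PyPDF2 (no OCR involved)."""
    from PyPDF2 import PdfReader  # imported lazily; pip install PyPDF2

    reader = PdfReader(pdf_path)
    # Scanned pages usually have no text layer, so fall back to "" for them.
    return [page.extract_text() or "" for page in reader.pages]


def ocr_pages(pdf_path: str, dpi: int = 300) -> List[str]:
    """Render each page to an image with pdf2image, then OCR it with pytesseract."""
    from pdf2image import convert_from_path  # needs Poppler installed
    import pytesseract  # needs the Tesseract binary installed

    images = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(image) for image in images]


def extract_pages(
    pdf_path: str,
    min_chars: int = 20,  # hypothetical threshold for "this PDF has a text layer"
    read_text: Callable[[str], List[str]] = text_layer_pages,
    read_ocr: Callable[[str], List[str]] = ocr_pages,
) -> List[str]:
    """Try the text layer first; use OCR only if almost no text was found."""
    pages = read_text(pdf_path)
    if sum(len(page.strip()) for page in pages) >= min_chars:
        return pages
    return read_ocr(pdf_path)
```

Calling `extract_pages("invoice.pdf")` (the file name is a placeholder) returns one string per page. Pulling out specific invoice fields such as the invoice number or total is a separate step on top of that text, for example with regular expressions or a language model.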
