How to get images and text in order as in PDF? #705
Comments
Without further investigation I don't think that is possible.
You can use it as below:
Careful here. There are objects of other types as well, so your Instead, you could iterate over all pages (
We can handle those errors, but the order of the objects is very important to me. I'm scraping a PDF that is the answer key of an exam: I want to fetch the questions and answers from the PDF and store them in a DB. Questions and options may be either text or images, so I need to identify each question and its answers from the sequence of objects. I'm attaching a sample document here.
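Once the objects are available in reading order, grouping them into questions could look roughly like the sketch below. It assumes each object has already been reduced to a `(kind, value)` pair, where `kind` is `"text"` or `"image"`, and that a question starts with a number followed by `.` or `)`. Both the pair format and the `QUESTION_START` pattern are assumptions made for illustration; they are not taken from the attached document.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a question line starts with a number, e.g. "12. " or "12) ".
QUESTION_START = re.compile(r"^\s*\d+[.)]\s")

Item = Tuple[str, str]  # (kind, value): kind is "text" or "image"


def group_by_question(sequence: List[Item]) -> List[List[Item]]:
    """Split an ordered sequence of items into one list per question."""
    groups: List[List[Item]] = []
    for kind, value in sequence:
        # A text item matching the pattern opens a new question group.
        if kind == "text" and QUESTION_START.match(value):
            groups.append([])
        # Items that appear before the first question (titles etc.) are skipped.
        if groups:
            groups[-1].append((kind, value))
    return groups


sequence = [
    ("text", "Answer key"),
    ("text", "1. What is shown in the picture?"),
    ("image", "img1.png"),
    ("text", "(a) A cat"),
    ("text", "2. Pick the matching diagram"),
    ("image", "img2.png"),
]
groups = group_by_question(sequence)
```

Each group then holds the question text followed by its options, whether they are text or images, in the order they were found.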
Description:
I want to extract the PDF's content, then save the text to a DB and the images to storage, but the order matters: on page 1, for example, when I get an image, I need to get the text that comes after it.
PDF input
A PDF containing some text followed by images on each page.
Expected output & actual output
I need to extract the images and text in the same order as they appear in the PDF.
How can I do that?
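One possible approach, sketched here under assumptions, is to collect every text block and image together with its page number and bounding box, then sort by position. The order in which objects are stored in a PDF does not have to match the visual order, so sorting by coordinates is often more reliable. The `PageItem` class and its fields are hypothetical and not part of any library; the sketch also assumes PDF-style coordinates (origin at the bottom-left, so a larger `y` is higher on the page) and a single-column layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageItem:
    """Hypothetical container for one extracted object."""
    kind: str                                # "text" or "image"
    page: int                                # zero-based page index
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin bottom-left
    payload: str                             # text content or saved image file name


def reading_order(items: List[PageItem]) -> List[PageItem]:
    """Sort by page, then top to bottom, then left to right."""
    # bbox[3] is the top edge; negate it so higher objects come first.
    return sorted(items, key=lambda it: (it.page, -it.bbox[3], it.bbox[0]))


items = [
    PageItem("text", 0, (72, 600, 300, 620), "(a) Option A"),
    PageItem("image", 0, (72, 640, 300, 700), "q1.png"),
    PageItem("text", 1, (72, 720, 300, 740), "2. Second question"),
    PageItem("text", 0, (72, 720, 300, 740), "1. First question"),
]
ordered = [it.payload for it in reading_order(items)]
```

Multi-column pages or objects that overlap vertically would need a more careful comparison than this simple sort key.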
Code
This is the code I'm using to extract the images, but the text is not available here.