This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

use pdfminer to extract chapters from a book #238

Open
th0o0 opened this issue Oct 27, 2018 · 2 comments

Comments

th0o0 commented Oct 27, 2018

I have a book in PDF format with perhaps three chapters. I want to use pdfminer (other tools are fine, as long as they can do this) to parse the book, extract each chapter, and save the chapters as chapter one.txt, chapter two.txt, and chapter three.txt.

How can I do that?

Thanks.

stud2008 commented Jan 8, 2020

I need this too. Has the problem been solved?

hcharp commented Mar 18, 2022

I need this too...
For now, I have found how to extract the titles, which is easy with the get_outlines() function.
I am now working out how to extract the text contained between two titles. Maybe by looking into the code of get_outlines()?
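
One possible way to get the text between two titles, sketched here under several assumptions: the PDF has an outline (bookmarks), the chapter titles appear verbatim in the page text, and the pdfminer.six fork is installed (its `extract_text` helper is not part of the original pdfminer). The helper names `split_by_titles`, `extract_chapters`, and `save_chapters` are made up for this illustration. The idea is to take the top-level titles from get_outlines(), extract the whole text, and cut it at each title in order.

```python
import os
import re


def split_by_titles(text, titles):
    """Cut `text` at each title (searched in order) and return (title, body) pairs."""
    starts = []
    pos = 0
    for title in titles:
        words = title.split()
        if not words:
            continue  # ignore empty outline entries
        # Allow any whitespace between words, since PDF text often re-wraps titles.
        pattern = r"\s+".join(re.escape(word) for word in words)
        match = re.compile(pattern).search(text, pos)
        if match is None:
            continue  # title not found in the extracted text; skip it
        starts.append((title, match.start(), match.end()))
        pos = match.end()
    chapters = []
    for i, (title, _, body_start) in enumerate(starts):
        # A chapter body runs until the next matched title, or to the end of the text.
        body_end = starts[i + 1][1] if i + 1 < len(starts) else len(text)
        chapters.append((title, text[body_start:body_end].strip()))
    return chapters


def extract_chapters(pdf_path):
    """Assumes pdfminer.six; imports are local so the helpers above work without it."""
    from pdfminer.high_level import extract_text
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser

    with open(pdf_path, "rb") as fp:
        document = PDFDocument(PDFParser(fp))
        # get_outlines() yields (level, title, dest, a, se); keep top-level entries only.
        titles = [title for level, title, dest, a, se in document.get_outlines()
                  if level == 1]
    return split_by_titles(extract_text(pdf_path), titles)


def save_chapters(chapters, out_dir):
    """Write each chapter to chapter_<n>.txt inside `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    for number, (title, body) in enumerate(chapters, start=1):
        path = os.path.join(out_dir, f"chapter_{number}.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.write(title + "\n\n" + body + "\n")
```

Usage would look like `save_chapters(extract_chapters("book.pdf"), "chapters")`, where `book.pdf` is a placeholder path. Two known limitations of this sketch: if the book has a printed table of contents, the titles will match there first, so that part may need to be trimmed from the text beforehand; and get_outlines() raises an exception when the PDF has no outline at all.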

ghmo2789 added a commit to kakann/pdfminer that referenced this issue Sep 19, 2022