use pdfminer to extract chapters from a book #238

th0o0 · 2018-10-27T12:19:03Z

I have a book (PDF format) with maybe 3 chapters. I want to use pdfminer (other tools are OK as long as they can do this) to parse the book so that I can extract every chapter and save them as chapter one.txt, chapter two.txt, chapter three.txt.

How can I do that?

Thanks.
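One possible approach, not taken from this thread, is to extract the whole text with pdfminer.six (the maintained fork of pdfminer) using `pdfminer.high_level.extract_text` and then split it on chapter headings. The sketch below assumes each chapter starts on its own line with something like "Chapter One" or "Chapter 2". The `CHAPTER_HEADING` pattern and the function names are hypothetical and will probably need adjusting for a real book.

```python
import re
from pathlib import Path

# Hypothetical heading pattern: a line consisting only of "Chapter" plus one word
# or number, e.g. "Chapter One" or "Chapter 2". Adjust this for your book.
CHAPTER_HEADING = re.compile(r"^[ \t]*(Chapter[ \t]+\w+)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> dict[str, str]:
    """Split full-book text into {heading: body} using CHAPTER_HEADING."""
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = {}
    for i, match in enumerate(matches):
        start = match.end()
        # Each chapter runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters[match.group(1)] = text[start:end].strip()
    return chapters


def pdf_to_chapter_files(pdf_path: str, out_dir: str = ".") -> None:
    # Imported here so split_chapters() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text = extract_text(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for heading, body in split_chapters(text).items():
        # "Chapter One" -> "chapter one.txt", matching the file names asked for above.
        (out / f"{heading.lower()}.txt").write_text(body, encoding="utf-8")
```

Usage would be something like `pdf_to_chapter_files("book.pdf", "chapters")`. Any text before the first heading (preface, table of contents) is dropped.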


stud2008 · 2020-01-08T08:14:39Z

I need this too. Has the problem been solved?

hcharp · 2022-03-18T15:37:17Z

I need it too...
For now, I have found how to extract the titles; that is easy with the get_outlines() function.
But I am still thinking about how to extract the text contained between two titles... Maybe by investigating the code of that get_outlines() function?
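Building on the `get_outlines()` idea, here is a sketch of one way to get the text between two titles: resolve each top-level outline entry to the page it points to, let each chapter run up to the page where the next one starts, and pass those pages to `extract_text(..., page_numbers=...)`. It assumes pdfminer.six, where `get_outlines()` yields `(level, title, dest, action, se)` tuples; the destination handling follows the same pattern pdfminer.six's `dumppdf.py` uses. All function names are hypothetical. It works at page granularity, so a page where one chapter ends and the next begins is assigned to the later chapter.

```python
import re
from pathlib import Path


def chapter_page_ranges(starts, num_pages):
    """Turn [(title, first_page), ...] into [(title, range_of_pages), ...].

    Pages are zero-based. Each chapter runs up to the page where the next one
    starts, but always keeps at least its own first page.
    """
    starts = sorted(starts, key=lambda item: item[1])
    ranges = []
    for i, (title, first) in enumerate(starts):
        last = starts[i + 1][1] if i + 1 < len(starts) else num_pages
        ranges.append((title, range(first, max(last, first + 1))))
    return ranges


def outline_chapter_starts(pdf_path):
    """Return ([(title, zero_based_page), ...], page_count) for top-level outline entries."""
    # Imported lazily so chapter_page_ranges() works without pdfminer.six installed.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdftypes import resolve1
    from pdfminer.psparser import PSLiteral

    with open(pdf_path, "rb") as fp:
        doc = PDFDocument(PDFParser(fp))
        # Map each page object's id to its zero-based position in the book.
        page_index = {page.pageid: i for i, page in enumerate(PDFPage.create_pages(doc))}

        def resolve_dest(dest):
            # Named destinations have to be looked up in the document catalog.
            if isinstance(dest, PSLiteral):
                dest = dest.name
            if isinstance(dest, (str, bytes)):
                dest = doc.get_dest(dest)
            dest = resolve1(dest)
            if isinstance(dest, dict):
                dest = resolve1(dest.get("D"))
            return dest

        entries = []
        for level, title, dest, action, _se in doc.get_outlines():
            if dest is None and action is not None:
                # Many PDFs link outline entries through a GoTo action instead.
                action = resolve1(action)
                if isinstance(action, dict):
                    dest = action.get("D")
            dest = resolve_dest(dest)
            # An explicit destination is a list whose first item references a page.
            if isinstance(dest, list) and dest and hasattr(dest[0], "objid"):
                entries.append((level, title, page_index.get(dest[0].objid)))

    # Keep only the shallowest outline level, assumed here to be the chapters.
    top = min((level for level, _, _ in entries), default=0)
    starts = [(t, p) for level, t, p in entries if level == top and p is not None]
    return starts, len(page_index)


def pdf_outline_to_chapter_files(pdf_path, out_dir="."):
    from pdfminer.high_level import extract_text

    starts, num_pages = outline_chapter_starts(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for title, pages in chapter_page_ranges(starts, num_pages):
        # page_numbers takes zero-based page indices; ranges here are never empty.
        text = extract_text(pdf_path, page_numbers=pages)
        # Replace characters that are not allowed in file names on common systems.
        safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "untitled"
        (out / f"{safe_title}.txt").write_text(text, encoding="utf-8")
```

The ranges are kept non-empty on purpose: an empty `page_numbers` container is falsy, and pdfminer.six would then extract every page instead of none.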


ghmo2789 added a commit to kakann/pdfminer that referenced this issue Sep 19, 2022

almost done with issue euske#238

bf04335

Co-authored-by: kakann <[email protected]>

kjuli mentioned this issue Sep 20, 2022

Issue 238 AugustBredberg/pdfminer-hamilton#5

Merged

ghmo2789 mentioned this issue Sep 22, 2022

Pdf2txt kakann/pdfminer#4

Merged
