Is there a way to get the document's ToC or Outline? #456

inf3rnus · 2021-06-30T22:57:24Z

Hey all, sweet library!

Is it possible to get the table of contents for a pdf directly via pdfplumber?

I know it's possible via pdfminer.six via:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open a PDF document.
with open('mypdf.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    # For an encrypted PDF, pass password='...' as well.
    document = PDFDocument(parser)
    # Get the outlines of the document. get_outlines() is a generator,
    # so consume it while the file is still open.
    outlines = list(document.get_outlines())

I don't see anything in the API showing this as being possible atm unless I'm mistaken.

I'm hoping to use pdfplumber as an all-in-one solution, and ToCs are very important for what I'm working on. I can use pdfminer.six on the side for the time being.

If it's not possible at the moment - could I make a feature request?

Thanks for all your hard work :)!

Best,
Aaron

jsvine · 2021-07-01T02:24:12Z

Maintainer

Hi Aaron, and thanks for the thoughtful question. In pdfplumber, you still get (undocumented*) access to the underlying pdfminer.six PDFDocument object via the .doc attribute. For instance:

import pdfplumber
with pdfplumber.open("path/to/my.pdf") as pdf:
    print(list(pdf.doc.get_outlines()))

Does that work for you?

If some of the return values are <PDFObjRef:___> objects, you can resolve those via pdfplumber.utils.resolve(...) / pdfplumber.utils.resolve_all(...).

* I think it probably is worth adding more explicit documentation about this.
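
For readers who want a bit more than a raw print, here is a minimal sketch of turning the outline into an indented table of contents. It assumes each item yielded by get_outlines() is a (level, title, dest, action, se) tuple with levels starting at 1, as in pdfminer.six. The helper names format_outline and read_outline are hypothetical and not part of pdfplumber; only pdf.doc, get_outlines(), and pdfplumber.utils.resolve_all come from the answer above.

```python
from typing import Any, Iterable, List, Tuple

# Assumed shape of one outline item: (level, title, dest, action, se).
OutlineEntry = Tuple[int, Any, Any, Any, Any]


def format_outline(entries: Iterable[OutlineEntry]) -> List[str]:
    """Render outline entries as indented lines, two spaces per nesting level."""
    lines = []
    for level, title, _dest, _action, _se in entries:
        indent = "  " * max(level - 1, 0)
        lines.append(f"{indent}{title}")
    return lines


def read_outline(path: str) -> List[OutlineEntry]:
    """Hypothetical helper: read the outline via pdfplumber's .doc attribute."""
    # Imported lazily so format_outline can be used without pdfplumber installed.
    import pdfplumber
    from pdfminer.pdfdocument import PDFNoOutlines
    from pdfplumber.utils import resolve_all

    with pdfplumber.open(path) as pdf:
        try:
            raw = list(pdf.doc.get_outlines())
        except PDFNoOutlines:
            # The document has no outline at all.
            return []
        # Resolve indirect references in the destination and action fields.
        return [
            (level, title, resolve_all(dest), resolve_all(action), se)
            for level, title, dest, action, se in raw
        ]


if __name__ == "__main__":
    # Replace with a real file path.
    for line in format_outline(read_outline("path/to/my.pdf")):
        print(line)
```

Keeping the formatting separate from the file access means the indentation logic can be checked on plain tuples without a PDF at hand.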

inf3rnus Jul 1, 2021
Author

That's freakin' sweet!

Thanks for the info. That covers exactly what I need and more; now I know there's also a utils module!

Best,
Aaron
