-
Hey all, sweet library! Is it possible to get the table of contents for a PDF directly via pdfplumber? I know it's possible with pdfminer.six via:
I don't see anything in the API showing this as being possible atm, unless I'm mistaken. I'm hoping to use pdfplumber as an all-in-one solution, and ToCs are very important for what I'm working on. I can use pdfminer.six on the side for the time being. If it's not possible at the moment, could I make a feature request? Thanks for all your hard work :)! Best,
-
Hi Aaron, and thanks for the thoughtful question. In pdfplumber, you still get (undocumented*) access to the pdfminer.PDFDocument object, via .doc. For instance:

    import pdfplumber

    with pdfplumber.open("path/to/my.pdf") as pdf:
        print(list(pdf.doc.get_outlines()))

Does that work for you?

If some of the return values are <PDFObjRef:___> objects, you can resolve those via pdfplumber.utils.resolve(...) / pdfplumber.utils.resolve_all(...).

* I think it probably is worth adding more explicit documentation about this.
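
For anyone landing here later, here is a rough sketch of how the `.doc.get_outlines()` approach might be wrapped into a small helper. The function names `format_outline` and `read_toc` are made up for illustration and are not part of pdfplumber. It assumes each outline entry is a `(level, title, dest, action, se)` tuple with levels starting at 1, which is what pdfminer.six yields, and that a PDF without bookmarks raises `PDFNoOutlines`.

```python
def format_outline(entries):
    """Turn outline tuples into indented ToC lines.

    `entries` is assumed to be an iterable of
    (level, title, dest, action, se) tuples, as yielded by
    pdfminer.six's PDFDocument.get_outlines().
    """
    lines = []
    for level, title, *_rest in entries:
        # Levels are assumed to start at 1, so top-level entries get no indent.
        indent = "  " * max(level - 1, 0)
        lines.append(f"{indent}{title}")
    return lines


def read_toc(path):
    """Hypothetical helper: return indented ToC lines for the PDF at `path`."""
    # Imported here so format_outline() can be used without pdfplumber installed.
    import pdfplumber
    from pdfminer.pdfdocument import PDFNoOutlines
    from pdfplumber.utils import resolve_all

    with pdfplumber.open(path) as pdf:
        try:
            entries = [
                # Resolve any PDFObjRef values in the destination/action fields.
                (level, title, resolve_all(dest), resolve_all(action), se)
                for level, title, dest, action, se in pdf.doc.get_outlines()
            ]
        except PDFNoOutlines:
            # The document has no bookmarks at all.
            return []
    return format_outline(entries)
```

Calling `print("\n".join(read_toc("path/to/my.pdf")))` should then print one line per bookmark, indented by nesting depth. Mapping each entry to a page number is left out here, since that depends on how the destinations in a given PDF are encoded.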