-
Good morning! I am working on a script to manipulate PDF Portfolios. One issue I'm having relates to the order in which the attachments are displayed (name, date, order, etc.). Is this "sorted by" criterion stored anywhere in the PDF metadata? Ideally, I would extract the "sorted by" criterion and then use it to merge all of the attachments into a single PDF file in the same order. Let me know if I need to provide any additional information!
-
What is a "portfolio" in PDF?
(Standard) metadata is a well-defined term: a Python dictionary with a fixed set of keys.
-
I have made a few changes to PyMuPDF's embedded files handling.
>>> import fitz
>>> doc = fitz.open("portfolio-sample.pdf")
>>> n = doc.embfile_count()
>>> from pprint import pprint
>>> for i in range(n):
...     pprint(doc.embfile_info(i))
...     print("-" * 10)
...
{'checksum': b'c59313c2a309cb9a235fe2809464c39266c3bc',
'collection': 50,
'creationDate': "D:20120928172443-04'00'",
'desc': '',
'filename': 'facility_budget.xlsx',
'length': 9633,
'modDate': "D:20120928172458-04'00'",
'name': '<0>facility_budget.xlsx',
'size': 12019,
'ufilename': 'facility_budget.xlsx'}
----------
{'checksum': b'6e0c1707112538c2aa65cb9a143f5864e280ba21',
'collection': 48,
'creationDate': "D:20120928172410-04'00'",
'desc': '',
'filename': 'globalcorp_projectupdate.pptx',
'length': 772137,
'modDate': "D:20120928172433-04'00'",
'name': '<0>globalcorp_projectupdate.pptx',
'size': 1179462,
'ufilename': 'globalcorp_projectupdate.pptx'}
----------
{'checksum': b'c384c5a1c3aee2809e74',
'collection': 198,
'creationDate': "D:20120928172335-04'00'",
'desc': '',
'filename': 'memo.docx',
'length': 223867,
'modDate': "D:20120928172359-04'00'",
'name': '<0>memo.docx',
'size': 227780,
'ufilename': 'memo.docx'}
----------
{'checksum': b'33416be2809a5249c2b738c2b0e2809e5b2c767414e28098',
'collection': 44,
'creationDate': "D:20120726135901-04'00'",
'desc': '',
'filename': 'new_secure_network.pdf',
'length': 451454,
'modDate': "D:20120726140215-04'00'",
'name': '<0>new_secure_network.pdf',
'size': 535348,
'ufilename': 'new_secure_network.pdf'}
----------
{'checksum': b'68c2a846c3ad0dc3a1c3a661c38fc2a8c5a1efac81c2b2c39d6ecb9a',
'collection': 42,
'creationDate': "D:20120928172712-04'00'",
'desc': '',
'filename': 'rendering.jpg',
'length': 601541,
'modDate': "D:20120928172824-04'00'",
'name': '<0>rendering.jpg',
'size': 614801,
'ufilename': 'rendering.jpg'}
----------
>>> # look at 'collection' of last file:
>>> print(doc.xref_object(42))
<<
/adobe:DisplayName (Office Rendering)
/adobe:Order 5
/adobe:Summary 43 0 R
>>
>>> # let us see what is in 'adobe:Summary':
>>> print(doc.xref_object(43))
<<
/D (Architects vision of one of the social areas of the new location.)
/RichText (<TextFlow whiteSpaceCollapse="preserve" xmlns="http://ns.adobe.com/textLayout/2008"><p><span>Architects vision of one of the social areas of the new location.</span></p></TextFlow>)
>>
>>>
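To tie the demo together, here is a minimal sketch of how the `/adobe:Order` value could be collected for every embedded file and used for sorting. It assumes PyMuPDF 1.18.13 or later (where `embfile_info()` reports the `collection` xref as shown above) and that the order is stored as a plain integer in that object. The helper names `parse_order` and `files_in_portfolio_order` are made up for this illustration and are not part of PyMuPDF.

```python
import re

# Matches an entry like "/adobe:Order 5" in the text returned by doc.xref_object().
ORDER_RE = re.compile(r"/adobe:Order\s+(-?\d+)")


def parse_order(obj_source):
    """Return the integer following /adobe:Order, or None if the key is absent."""
    match = ORDER_RE.search(obj_source)
    return int(match.group(1)) if match else None


def files_in_portfolio_order(doc):
    """Return (order, filename) pairs sorted by /adobe:Order.

    `doc` is assumed to be an open fitz.Document whose embfile_info()
    dictionaries contain the 'collection' xref (0 or missing = no such object).
    """
    entries = []
    for i in range(doc.embfile_count()):
        info = doc.embfile_info(i)
        xref = info.get("collection", 0)
        order = parse_order(doc.xref_object(xref)) if xref else None
        entries.append((order, info["filename"]))
    # Files without an order value are placed last.
    return sorted(entries, key=lambda e: (e[0] is None, e[0] or 0))
```

With the sample file this would be called as `files_in_portfolio_order(fitz.open("portfolio-sample.pdf"))`. Whether `/adobe:Order` is the criterion the viewer actually displays by is a separate question.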
-
There is a pre-release of version 1.18.13 in the PyMuPDF-Wheels repo. Look in either the Linux or the OSX branch for your configuration.
This version has the methods I was using for that little demo.
Sent: Monday, 26 April 2021 09:59
Subject: Re: [pymupdf/PyMuPDF] PDF Portfolio attachments display order (#1030)
Hi Jorj,
Thank you for the in-depth response. That looks very promising! The particular line that caught my eye was
/adobe:Order 5
I believe this would help me out greatly in this project. If I can get the "order" field, then I would just need to identify the sorting method used in the document. Different files are sorted by different criteria, so I need to go through each file, identify its sorting method, and then save the attachments in that same order. I'm hoping to dig through the metadata and eventually find a field that holds the "sort by" criterion.
Below is just a quick example of what I'm talking about with the "sort by" field. Thanks!
[image]<https://user-images.githubusercontent.com/82478006/116094490-31ad0200-a66d-11eb-9d85-8846037714b6.png>
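On where the "sort by" setting might live: the PDF specification defines a collection sort dictionary, reached from the document catalog via `/Collection` and then `/Sort`, with `/S` naming the sort field(s) and `/A` giving the ascending flag(s). The sketch below only parses the text of such a dictionary. Finding the right object in a given file (for example by inspecting `doc.xref_object(doc.pdf_catalog())`) is left out. The function name `parse_sort_dict` is hypothetical, and treating a lone boolean `/A` as applying to the first field only is a simplifying assumption.

```python
import re


def parse_sort_dict(source):
    """Parse the text of a PDF collection sort dictionary.

    `source` is assumed to look like '<< /S /FileName /A false >>', with
    /S and /A optionally given as arrays. Returns a list of
    (field_name, ascending) pairs in sort priority order.
    """
    # /S is either a single name or an array of names.
    s_match = re.search(r"/S\s*(\[[^\]]*\]|/[^\s/\[\]<>()]+)", source)
    if not s_match:
        return []
    fields = re.findall(r"/([^\s/\[\]<>()]+)", s_match.group(1))

    # /A is either a single boolean or an array of booleans.
    a_match = re.search(r"/A\s*(\[[^\]]*\]|true|false)", source)
    flags = re.findall(r"true|false", a_match.group(1)) if a_match else []

    # Assumption: fields without a matching flag are sorted ascending.
    return [
        (name, flags[i] == "true" if i < len(flags) else True)
        for i, name in enumerate(fields)
    ]
```

For instance, `parse_sort_dict("<< /S /FileName /A false >>")` gives `[("FileName", False)]`, i.e. sort by file name, descending.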
Note that the portfolio information for each embedded file is contained in the PDF object whose xref is given by the "collection" key.
If you look at this example file, the following things will be possible in v1.18.13: