edit metadata (author, title, bookmarks, ...) #39

Open
darkdragon-001 opened this issue Jul 4, 2018 · 4 comments

@darkdragon-001

Please add a replacement for pdftk's dump_data and update_info to edit bookmarks.

@darkdragon-001 changed the title from "edit metadata like bookmarks" to "edit metadata (author, title, bookmarks, ...)" on Jul 12, 2018
@hellerbarde
Owner

I'm not entirely sure if the underlying library supports it. But if it does, this looks like a good first issue for someone.

@m040601

m040601 commented Aug 16, 2020

Any news about this? I'm not referring to the bookmarks part of the question.

I'm wondering about the functionality for writing metadata.
Stapler already supports reading metadata tags:

stapler info print.pdf

*** Metadata for print.pdf
    /Title:  my Title
    /Author:  my Name
    /Keywords:  keyword1;keyword2
    /Subject:  my Subject

This is the same thing you can do with:

$ pdfinfo print.pdf

Title:          my Title
Subject:        my Subject
Keywords:       keyword1;keyword2
Author:         my Name

But would writing be possible? For example, as with exiftool:

exiftool -Title="my Title" -Author="my Name" -Subject="my Subject" -Keywords="keyword1;keyword2" mypdffile.pdf

> I'm not entirely sure if the underlying library supports it.

The "underlying library"? Do you mean these two, python-pypdf2 and python-more-itertools?

@darkdragon-001
Author

darkdragon-001 commented Aug 16, 2020

I was asking myself whether this is really something one wants to do in the console, or whether it is more likely that one wants to edit single documents interactively. In the latter case, I recommend https://github.com/pdfarranger/pdfarranger (at least for metadata tags).

With read support already built in, I think write support with a syntax similar to exiftool would be good (stapler info -title="My Title" print.pdf). Reading a single metadata tag for further processing could also be implemented (stapler info -title print.pdf should output only My Title).
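To make the proposed syntax concrete, here is a minimal sketch of how such arguments could be told apart. It is only an illustration under assumptions: `parse_info_args` is a hypothetical helper, not part of stapler, and the mapping from `-title` to `/Title` simply mirrors the tag names that `stapler info` prints. Note that the shell removes the quotes, so `-title="My Title"` arrives as `-title=My Title`.

```python
def parse_info_args(args):
    """Split exiftool-style arguments into write requests, read requests, and files.

    '-title=My Title' is a write request, '-title' alone is a read request,
    and anything that does not start with '-' is treated as an input file.
    This is a hypothetical helper, not stapler's actual argument handling.
    """
    writes, reads, files = {}, [], []
    for arg in args:
        if not arg.startswith("-"):
            files.append(arg)
            continue
        name, sep, value = arg[1:].partition("=")
        # Assumed mapping: 'title' -> '/Title', matching the `stapler info` output.
        key = "/" + name.capitalize()
        if sep:
            writes[key] = value
        else:
            reads.append(key)
    return writes, reads, files


print(parse_info_args(["-title=My Title", "-author", "print.pdf"]))
```

A read request with no write requests could then print only the value of the requested tag, which is the behaviour suggested above for `stapler info -title print.pdf`.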

For bookmarks, it might be possible to download or extract the table of contents as Title>PageNum lines. These could then be used to add PDF bookmarks correctly with scripts.
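As a rough sketch of that idea, the snippet below turns Title>PageNum lines into bookmark records in the style of pdftk's dump_data output. The function name `toc_to_pdftk_bookmarks` is hypothetical, all entries are assumed to be top-level bookmarks, and the record layout (BookmarkBegin, BookmarkTitle, BookmarkLevel, BookmarkPageNumber) is written from memory of pdftk's format, so it should be checked against the output of the installed pdftk version before feeding it to update_info.

```python
def toc_to_pdftk_bookmarks(toc_text, level=1):
    """Convert 'Title>PageNum' lines into pdftk dump_data-style bookmark records.

    Hypothetical helper: every entry gets the same bookmark level.
    """
    records = []
    for line in toc_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last '>' so that titles may themselves contain '>'.
        title, sep, page = line.rpartition(">")
        if not sep:
            raise ValueError(f"expected 'Title>PageNum', got: {line!r}")
        records.append(
            "BookmarkBegin\n"
            f"BookmarkTitle: {title.strip()}\n"
            f"BookmarkLevel: {level}\n"
            f"BookmarkPageNumber: {int(page)}"
        )
    return "\n".join(records)


print(toc_to_pdftk_bookmarks("Introduction>1\nMethods > 5\n"))
```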

@m040601

m040601 commented Aug 16, 2020

> if this is really something one wants to do in the console

Yes, it is, for me at least. Thanks for the tip about pdfarranger, but I already knew about it, as well as about every other GUI PDF app for editing PDF metadata.

Being a CLI app is the only reason for me to use stapler. Originally, I didn't want the Java dependency of pdftk and wanted a light, well-maintained replacement with the same functionality.

I insist on the metadata editing functionality, not on the bookmark management part of the question (which was not asked by me).

I do this because I think it would be very interesting and useful for a tool with the goals of stapler. Perhaps it helps if I make clear why support for editing metadata in a PDF CLI tool is important to me, and what I want it for.

It's mainly about managing large pdf collections.

This need to manage PDF metadata well arose over the last few years, along with the need to manage my collection of PDFs. Again, I already know all the GUI apps for managing a PDF collection. That's not what I want.

With the increasing use of ebook readers (Kobo, Kindle, etc.) and the huge growth of my PDF collection, which has many different types and sources such as real books or simple documents, I am faced with the same problem I already solved for my music collection.

And just like with my music collection, I don't expect the system to be perfect. I don't want to obsess about an impossibly perfect classification and file organization.

I also found that renaming PDF files or trying to neatly organize them in folders would never be a sufficient method of organizing.

But I don't want to be tied to a GUI app, a proprietary tool, or a database.
I want to use my file system and the tools and scripts I am comfortable with for shuffling, copying, and moving what I want, without being tied to a central database.

Just like with music metadata tags (MP3 ID3v2, Apple MP4/M4A/AAC tags), the standards were never perfect or well documented. Read more about "PDF tags" here:
- https://en.wikipedia.org/wiki/PDF#Metadata
- https://exiftool.org/TagNames/PDF.html
- https://www.linuxuprising.com/2019/07/how-to-edit-pdf-metadata-tags-on-linux.html

And just like with MP3/MP4 files, you can never expect to get something consistent from wherever you downloaded your PDFs or music files. Some set "Author" or "Title" or tag XYZ. Some don't set anything. Some use XML-style tags. Some use DC-style tags. Some use PDF version 1.4, some version 1.6, etc.

So it's up to me to do the cleaning and organizing. As said above, I don't want to obsess about this organizing, waste too much time, or use complicated tools.

I don't want much. I am satisfied if I can at least make sure all the PDF files in my collection have those simple, well-supported tags: "Title", "Author", "Subject" and "Keywords". With that part solved, and if you are a command line user, I don't need to explain what more you can achieve with simple shell scripts, pipes, and batch processing:
a simple Unix-style solution for searching and organizing your digital objects.
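As one small example of such a script, the sketch below checks which of the four tags are missing from a file, based on text in the format that `stapler info` prints above. The names `parse_info_output` and `missing_tags` are hypothetical, and the parsing assumes exactly the `/Tag:  value` layout shown in the sample output; it is an illustration, not a tested tool.

```python
REQUIRED_TAGS = ("/Title", "/Author", "/Subject", "/Keywords")


def parse_info_output(text):
    """Parse lines such as '    /Title:  my Title' into a dict of tag -> value."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("/"):
            continue  # skips the '*** Metadata for ...' header and blank lines
        # Split on the first ':' only, so values may contain colons.
        key, _, value = line.partition(":")
        tags[key] = value.strip()
    return tags


def missing_tags(text, required=REQUIRED_TAGS):
    """Return the required tags that are absent or empty in the given output."""
    tags = parse_info_output(text)
    return [tag for tag in required if not tags.get(tag)]


sample = """*** Metadata for print.pdf
    /Title:  my Title
    /Author:  my Name
    /Keywords:  keyword1;keyword2
"""
print(missing_tags(sample))  # ['/Subject']
```

Run over a whole collection, a check like this would give a list of files that still need tagging, which could then be handed to whatever tool does the writing.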
