Manage pdf metadata? #225

Toneti777 · 2013-10-17T11:09:38Z

I'm thinking about reading some of the metadata in a PDF file and converting it to accessible HTML elements.
This metadata could be external links or, why not, embedded videos.
I don't know whether you have considered this or whether it is planned for the future. I think it would be a very good improvement.

Thanks a lot, and good job.

coolwanglu · 2013-10-17T11:40:37Z

In a very early version of pdf2htmlEX, some metadata (e.g. title, author) was retrieved. I wanted to extract the title and use it as the title of the HTML. However, I found that this information was inaccurate in many PDFs generated from LaTeX, due to the intermediate conversions (tex -> dvi -> ps -> pdf). So I dropped that code.

It's not hard to extract this information, but I'm not sure what the proper way of storing it in HTML is. I'm almost sure that I should use the <meta> tag, but I'm not sure about the property names.
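
To make the question concrete, here is a minimal sketch of one possible mapping. It is not pdf2htmlEX code. It assumes the PDF metadata has already been extracted into a plain dict keyed by PDF Info names, and the function name `render_head_metadata` and the choice of mapping `Subject` to `description` are illustrative assumptions. The `author`, `description`, and `keywords` names are standard HTML metadata names.

```python
from html import escape

# Assumed mapping from PDF Info dictionary keys to standard HTML metadata
# names. Treating "Subject" as the page description is a judgment call.
PDF_TO_META = {
    "Author": "author",
    "Subject": "description",
    "Keywords": "keywords",
}


def render_head_metadata(info):
    """Return HTML lines for <head>, given a dict of PDF Info entries.

    Empty or missing entries are skipped, and all values are HTML-escaped.
    """
    lines = []
    title = info.get("Title", "").strip()
    if title:
        lines.append("<title>{}</title>".format(escape(title)))
    for pdf_key, meta_name in PDF_TO_META.items():
        value = info.get(pdf_key, "").strip()
        if value:
            lines.append(
                '<meta name="{}" content="{}">'.format(meta_name, escape(value))
            )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_head_metadata({"Title": "A & B", "Author": "Jane Doe"}))
```

Skipping empty values matters here because, as noted above, the Info entries in LaTeX-generated PDFs are often missing or unreliable.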

mrshu · 2013-12-31T11:32:11Z

@coolwanglu MDN has quite a nice list of property names that would be appropriate in this situation.

If you point me to the code that extracts the metadata, I would like to work on putting it into the HTML.

coolwanglu · 2014-01-02T12:05:02Z

@mrshu It's something like this: https://github.com/coolwanglu/pdf2htmlEX/blob/paperclub/src/PaperClub.h#L146 but you may need to read the poppler code and the PDF spec for the complete set of keys.
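
For reference, the document information dictionary in the PDF specification defines a small set of standard keys, and its two date entries use the `D:YYYYMMDDHHmmSS` format with an optional time zone suffix. The sketch below is not taken from pdf2htmlEX or poppler. It assumes the raw entries are already available as a dict of strings, and the helper names `pdf_date_to_iso` and `standard_entries` are hypothetical.

```python
import re

# Standard keys of the PDF document information dictionary.
STANDARD_INFO_KEYS = (
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
)

# D:YYYYMMDDHHmmSS followed by Z or +HH'mm' / -HH'mm'; all fields after
# the year are optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def pdf_date_to_iso(value):
    """Convert a PDF date string to ISO 8601, or return None if malformed."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        return None
    year, month, day, hour, minute, second, zulu, sign, tz_h, tz_m = m.groups()
    stamp = "{}-{}-{}T{}:{}:{}".format(
        year, month or "01", day or "01",
        hour or "00", minute or "00", second or "00",
    )
    if zulu:
        return stamp + "Z"
    if sign:
        return "{}{}{}:{}".format(stamp, sign, tz_h, tz_m or "00")
    return stamp


def standard_entries(info):
    """Keep only the standard keys, normalizing dates where they parse."""
    result = {}
    for key in STANDARD_INFO_KEYS:
        if key in info:
            value = info[key]
            if key in ("CreationDate", "ModDate"):
                # Fall back to the raw string if the date is malformed.
                value = pdf_date_to_iso(value) or value
            result[key] = value
    return result
```

Real-world PDFs often contain dates that deviate from the specified format, so the fallback to the raw string is deliberate.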

peterwalll · 2016-02-25T04:31:17Z

Hi, Toneti.
Thanks for sharing your problem. Have you found a solution? On the subject of PDF text extraction, I wonder whether extracting text from PDF files is much simpler than the PDF-to-text conversion process. There's something wrong with my PDF viewer, and I'm looking for a method to help with this. Any suggestions would be appreciated. Thanks in advance.

Best regards,
Pan
