This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Manage pdf metadata? #225

Open
Toneti777 opened this issue Oct 17, 2013 · 4 comments

Comments

@Toneti777

I'm thinking about trying to read some of a PDF file's metadata and convert it into accessible HTML "elements".
This metadata could include external links or, why not, embedded videos.
I don't know whether you have thought about this or have it planned for the future, but I think it would be a very good improvement.

Thanks a lot, and good job.

@coolwanglu
Owner

In a very early version of pdf2htmlEX, some metadata (e.g. title, author) was retrieved, because I wanted to extract the title and use it as the title of the HTML page. However, I found that this information was inaccurate in many PDFs generated from LaTeX, due to the intermediate conversions (tex -> dvi -> ps -> pdf). So I dropped that code.

It's not hard to extract that information, but I'm not sure what the proper way of storing it in HTML is. I'm almost sure that I should use the <meta> tag, but I'm not sure about the property names.
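A minimal sketch of the <meta> approach, assuming the Info dictionary entries have already been read (e.g. through poppler) and decoded to UTF-8 strings; PDF text strings may actually be PDFDocEncoding or UTF-16BE, so that decoding step is not shown. The key-to-name mapping and the function names `html_escape` and `info_to_html_head` are hypothetical, not pdf2htmlEX code. It uses `author`, `description` and `keywords`, which are standard HTML metadata names.

```cpp
#include <map>
#include <string>

// Escape characters that are unsafe in HTML text and in
// double-quoted attribute values.
std::string html_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            case '\'': out += "&#39;"; break;
            default: out += c;
        }
    }
    return out;
}

// Build <title> and <meta> tags from PDF Info dictionary entries.
// Assumed mapping (hypothetical): Title -> <title>, Author -> author,
// Subject -> description, Keywords -> keywords. Empty values are skipped.
std::string info_to_html_head(const std::map<std::string, std::string>& info) {
    static const std::map<std::string, std::string> meta_names = {
        {"Author", "author"},
        {"Subject", "description"},
        {"Keywords", "keywords"},
    };
    std::string head;
    auto title = info.find("Title");
    if (title != info.end() && !title->second.empty())
        head += "<title>" + html_escape(title->second) + "</title>\n";
    // std::map iterates in key order: Author, Keywords, Subject.
    for (const auto& [pdf_key, meta_name] : meta_names) {
        auto it = info.find(pdf_key);
        if (it == info.end() || it->second.empty()) continue;
        head += "<meta name=\"" + meta_name + "\" content=\"" +
                html_escape(it->second) + "\">\n";
    }
    return head;
}
```

Mapping Subject to `description` is a judgment call. Creator, Producer and the dates have no standard HTML metadata name, so the sketch leaves them out rather than inventing one. Given the inaccurate LaTeX titles mentioned above, a converter would probably want a switch to disable the `<title>` line.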

@mrshu

mrshu commented Dec 31, 2013

@coolwanglu MDN has quite a nice list of property names that would be appropriate in this situation.

If you point me to the code that extracts the metadata, I would like to work on putting it into the HTML.

@coolwanglu
Owner

@mrshu It's something like this: https://github.com/coolwanglu/pdf2htmlEX/blob/paperclub/src/PaperClub.h#L146 but you may need to read the poppler code and the PDF spec for the complete list of keys.
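One detail from the PDF spec that matters when carrying the CreationDate and ModDate keys over: PDF stores dates as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where only the year is required and `O` is `Z`, `+` or `-`. The following is a minimal sketch of converting such a string to ISO 8601 before writing it into HTML; the function name `pdf_date_to_iso8601` is hypothetical and not part of pdf2htmlEX or poppler.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Convert a PDF date string such as "D:20131017093000+08'00'" to ISO 8601.
// Only the year is required; missing month/day default to 01 and missing
// time fields to 00. Field ranges (e.g. month <= 12) are not validated.
std::optional<std::string> pdf_date_to_iso8601(const std::string& raw) {
    // The "D:" prefix is recommended but optional.
    std::string s = raw.rfind("D:", 0) == 0 ? raw.substr(2) : raw;

    // Leading digits hold YYYY[MM[DD[HH[mm[SS]]]]].
    size_t n = 0;
    while (n < s.size() && std::isdigit(static_cast<unsigned char>(s[n]))) ++n;
    if (n < 4 || n > 14 || n % 2 != 0) return std::nullopt;

    // Pad the missing fields with their defaults.
    const std::string defaults = "00000101000000";
    std::string d = s.substr(0, n) + defaults.substr(n);
    std::string iso = d.substr(0, 4) + "-" + d.substr(4, 2) + "-" + d.substr(6, 2) +
                      "T" + d.substr(8, 2) + ":" + d.substr(10, 2) + ":" + d.substr(12, 2);

    // Optional timezone: Z, or +/- followed by HH'mm' (apostrophes optional).
    std::string tz = s.substr(n);
    if (tz.empty()) return iso;  // no offset given: leave it unspecified
    if (tz[0] == 'Z') return iso + "Z";
    if (tz[0] != '+' && tz[0] != '-') return std::nullopt;

    std::string digits;
    for (size_t i = 1; i < tz.size(); ++i) {
        if (std::isdigit(static_cast<unsigned char>(tz[i]))) digits += tz[i];
        else if (tz[i] != '\'') return std::nullopt;
    }
    if (digits.size() == 2) digits += "00";  // minutes default to 00
    if (digits.size() != 4) return std::nullopt;
    return iso + tz[0] + digits.substr(0, 2) + ":" + digits.substr(2, 2);
}
```

Returning `std::nullopt` for malformed input lets the caller simply omit the tag, which seems safer than emitting a date that was guessed.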

@peterwalll

Hi, Toneti.
Thanks for sharing your problem. Have you found a solution? When it comes to PDF text extraction, I wonder whether extracting text from PDF files is much simpler than converting a PDF to text. Something is wrong with my PDF viewer, and I am looking for a method to help with this process. Any suggestion will be appreciated. Thanks in advance.

Best regards,
Pan


4 participants