Extracting visuals and/or embedded images as images #58

dylans · 2022-03-18T12:59:01Z

Mozilla's PDF.js generates a canvas view, which makes it easier to retain styles and layout. That is not really what a markdown converter should do.

That said, I've been wondering if there's a decent way to either extract embedded images and include them in the markdown as inline encoded images, or optionally use a headless version of the canvas renderer to embed images of the original PDF pages. Either could be included inline as base64 images.

Would this be useful here or is this outside the scope of what this project wants to do?
Is there a better way to achieve what I'm describing?
If there's interest in what I've described, I'm happy to do the bulk of the work to make it happen, but I'd appreciate some guidance so we end up with a PR that meets the project's expectations.
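To make the "inline base64 image" idea concrete, here is a minimal sketch of the output format only: given the raw bytes of an already-extracted image, it builds a markdown image tag whose source is a data URI. It is written in Python purely for illustration and is not this project's code; the function name `image_to_markdown` and its parameters are hypothetical, and getting the image bytes out of the PDF (or out of a rendered canvas) is assumed to have happened elsewhere.

```python
import base64


def image_to_markdown(image_bytes: bytes,
                      mime_type: str = "image/png",
                      alt_text: str = "") -> str:
    """Return a markdown image tag that embeds the image as a base64 data URI.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not part of any existing library.
    """
    # Base64 output is plain ASCII, so it is safe to place in a markdown file.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt_text}](data:{mime_type};base64,{encoded})"


if __name__ == "__main__":
    # The 8-byte PNG signature stands in for real image data here.
    png_signature = b"\x89PNG\r\n\x1a\n"
    print(image_to_markdown(png_signature, alt_text="page 1"))
```

One trade-off worth noting: base64 encoding grows the payload by roughly a third, so embedding full-page renders this way can make the resulting markdown file large.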

LoneRifle · 2022-03-18T23:43:39Z

This would indeed be useful and within the project scope. I don't think there is a better approach, and you are free to work through this!

galleon · 2023-12-25T16:49:48Z

Hi @dylans, any update on this?

rightpossible · 2024-10-02T17:19:42Z

Any updates?
