Extracting visuals and/or embedded images as images #58

Open
dylans opened this issue Mar 18, 2022 · 3 comments

Comments

@dylans

dylans commented Mar 18, 2022

Mozilla's PDF.js renders each page to a canvas, which makes it easier to retain styles and layout. That is not really what a Markdown converter should do.

That said, I've been wondering whether there's a decent way to either extract embedded images and include them as inline encoded images in the Markdown, or have the option to extract the content and use a headless version of the canvas renderer to embed images of the original pages from the PDF. Both could be included inline as base64 images.

  1. Would this be useful here or is this outside the scope of what this project wants to do?
  2. Is there a better way to achieve what I'm describing?
  3. If there's interest in what I've described, I'm happy to do the bulk of the work to make it happen, but I'd appreciate some guidance so we end up with a PR that meets the project's expectations.
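
To make the first option concrete, here is a minimal sketch of the inlining step only: turning already-extracted image bytes into a Markdown image with a base64 data URI. It assumes the bytes are already in an encoded format such as PNG or JPEG; how they are obtained from the PDF is not covered. The names `toBase64` and `imageToMarkdown` are hypothetical and are not part of this project or of PDF.js.

```typescript
// Hypothetical helpers for illustration; not part of this project or PDF.js.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes as standard base64 with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasB1 ? bytes[i + 1] : 0;
    const b2 = hasB2 ? bytes[i + 2] : 0;
    // Split three 8-bit bytes into four 6-bit groups.
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += hasB1 ? B64.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += hasB2 ? B64.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Build a Markdown image whose source is an inline data URI.
// `mimeType` must match the actual encoding of `bytes` (assumed known by the caller).
function imageToMarkdown(bytes: Uint8Array, mimeType: string, altText = ""): string {
  return `![${altText}](data:${mimeType};base64,${toBase64(bytes)})`;
}
```

The same helper would also serve the second option, since a rendered page exported as PNG bytes could be passed through `imageToMarkdown` unchanged. Inline data URIs can make the Markdown output very large, so this probably belongs behind an option.
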
@LoneRifle
Collaborator

This would indeed be useful and within the project scope. I don't think there is a better approach, and you are free to work through this!

@galleon

galleon commented Dec 25, 2023

Hi @dylans, any update on this?

@rightpossible

Any updates?
