How to reduce the file size of the extracted HTML? #1554
DipanshuJuneja asked this question in Q&A
This is a thin wrapper around an original MuPDF function, so there is no way for me to influence the output, sorry.
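Because the HTML is produced by MuPDF itself, any size reduction has to happen after extraction. Below is a minimal sketch of one such post-processing step. It assumes that the output repeats the same inline `style="..."` strings (font family, size, colour) on many elements, that attributes use double quotes, and that the tags carry no existing `class` attribute. Each style string that occurs at least twice is replaced by a short generated class (`s0`, `s1`, ...), and the rules are emitted once in a `<style>` block. The function name `styles_to_classes` is hypothetical and not part of PyMuPDF.

```python
import re
from collections import Counter

# Matches a double-quoted inline style attribute, e.g. style="font-size:12.0pt".
# Assumption: attributes use double quotes and no tag already has a class.
STYLE_ATTR = re.compile(r' style="([^"]*)"')


def styles_to_classes(html, min_count=2):
    """Hypothetical helper: replace inline style strings that repeat at least
    min_count times with short CSS classes defined once in a <style> block."""
    counts = Counter(STYLE_ATTR.findall(html))

    # Assign a short class name (s0, s1, ...) to every repeated style string,
    # in order of first appearance.
    names = {}
    for style, count in counts.items():
        if count >= min_count:
            names[style] = f"s{len(names)}"

    def replace(match):
        style = match.group(1)
        if style in names:
            return f' class="{names[style]}"'
        return match.group(0)  # styles that occur once stay inline

    body = STYLE_ATTR.sub(replace, html)
    rules = "".join(f".{name}{{{style}}}" for style, name in names.items())
    return f"<style>{rules}</style>\n{body}" if rules else body
```

You would call it on the extracted string, e.g. `styles_to_classes(page.get_text("html"))`. Per-line position styles are likely unique and therefore stay inline, so any saving would come mainly from repeated font declarations. If you wrap the page fragments in a full HTML document yourself, move the generated `<style>` block into its `<head>`.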
The quality of the HTML extracted with PyMuPDF is far better than what I was getting from some of the other libraries, such as the PDFBox wrapper for Python. However, one concern I have is the output file size, which is considerably larger (1.5 MB) than with the other option (400 KB). I am using the flag `not fitz.TEXT_PRESERVE_IMAGES` to skip images. Apart from this, how can I further reduce the size of the output HTML file? I'm looking for a minified version of the HTML. I'd like to preserve whitespace if possible, since the PDF also contains a few tables. Thanks.
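A side note on the flag: `not fitz.TEXT_PRESERVE_IMAGES` evaluates to `False` (that is, `0`), so it clears every text-extraction flag, not only the image flag. To drop images while keeping whitespace handling, you could pass the wanted flags explicitly, for example `flags=fitz.TEXT_PRESERVE_LIGATURES | fitz.TEXT_PRESERVE_WHITESPACE`.

For the minification itself, one whitespace-preserving approach is to strip only the line breaks between block-level tags and leave the content of every `<p>` element untouched, so the spacing within lines that the tables may depend on survives. Below is a minimal sketch. It assumes the output is structured as `<div>` pages containing `<p>` lines, and the name `strip_block_whitespace` is hypothetical.

```python
import re

# Whitespace that directly follows a closing </p>, a closing </div>, or an
# opening <div ...> tag and precedes another tag. Text inside <p> elements is
# never matched, so spacing within lines is preserved.
BETWEEN_BLOCKS = re.compile(r"(</p>|</div>|<div\b[^>]*>)\s+(?=<)")


def strip_block_whitespace(html):
    """Hypothetical helper: remove line breaks and indentation that sit
    between block-level tags only, keeping the tag itself."""
    return BETWEEN_BLOCKS.sub(r"\1", html)
```

The saving from this step alone is modest (roughly one byte per extracted line), so it is best combined with other reductions such as consolidating repeated inline styles.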