How to match PyMuPDF output to pdf2image output? #913
Looking for help on this question, which I also asked on Stack Overflow.

I have a PDF to PNG converter built with pdf2image, and it is quite slow. For example, a 7-page PDF document takes 10 seconds to get split into PNG images, even with thread_count set to 4 on a 4-core machine (Standard B4ms Azure VM).

I tried PyMuPDF and it ran much faster (only 800 ms) with default scaling. Then I realized that the PNGs output with default scaling are much smaller than those of pdf2image, so I increased the scaling to match. I had to use a scale factor of 2.7777 for the pixel sizes to match up with pdf2image. This took 3 seconds to run, which is still much faster than pdf2image.

The images output by PyMuPDF look identical to those of pdf2image to my eyes, but they actually differ. Our downstream processing (an object detection model), which uses these PNGs, also produces different results.

Looking at the pdf2image docs, we have just used the default dpi of 200. How does one translate this setting to PyMuPDF to get the exact same output? I tried setResolution with 200, but that didn't help.
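For reference, the 2.7777 factor is what you would expect from the dpi setting: PDF page sizes are measured in points (1/72 inch), and PyMuPDF's default rendering maps one point to one pixel, so 200 dpi corresponds to a zoom of 200 / 72. Below is a minimal sketch of that relationship. The helper names (zoom_for_dpi, expected_pixels, render_pages) are made up for illustration, and the rendering part assumes a recent PyMuPDF where the methods are called get_pixmap() and save() (older releases used camelCase names such as getPixmap()). Pixel dimensions are approximate, since each renderer may round differently, and matching the size does not by itself make the pixels identical, because pdf2image renders through Poppler rather than MuPDF.

```python
PDF_POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def zoom_for_dpi(dpi):
    """Zoom factor that turns the default 72 dpi rendering into `dpi`."""
    return dpi / PDF_POINTS_PER_INCH


def expected_pixels(width_pt, height_pt, dpi):
    """Approximate pixel size of a page given in points, rendered at `dpi`."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


def render_pages(pdf_path, dpi=200):
    """Render every page of `pdf_path` to NN.png at the requested dpi.

    Assumes a recent PyMuPDF (get_pixmap / save naming).
    """
    import fitz  # PyMuPDF, imported here so the helpers above work without it

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # same scale factor in x and y
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)
            pix.save("%02i.png" % page.number)


if __name__ == "__main__":
    print(zoom_for_dpi(200))               # about 2.7778
    print(expected_pixels(612, 792, 200))  # US Letter page in points
```

Comparing expected_pixels() with the size of the PNGs from both libraries is a quick way to confirm that they really were rendered at the same resolution before looking for other differences.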
Replies: 1 comment 6 replies
But the resolution was correctly set to the value, wasn't it? So, what was the difference?
Anyway, PyMuPDF also supports using Pillow for pixmap output. Try this:
pix.pillowWrite("%02i.png" % page.number, dpi=(200, 200))
The parameters of pillowWrite() are passed through to Pillow's Image.save() method unchanged. This should enable you to make the output as equal as desired.
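One thing worth checking when comparing outputs: in Pillow, the dpi argument of Image.save() only records resolution metadata in the PNG file and does not resample the pixels. Here is a small sketch using Pillow alone, with a blank in-memory image standing in for a rendered page (the function name and the 612 x 792 size are arbitrary choices for illustration):

```python
import io

from PIL import Image


def save_and_reload(size, dpi):
    """Save a blank image as PNG with a dpi tag, then read it back."""
    img = Image.new("RGB", size, "white")
    buf = io.BytesIO()
    img.save(buf, format="PNG", dpi=(dpi, dpi))  # dpi is stored as metadata
    buf.seek(0)
    reloaded = Image.open(buf)
    reloaded.load()
    return reloaded


if __name__ == "__main__":
    out = save_and_reload((612, 792), 200)
    print(out.size)             # pixel dimensions are unchanged
    print(out.info.get("dpi"))  # close to (200, 200) after PNG unit conversion
```

So if the pixel dimensions need to match pdf2image's 200 dpi output, the scaling still has to happen when the pixmap is rendered; the dpi tag only affects how other software interprets the physical size of the image.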