-
Hi, I see enormous RAM consumption by the getPixmap() method. Could you please help me figure this out? getPixmap() consumes up to about 4300 MB of RAM for a 1-page document. When I use the identity matrix (not passing any arguments), the consumption is reasonable; however, when I add zoom_x and zoom_y, RAM usage spikes. I don't think it's a memory cleanup issue, as the usage occurs during the execution of getPixmap(). After the execution is done, RAM usage goes down. I traced the execution with a debugger and it led me to the C level, so whatever is happening seems to be there. I'm attaching a test script and the problematic PDF, which can be used to reproduce the issue. My configuration: Python 3.8.2 (v3.8.2:7b3ab5921f, Feb 24 2020, 17:52:18); PyMuPDF 1.18.4: Python bindings for the MuPDF 1.18.0 library. Thanks,
Replies: 9 comments
-
Well ...
This comes out correctly on Windows and Linux; I cannot reproduce the 4300 MB. So, where is the bug?
>>> import fitz
>>> doc=fitz.open("exhausting_pdf.pdf")
>>> page=doc[0]
>>> page.rect
Rect(0.0, 0.0, 3420.652099609375, 1890.31201171875)
>>> mat=fitz.Matrix(2,2)
>>> pix=page.getPixmap(matrix=mat)
>>> pix.size
77608894
>>> pix.size/1024/1024
74.01360893249512
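The reported size can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the pixel dimensions are the zoomed page dimensions rounded up and that the pixmap is RGB without alpha (3 bytes per pixel); the function name `estimate_pixmap_bytes` is hypothetical and not part of PyMuPDF.

```python
import math


def estimate_pixmap_bytes(width_pt, height_pt, zoom, channels=3):
    """Estimate the sample buffer size of a rendered page in bytes.

    Assumptions: pixel dimensions are the zoomed page dimensions rounded
    up, one byte per channel, and no alpha channel (RGB -> 3 channels).
    """
    width_px = math.ceil(width_pt * zoom)
    height_px = math.ceil(height_pt * zoom)
    return width_px * height_px * channels


# Page dimensions taken from page.rect in the session above, zoom factor 2.
size = estimate_pixmap_bytes(3420.652099609375, 1890.31201171875, zoom=2)
print(size, round(size / 1024 / 1024, 2))
```

Under these assumptions the estimate is 77608806 bytes, which is 88 bytes below the `pix.size` shown above. The finished pixmap is therefore around 74 MB, far from the 4300 MB seen while rendering.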
I do not know what you need to do with this pixmap monster.
>>> clip = page.rect / 3
>>> clip
Rect(0.0, 0.0, 1140.2173665364583, 630.10400390625)
>>> pix1 = page.getPixmap(matrix=mat, clip=clip)
>>> pix1.size
8629111
After you are done with this clipped pixmap, add appropriate values to clip to shift to the next part of the page ...
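Shifting the clip across the page amounts to walking a grid of tiles. The sketch below computes such a grid with plain tuples; `tile_rects` is a hypothetical helper, and the 3 x 3 split simply mirrors the `page.rect / 3` clip above.

```python
def tile_rects(x0, y0, x1, y1, cols, rows):
    """Split the rectangle (x0, y0, x1, y1) into cols x rows tiles.

    Returns (x0, y0, x1, y1) tuples in row-major order. Each edge is
    computed from the tile index, so neighbouring tiles share borders.
    """
    width = x1 - x0
    height = y1 - y0
    tiles = []
    for row in range(rows):
        for col in range(cols):
            tiles.append((
                x0 + width * col / cols,
                y0 + height * row / rows,
                x0 + width * (col + 1) / cols,
                y0 + height * (row + 1) / rows,
            ))
    return tiles


# Page rectangle taken from the session above.
tiles = tile_rects(0.0, 0.0, 3420.652099609375, 1890.31201171875, cols=3, rows=3)
for t in tiles:
    print(t)
```

Each tuple can then be wrapped as `fitz.Rect(*t)` and passed as `clip` to `page.getPixmap(matrix=mat, clip=...)`, as in the session above.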
-
Hi @JorjMcKie, thanks for the quick response. I see the 4300 MB of RAM usage when I monitor with htop while getPixmap() is executing. This RAM usage causes the cloud service I use to run out of quota, and getPixmap() is terminated there. In the end I need the page as a whole (one big pixmap), but yes, I tried clipping and the RAM usage was fine. Thanks again for your help.
-
Talking of memory consumption during building the pixmap:
-
Yes, same thing.
-
What do you need to do with the total image?
-
Yes, I want to have the image of the whole page after the processing is done. Thanks a lot!
-
Ah, ok. I'll look into PIL / Pillow. There was something about joining image pieces ...
-
In principle this works like so:
from PIL import Image
img = Image.new("RGB", (width, height))  # the resulting big image
# then for each clip, render a pixmap 'pix' and create a PIL Image from it:
clip_img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
# and paste it into the right region of the final image
# 'region' is a rectangle of the same width / height as clip_img
img.paste(clip_img, region)
# then save the result, e.g. as a JPEG
img.save("xxx.jpg", ...)
I haven't tested the memory requirements of this approach, but would expect that you are safe.
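Putting the pieces together, a fuller version of this outline might look as follows. It is an untested sketch, not a confirmed solution: `paste_origin` and `render_page_tiled` are hypothetical names, and the paste position assumes that a clipped pixmap starts at the zoomed clip corner rounded down. It uses `getPixmap()` as in PyMuPDF 1.18.4 from this thread; newer PyMuPDF releases name this method `get_pixmap()`.

```python
import math


def paste_origin(clip_x0, clip_y0, zoom):
    """Top-left pixel of a tile in the final image (assumes rounding down)."""
    return math.floor(clip_x0 * zoom), math.floor(clip_y0 * zoom)


def render_page_tiled(pdf_path, out_path, zoom=2, cols=3, rows=3):
    """Render page 0 tile by tile and stitch the tiles into one image."""
    # Imported lazily so paste_origin stays usable without these packages.
    import fitz  # PyMuPDF
    from PIL import Image

    doc = fitz.open(pdf_path)
    page = doc[0]
    rect = page.rect
    mat = fitz.Matrix(zoom, zoom)
    # The resulting big image; assumes pixel dimensions round up.
    size = (math.ceil(rect.width * zoom), math.ceil(rect.height * zoom))
    img = Image.new("RGB", size)
    for row in range(rows):
        for col in range(cols):
            clip = fitz.Rect(
                rect.x0 + rect.width * col / cols,
                rect.y0 + rect.height * row / rows,
                rect.x0 + rect.width * (col + 1) / cols,
                rect.y0 + rect.height * (row + 1) / rows,
            )
            # Render only this tile, without an alpha channel.
            pix = page.getPixmap(matrix=mat, clip=clip, alpha=False)
            tile = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            # Position is relative to the page's top-left corner.
            img.paste(tile, paste_origin(clip.x0 - rect.x0, clip.y0 - rect.y0, zoom))
            pix = None  # drop the tile pixmap before rendering the next one
    img.save(out_path)
    doc.close()
```

A call such as `render_page_tiled("exhausting_pdf.pdf", "xxx.jpg")` would write the stitched page. The final image still occupies roughly the size of the full pixmap, so any saving would come from rendering one tile at a time.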
-
Oh, that looks like pretty much what I need. Will try it now.
That's really helpful, thanks.