getPixmap() consumes enormous RAM #774

tigrankh · 2020-12-17T14:51:04Z

tigrankh
Dec 17, 2020

Hi,

I'm seeing enormous RAM consumption by the getPixmap() method. Could you please help me figure this out?

The getPixmap() method consumes up to about 4300 MB of RAM for a 1-page document.

When I use the identity matrix (not passing any arguments), the consumption is reasonable; however, when I add zoom_x and zoom_y, RAM usage spikes.

I don't think it's a memory cleanup issue, as this usage occurs during the execution of getPixmap(). After the execution is done, RAM usage goes down.

I traced the execution with a debugger and it led me to the C level, so whatever is happening seems to be there.

I'm attaching a test script and the problematic PDF, which can be used to reproduce the issue.
ram_usage_testcase.tar.gz

My configuration

3.8.2 (v3.8.2:7b3ab5921f, Feb 24 2020, 17:52:18)
[Clang 6.0 (clang-600.0.57)]
darwin

PyMuPDF 1.18.4: Python bindings for the MuPDF 1.18.0 library.
Version date: 2020-11-19 08:56:23.
Built for Python 3.8 on darwin (64-bit).

Thanks,
Tigran

JorjMcKie · 2020-12-17T15:59:59Z

JorjMcKie
Dec 17, 2020
Maintainer

Well ...

this is a giant page size: about 3421 x 1890 points
then you use the RGB colorspace, which requires 3 bytes per pixel
then you zoom by a factor of 2 in both dimensions (resulting in a pixmap 4 times larger)
and consequently you land at a pixmap size of 3421 * 1890 * 3 * 2 * 2 bytes, i.e. roughly 74 MB!

This comes out correctly on Windows and Linux; I cannot reproduce 4300 MB.

So, where is the bug?

>>> import fitz
>>> doc=fitz.open("exhausting_pdf.pdf")
>>> page=doc[0]
>>> page.rect
Rect(0.0, 0.0, 3420.652099609375, 1890.31201171875)
>>> mat=fitz.Matrix(2,2)
>>> pix=page.getPixmap(matrix=mat)
>>> pix.size
77608894
>>> pix.size/1024/1024
74.01360893249512
>>>
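
The arithmetic above can be checked without opening the file. The following is only a sketch: `estimate_pixmap_bytes` is a hypothetical helper, not part of PyMuPDF, and it assumes the pixel grid is the zoomed page size rounded up to whole pixels. The value it returns is 88 bytes smaller than the `pix.size` shown in the session, which would be consistent with `pix.size` also counting a small fixed overhead, though that is a guess.

```python
import math


def estimate_pixmap_bytes(width_pt, height_pt, zoom_x=1.0, zoom_y=1.0, components=3):
    """Rough size of the pixel buffer for a page rendered at the given zoom."""
    # Assumption: the pixel grid is the zoomed page size rounded up to whole pixels.
    width_px = math.ceil(width_pt * zoom_x)
    height_px = math.ceil(height_pt * zoom_y)
    return width_px * height_px * components


# Page rectangle and zoom factors taken from the session above.
size = estimate_pixmap_bytes(3420.652099609375, 1890.31201171875, 2, 2)
print(size, round(size / 1024 / 1024, 2))
```

With an alpha channel, `components` would be 4 instead of 3, so the buffer would grow by a third.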

I do not know what you need to do with this pixmap monster.
But I do recommend taking pixmaps of only parts of the page at a time: use the method's clip parameter for this.
E.g. divide the page into 3 x 3 equal parts:

>>> clip = page.rect / 3
>>> clip
Rect(0.0, 0.0, 1140.2173665364583, 630.10400390625)
>>> pix1 = page.getPixmap(matrix=mat, clip=clip)
>>> pix1.size
8629111
>>>

When you are done with this clipped pixmap, add appropriate offsets to clip to shift it to the next part of the page ...
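
One way to generate all nine clip rectangles up front is sketched below. `grid_clips` is a hypothetical helper that works on plain coordinate tuples, so it runs without PyMuPDF; the page rectangle is the one reported in the session above.

```python
def grid_clips(x0, y0, x1, y1, n=3):
    """Yield n x n equal sub-rectangles of (x0, y0, x1, y1), row by row."""
    w = (x1 - x0) / n
    h = (y1 - y0) / n
    for row in range(n):
        for col in range(n):
            yield (x0 + col * w, y0 + row * h,
                   x0 + (col + 1) * w, y0 + (row + 1) * h)


# Page rectangle reported in the session above.
clips = list(grid_clips(0.0, 0.0, 3420.652099609375, 1890.31201171875))
```

Each tuple `c` could then be turned into a clip with `fitz.Rect(*c)` and passed as `clip=` to `getPixmap()`.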

tigrankh · 2020-12-17T16:12:54Z

tigrankh
Dec 17, 2020
Author

Hi @JorjMcKie ,

Thanks for the quick response.
The size of the pixmap is not very big for me either and the output is correct too.

I see the 4300 MB of RAM usage when I monitor the process with htop while getPixmap() is executing.
Could you check RAM usage at your end too?

This RAM usage causes the cloud service I use to run out of quota, and getPixmap() is terminated in the cloud.

In the end I need the page as a whole (one big pixmap), but yes, I tried clipping and the RAM usage was fine.
The only thing is that I'm not sure whether there is a way to combine all the partial pixmaps into one big one.

Thanks again for your help.
Tigran

JorjMcKie · 2020-12-17T16:16:35Z

JorjMcKie
Dec 17, 2020
Maintainer

Talking of memory consumption while building the pixmap:
This is something we must concede to the process internals.
If you specify this zoom factor, every single page coordinate (x, y) is split up into 2 x 2 = 4 separate pixels in the pixmap. Obviously, a number of calculations have to take place for this. Some smoothing has to take place as well, to keep color changes between neighbouring pixels under control.
You should be aware that working at or close to the limits of what your hardware / software configuration can muster may lead to extreme behaviour in terms of time and / or memory consumption ...
So there is no alternative to doing your best to keep the load within reasonable limits. One way is the suggested subdivision into clips.

JorjMcKie · 2020-12-17T16:22:57Z

JorjMcKie
Dec 17, 2020
Maintainer

"Could you check RAM usage at your end too?"

Yes, same thing.
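
For anyone who wants to take the measurement from inside Python rather than with htop, here is a minimal sketch using the standard `resource` module (Unix only). `peak_rss_mb` is a hypothetical helper, and the `bytearray` allocation is only a stand-in for the call being measured, such as `page.getPixmap(matrix=mat)`.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
buf = bytearray(50 * 1024 * 1024)  # stand-in for the call being measured
after = peak_rss_mb()
print(f"peak RSS grew by about {after - before:.0f} MB")
```

Because this reads the process-wide peak from the operating system, it also covers memory allocated at the C level, which Python-only tools such as `tracemalloc` would not see.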

JorjMcKie · 2020-12-17T16:25:54Z

JorjMcKie
Dec 17, 2020
Maintainer

What do you need to do with the total image?
You could produce 3 x 3 = 9 clipped pixmaps and work your way through them, for example.
If you need to create a fine-grained image of the whole page, I would use PIL to combine the parts, etc.

tigrankh · 2020-12-17T16:30:03Z

tigrankh
Dec 17, 2020
Author

Yes, I want to have the image of the whole page after the processing is done.
The part I don't have much info about is how to combine the separate pixmaps of the page, but it looks like you gave me pointers on where to look to achieve that.

Thanks a lot!

JorjMcKie · 2020-12-17T16:32:05Z

JorjMcKie
Dec 17, 2020
Maintainer

Ah, ok. I'll look into PIL / Pillow. There was something about joining image pieces ...

JorjMcKie · 2020-12-17T17:01:08Z

JorjMcKie
Dec 17, 2020
Maintainer

In principle this works like so:

from PIL import Image
img = Image.new("RGB", (width, height))  # the resulting big image, sized in pixels
# then for each clipped pixmap 'pix', create a PIL Image:
clip_img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
# and paste it into the right region of the final image;
# 'region' is the top-left (x, y) pixel position, or a box of the same width / height as clip_img
img.paste(clip_img, region)
# then save the result, e.g. as a JPEG
img.save("xxx.jpg")  # plus any save options you need

I haven't tested the memory requirements of this approach, but would expect that you are safe.
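
Put together, the outline above might look like the following sketch. It has not been run against the attached PDF, and its memory behaviour has not been measured. `band_clips` and `render_in_bands` are hypothetical helpers; the sketch uses full-width horizontal strips instead of a 3 x 3 grid purely for simplicity, and it assumes that `pix.x` / `pix.y` give the tile's top-left corner in zoomed pixel coordinates.

```python
import math


def band_clips(page_w, page_h, n_bands):
    """Split a page of page_w x page_h points into n_bands full-width strips."""
    step = page_h / n_bands
    clips = []
    for i in range(n_bands):
        # Pin the last strip to the page edge to avoid float drift.
        y1 = page_h if i == n_bands - 1 else (i + 1) * step
        clips.append((0.0, i * step, page_w, y1))
    return clips


def render_in_bands(pdf_path, out_path, zoom=2, n_bands=9):
    """Render page 0 strip by strip and paste the strips into one PIL image."""
    # Imported here so band_clips() stays usable without these packages.
    import fitz  # PyMuPDF
    from PIL import Image

    doc = fitz.open(pdf_path)
    page = doc[0]
    mat = fitz.Matrix(zoom, zoom)
    size = (math.ceil(page.rect.width * zoom), math.ceil(page.rect.height * zoom))
    img = Image.new("RGB", size)  # the resulting big image
    for x0, y0, x1, y1 in band_clips(page.rect.width, page.rect.height, n_bands):
        pix = page.getPixmap(matrix=mat, clip=fitz.Rect(x0, y0, x1, y1))
        tile = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        # Assumption: pix.x / pix.y hold the tile's top-left corner in zoomed pixels.
        img.paste(tile, (pix.x, pix.y))
        pix = None  # drop the tile before rendering the next one
    img.save(out_path)
```

A call such as `render_in_bands("exhausting_pdf.pdf", "xxx.jpg")` would then write the stitched page. Note that the final PIL image still holds the full pixel buffer of the page (about 74 MB here), so tiling can only help with the extra memory used while rendering.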

tigrankh · 2020-12-17T17:15:44Z

tigrankh
Dec 17, 2020
Author

Oh, that looks like pretty much what I need. Will try it now.
That's really helpful, thanks.
