magick-based function hangs when used in mclapply #407

Open
zeileis opened this issue Dec 11, 2024 · 2 comments

Comments


zeileis commented Dec 11, 2024

Summary

I have a magick-based function that essentially converts PDF to PNG files. I'm using that function via parallel::mclapply which, in principle, works fine. However, if I call it first outside of mclapply then the subsequent application in mclapply hangs and does not proceed.

Minimal reproducible example

First, I generate four PDF files for illustration:

fil <- paste0(LETTERS[1:4], ".pdf")
for(i in fil) { pdf(i); plot(1:10); dev.off() }

The magick-based function to_png can be used to convert a PDF file to a PNG file:

to_png <- function(x) {
  magick::image_read(x) |>
    magick::image_convert("png") |>
    magick::image_write(path = sub("\\.pdf$", ".png", x), format = "png")  # replace only the file extension
}

Using that function sequentially works fine, as expected:

for(i in fil) to_png(i)

However, if I then do the analogous operation on four cores, the to_png() function hangs and does not proceed:

library("parallel")
mclapply(fil, to_png, mc.cores = 4)

Curiously, several slight adaptations work:

  • In a fresh session, when I run mclapply() first (before the sequential for() version), it works.
  • With mc.cores = 1 it works.

Is there anything I can - or should - do to avoid this problem?

This is on Debian GNU/Linux with R 4.4.1 and magick 2.8.5.

jeroen (Member) commented Dec 12, 2024

ImageMagick (like most C libraries) is not intended to be forked while in use, which is what mcparallel does; this will probably corrupt the state of the main process. You can use ImageMagick's built-in threading:

magick:::magick_threads(4)

That might also speed things up a bit.

Other than that, I can only recommend avoiding the use of ImageMagick in both the parent process and its children at the same time.
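
One way to keep the parent and the workers apart, offered here only as a sketch and not as something proposed in this thread, is to use a PSOCK cluster from the parallel package in place of mclapply. PSOCK workers are freshly started R processes, not forks of the current session, so they do not inherit any ImageMagick state from the parent. The helper name psock_lapply below is hypothetical, and whether this fully avoids the hang in the reported setup has not been tested here.

```r
library(parallel)

# Hypothetical helper (not part of magick or parallel): apply FUN to each
# element of X on PSOCK workers, which are fresh R processes rather than
# forks of the current session.
psock_lapply <- function(X, FUN, workers = 2L) {
  cl <- makeCluster(workers, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  parLapply(cl, X, FUN)
}

# Usage with the to_png() function and the `fil` vector from the original
# post; absolute paths avoid relying on the workers' working directory.
# psock_lapply(normalizePath(fil), to_png, workers = 4L)
```

Starting PSOCK workers costs more than forking, so for many very small tasks it may pay off to create the cluster once and reuse it across calls.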

zeileis (Author) commented Dec 12, 2024

OK, thanks!

So it's ok to load the library within the parallelization but not before? And what would be the clean way to unload it, if necessary, prior to the parallelization?

Thanks for the hint regarding the built-in threading. But if I understand correctly, then this does not help with many small "embarrassingly parallel" tasks, right? But it might help if I merge my many PDF files before converting them to PNGs.
