WISH: Less aggressive parallelization by default (please don't use *all* CPU cores) #333
I do agree, but in practice it's even more complicated - it won't help much
because users also need to be aware of threaded BLAS and code from
threaded solvers (from the rsparse package, for example). I feel this can only
be solved if a series of packages is designed carefully by a single
responsible author.
Nevertheless, I will consider changing the default behaviour to single-threaded.
…On Sun, 8 May 2022, 11:35 Henrik Bengtsson, ***@***.***> wrote:
... Also, *text2vec* might be running deep down as a dependency that
other package maintainers might not be aware of, so this behavior might be
inherited also by other packages without them knowing.
I don't have time to narrow it down to 100%, but I suspect this happens to
*oolong* <https://cran.r-project.org/web/packages/oolong/index.html>,
when running R CMD check on it. It's a package that doesn't import any
parallel frameworks itself, but it spins off 100+ parallel workers when
being checked, including when checking its vignette.
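The threaded-BLAS concern raised above can be illustrated with a short sketch. This is not text2vec code; it is a hypothetical example of how user code can cap BLAS and OpenMP threads, using the RhpcBLASctl package:

```r
# Hypothetical illustration (not from text2vec or rsparse): capping
# implicit multi-threading from user code via the RhpcBLASctl package.
library(RhpcBLASctl)

blas_set_num_threads(1)  # limit the BLAS library (e.g. OpenBLAS, MKL) to one thread
omp_set_num_threads(1)   # limit OpenMP-based code, such as threaded solvers
```

The point in the comment stands: this only works if the user knows which of their (transitive) dependencies are multi-threaded, which is exactly what most users are unaware of.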
Thank you. Yes, it's a long journey, and more so since it's hard to convince R Core to provide a built-in mechanism to control and protect against this. I'm trying to build up such a mechanism with parallelly, and for those who choose to parallelize via the future ecosystem, there's also a built-in, automatic protection against recursive parallelism. There I'm hoping to tackle multi-threaded processing too, which, as you mention, also comes into play. I'm not sure how that is best done, but it's clearly a growing potential problem as well. Multi-threading also has the problem that it's not stable under forked parallelization, but R doesn't allow us to protect against that either. I try to raise awareness wherever I can, especially since this will be a growing problem as more and more tools support parallel processing. Luckily, from empirical admin observations on large academic HPC clusters, it looks like most software runs sequentially by default. I appreciate your considerations.
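The built-in protection against recursive parallelism mentioned above can be sketched as follows. This is an illustrative example, assuming the future and future.apply packages; the exact reporting may differ by version:

```r
# Sketch: the future ecosystem's guard against recursive parallelism.
library(future)
library(future.apply)

# Declare a two-level topology: parallel workers that would themselves
# like to parallelize.
plan(list(multisession, multisession))

# The outer level runs in parallel, but inside each worker the nested
# "multisession" layer is automatically limited, so workers do not each
# spawn another full set of processes and oversubscribe the machine.
inner_workers <- future_lapply(1:2, function(i) nbrOfWorkers())
```

This is the kind of protection that only works when packages opt into a common framework, which connects back to the earlier point that a single package changing its default is necessary but not sufficient.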
Hi, I noticed text2vec runs on all CPU cores by default on Unix. This is from:
text2vec/R/zzz.R (lines 6 to 9 in 9ddf836)
text2vec/R/mc_queue.R (lines 1 to 4 in 9ddf836)
Defaulting to all cores causes major problems on machines used by multiple users, but also when there are software tools running at the same time. I spotted this on a 128 CPU core machine. Imagine running another 10-20 processes like that at the same time on this machine - it'll quickly come to a halt, which is a real problem.
Although the behavior can be changed by setting an R option, many users are not aware of the problem ... until the sysadmins yell at them. Also, text2vec might be running deep down as a dependency that other package maintainers might not be aware of, so this behavior might be inherited also by other packages without them knowing.
Could you please consider switching the default to something more conservative? Personally, I'm in the camp that everything should run sequentially (single-core) unless the user configures it otherwise. CRAN has a limit of two CPU cores.
(Disclaimer: I'm the author) If you don't want to do this, could you please consider changing from:
to
because the latter gives sysadmins a chance to limit it on their end, and it also respects CGroups settings, job scheduler allocations, etc. Please see https://parallelly.futureverse.org/#availablecores-vs-paralleldetectcores for more details.
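The original code snippets were lost in the page extraction, but based on the linked parallelly documentation, the suggested change is presumably along these lines (a sketch; the actual text2vec code may differ):

```r
# Assumed current pattern: returns every core on the machine,
# ignoring cgroups, job-scheduler allocations, and sysadmin limits.
n_workers <- parallel::detectCores()

# Suggested pattern: honours R options, environment variables,
# cgroups/CGroups settings, and HPC scheduler allocations,
# falling back to a sensible value otherwise.
n_workers <- parallelly::availableCores()
```

On a shared 128-core machine where the scheduler has allocated, say, 4 cores to the job, `availableCores()` would return 4 while `detectCores()` would still return 128.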
Thank you