Parallelization problems on Ubuntu #106

Open
c-mertes opened this issue Dec 2, 2019 · 5 comments

@c-mertes

c-mertes commented Dec 2, 2019

I'm not sure where the problem lies; maybe you can help find the root of it.

When using OUTRIDER on a CentOS 7.7 machine, MulticoreParam works perfectly. But on Ubuntu it gets stuck after the first bplapply call and does not return anything. With SerialParam it runs through normally on both CentOS and Ubuntu. It does not matter whether we use Multicore or Snow. Could the multithreaded BLAS and LAPACK versions from Ubuntu cause the problem?

For more details, have a look here: gagneurlab/OUTRIDER#18

This is how to reproduce the problem on my laptop:

if (!requireNamespace("OUTRIDER", quietly=TRUE))
    BiocManager::install("OUTRIDER")

library(OUTRIDER)
download.file("https://github.com/gagneurlab/OUTRIDER/files/3898551/all_fib_cts.gz", "cts.gz")
ods <- OutriderDataSet(countData=read.table("cts.gz"))
ods <-  filterExpression(ods, minCounts=TRUE)

register(MulticoreParam(2, 20, progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)

Thanks for any help.

@mtmorgan
Collaborator

I'm not able to reproduce this. To troubleshoot I'd aim for a simpler example, e.g.,

bplapply(1:5, identity, BPPARAM = MulticoreParam())

with a likely culprit being blocked ports (BiocParallel spawns workers that communicate with the master through sockets; see manager.port on ?MulticoreParam).

@lshep

lshep commented Jan 22, 2020

@c-mertes - If we cannot reproduce this and you cannot provide more details, we will close the issue. Are you still encountering this, and is it possible to provide a simpler example as requested?

@HenrikBengtsson
Contributor

HenrikBengtsson commented Jan 22, 2020

I can reproduce this on "stock" R 3.6.2 on Ubuntu 18.04 with both MulticoreParam() and SnowParam():

> ods <- OUTRIDER(ods, verbose=TRUE)
Wed Jan 22 06:55:34 2020: SizeFactor estimation ...
Wed Jan 22 06:55:35 2020: Controlling for confounders ...
Using estimated q with: 45
Wed Jan 22 06:55:35 2020: Using the autoencoder implementation for controlling.
  |                                                                      |   0%

If you look at top/htop, you'll see that both of the forked child processes are indeed running, but each of them is running at 100% on every one of your cores (in my case I've got 8 cores, so at 800%). ELI5: The forked multicore workers are running wild trying to get time slots on the CPU, which just can't keep up, and you end up clogging the OS as it tries to switch between way too many threads. I'm pretty sure this is due to multi-threading, which typically happens when some native code uses OpenMP; this is becoming more and more common in R as it gets easier for developers to implement via the Rcpp ecosystem.

Sure enough, it runs if we force single-threaded OpenMP(*):

RhpcBLASctl::omp_set_num_threads(1L)
register(MulticoreParam(workers=2L, tasks=20L, progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)
  |=================================================                     |  70%

It might also work with, say, RhpcBLASctl::omp_set_num_threads(2L).

(*) You might have to restart R first.

The above approach to force single-threaded OpenMP will only work with MulticoreParam. For SnowParam, RhpcBLASctl::omp_set_num_threads(1L) has to be called within every worker. I'm not sure how to do that in BiocParallel.
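One possible way to get the call into every worker, sketched here under assumptions rather than as BiocParallel's recommended method, is to wrap the task function so that it sets the thread limit itself before doing any work. The helper name `with_omp_threads` and the stand-in function `slow_square` are hypothetical; the only package call relied on is `RhpcBLASctl::omp_set_num_threads()`, which is skipped when RhpcBLASctl is not installed.

```r
# Hypothetical helper: returns a version of FUN that limits OpenMP to
# `threads` threads in whichever process (worker) ends up evaluating it.
with_omp_threads <- function(FUN, threads = 1L) {
    force(FUN)
    force(threads)
    function(...) {
        # Evaluated on the worker, so it would also apply to SnowParam workers
        if (requireNamespace("RhpcBLASctl", quietly = TRUE))
            RhpcBLASctl::omp_set_num_threads(threads)
        FUN(...)
    }
}

slow_square <- function(x) x^2   # stand-in for a function that calls OpenMP code

# With BiocParallel this would be used as, e.g.:
#   bplapply(1:4, with_omp_threads(slow_square), BPPARAM = SnowParam(workers = 2L))
res <- lapply(1:4, with_omp_threads(slow_square))
```

Setting the OMP_NUM_THREADS environment variable before the workers are launched may be another option, since freshly started processes normally read it at startup, but I have not verified that here.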

Either way, this is a problem that we will see popping up in more and more code. It will appear "randomly" as more and more packages start parallelizing. The main problem is that developers assume they have full access to all cores on the machine, which often stems from using parallel::detectCores() [<<== BAD] or similar (here it's the OpenMP equivalent) to decide on the number of workers or threads. It is effectively left to the end user to troubleshoot and deal with this. What makes it worse is that it's very hard for the user to disable this overuse of the CPU (recently I discovered that RhpcBLASctl::omp_set_num_threads(1L) might not work on all platforms/R builds).
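As a rough illustration of the alternative, here is a minimal sketch of choosing a worker count that respects an explicit setting and otherwise stays small. The helper `pick_workers` is hypothetical; `getOption("mc.cores")` is the option that parallel::mclapply consults, and parallel::detectCores() is used only as an upper bound, never as the default.

```r
# Hypothetical helper: decide how many workers to start without
# assuming that every core on the machine is ours to use.
pick_workers <- function(default = 2L) {
    # 1. Respect an explicit user/admin setting if there is one
    n <- getOption("mc.cores", default = NA_integer_)
    if (is.na(n)) {
        # 2. Otherwise use a small default, capped by what the machine has
        detected <- parallel::detectCores()
        if (is.na(detected)) detected <- 1L
        n <- min(default, detected)
    }
    # Always return at least one worker
    max(1L, as.integer(n))
}

workers <- pick_workers()
```

The point of the design is that the conservative value is the default and the user has to opt in to more, rather than the other way round.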

As a start, I think OUTRIDER needs to document this and provide mechanisms/options/arguments for running in single-threaded mode.

There's probably also room for BiocParallel to do something here, e.g. documentation, collecting problematic examples, educating developers, not using parallel::detectCores(), etc.

In the bigger picture, I think Bioconductor and CRAN need to work together to detect cases of this through their R CMD checks and report back to developers. Without protection against this, the problem will become more common rather soon. I also think there should be some built-in protection against this in base R, or at least user and developer options for disabling multi-processing/multi-forking/multi-threading.

Reproducible example

One time setup:

if (!requireNamespace("OUTRIDER", quietly=TRUE))
    BiocManager::install("OUTRIDER")
if (!utils::file_test("-f", "cts.gz"))
    download.file("https://github.com/gagneurlab/OUTRIDER/files/3898551/all_fib_cts.gz", "cts.gz", mode = "wb")
library(OUTRIDER)
counts <- read.table("cts.gz")
ods <- OutriderDataSet(countData=counts)
ods <- filterExpression(ods, minCounts=TRUE)

register(MulticoreParam(workers=2L, tasks=20L, progressbar=TRUE))
#register(SnowParam(workers=2L, tasks=20L, type="SOCK", progressbar=TRUE))
#register(SerialParam(progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)

Session info

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] OUTRIDER_1.4.0              data.table_1.12.8          
 [3] SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
 [5] matrixStats_0.55.0-9000     GenomicFeatures_1.38.0     
 [7] AnnotationDbi_1.48.0        Biobase_2.46.0             
 [9] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[11] IRanges_2.20.2              S4Vectors_0.24.3           
[13] BiocGenerics_0.32.0         BiocParallel_1.20.1        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1         htmlTable_1.13.3         XVector_0.26.0          
  [4] base64enc_0.1-3          rstudioapi_0.10          bit64_0.9-7             
  [7] codetools_0.2-16         splines_3.6.2            PRROC_1.3.1             
 [10] geneplotter_1.64.0       knitr_1.27               zeallot_0.1.0           
 [13] Formula_1.2-3            jsonlite_1.6             Rsamtools_2.2.1         
 [16] annotate_1.64.0          cluster_2.1.0            dbplyr_1.4.2            
 [19] png_0.1-7                pheatmap_1.0.12          compiler_3.6.2          
 [22] httr_1.4.1               backports_1.1.5          assertthat_0.2.1        
 [25] Matrix_1.2-18            lazyeval_0.2.2           acepack_1.4.1           
 [28] htmltools_0.4.0          prettyunits_1.1.0        tools_3.6.2             
 [31] gtable_0.3.0             glue_1.3.1               GenomeInfoDbData_1.2.2  
 [34] dplyr_0.8.3              rappdirs_0.3.1           Rcpp_1.0.3              
 [37] vctrs_0.2.1              Biostrings_2.54.0        gdata_2.18.0            
 [40] rtracklayer_1.46.0       iterators_1.0.12         xfun_0.12               
 [43] stringr_1.4.0            lifecycle_0.1.0          gtools_3.8.1            
 [46] XML_3.99-0.3             dendextend_1.13.2        MASS_7.3-51.5           
 [49] zlibbioc_1.32.0          scales_1.1.0             TSP_1.1-7               
 [52] pcaMethods_1.78.0        hms_0.5.3                RColorBrewer_1.1-2      
 [55] BBmisc_1.11              curl_4.3                 memoise_1.1.0           
 [58] heatmaply_1.0.0          gridExtra_2.3            ggplot2_3.2.1           
 [61] biomaRt_2.42.0           rpart_4.1-15             latticeExtra_0.6-29     
 [64] stringi_1.4.5            RSQLite_2.2.0            genefilter_1.68.0       
 [67] gclus_1.3.2              foreach_1.4.7            checkmate_1.9.4         
 [70] seriation_1.2-8          caTools_1.18.0           rlang_0.4.2             
 [73] pkgconfig_2.0.3          bitops_1.0-6             lattice_0.20-38         
 [76] purrr_0.3.3              GenomicAlignments_1.22.1 htmlwidgets_1.5.1       
 [79] bit_1.1-15.1             tidyselect_0.2.5         plyr_1.8.5              
 [82] magrittr_1.5             DESeq2_1.26.0            R6_2.4.1                
 [85] gplots_3.0.1.2           Hmisc_4.3-0              DBI_1.1.0               
 [88] pillar_1.4.3             foreign_0.8-75           survival_3.1-8          
 [91] RCurl_1.98-1.1           nnet_7.3-12              tibble_2.1.3            
 [94] crayon_1.3.4             KernSmooth_2.23-16       BiocFileCache_1.10.2    
 [97] plotly_4.9.1             viridis_0.5.1            jpeg_0.1-8.1            
[100] progress_1.2.2           locfit_1.5-9.1           grid_3.6.2              
[103] blob_1.2.1               digest_0.6.23            webshot_0.5.2           
[106] xtable_1.8-4             tidyr_1.0.0              openssl_1.4.1           
[109] munsell_0.5.0            registry_0.5-1           viridisLite_0.3.0       
[112] askpass_1.1

See also

@mxblsdl

mxblsdl commented Jan 29, 2020

I am trying to diagnose a problem with running in parallel on Ubuntu and I think this may be related. I have a function that uses data.table to perform a number of calculations, and when I run this in parallel with future_lapply and look at top, I see all of the R sessions running at over 500% CPU.

I know data.table runs multithreaded by default and I was wondering if this could be the cause of the CPU overage.
I am using plan(multisession) to set up the futures.
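If data.table's own threads turn out to be the cause, one thing to try is limiting them inside each worker. A minimal sketch, assuming the per-element work is wrapped in a function; the name `one_task` and the toy computation are made up for illustration. `data.table::setDTthreads()` and `future.apply::future_lapply()` are existing functions, and the data.table call is skipped when the package is missing.

```r
# Hypothetical per-element task; the toy table stands in for the real calculation
one_task <- function(i) {
    if (!requireNamespace("data.table", quietly = TRUE))
        return(i * 2)
    data.table::setDTthreads(1L)   # keep this worker to a single data.table thread
    dt <- data.table::data.table(x = i)
    dt$x * 2
}

# With future.apply this would be called as, e.g.:
#   future::plan(future::multisession, workers = 2L)
#   res <- future.apply::future_lapply(1:3, one_task)
res <- lapply(1:3, one_task)
```

Whether this removes the overage would still need to be checked in top, since other libraries in the same session could be multithreaded as well.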

Session Info


R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tictoc_1.0         sf_0.8-0           future.apply_1.4.0 future_1.15.1      data.table_1.12.8  raster_3.0-7       sp_1.3-2           optparse_1.6.4    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         compiler_3.6.2     pillar_1.4.3       class_7.3-15       tools_3.6.2        zeallot_0.1.0      digest_0.6.23      tibble_2.1.3      
 [9] lattice_0.20-38    pkgconfig_2.0.3    rlang_0.4.2        DBI_1.1.0          cli_1.1.0          rstudioapi_0.10    yaml_2.2.0         parallel_3.6.2    
[17] rgdal_1.4-8        e1071_1.7-3        vctrs_0.2.0        globals_0.12.5     classInt_0.4-1     grid_3.6.2         getopt_1.20.3      listenv_0.8.0     
[25] fansi_0.4.1        magrittr_1.5       backports_1.1.5    codetools_0.2-16   units_0.6-5        assertthat_0.2.1   KernSmooth_2.23-16 utf8_1.1.4        
[33] crayon_1.3.4   

@c-mertes
Author

c-mertes commented Feb 11, 2020

Thanks @HenrikBengtsson, this was really helpful. I can confirm that RhpcBLASctl::omp_set_num_threads(1L) did the trick for us, and now it also runs through on my WSL.

Since we use RcppArmadillo in our optimization, an alternative workaround would be to compile the package with the flag -DARMA_DONT_USE_OPENMP. This way we no longer have to care about Snow or Multicore and how the end user parallelizes. On the other hand, we lose the parallelization when running in serial mode.

@lshep here is a smaller example of what is going wrong. It still uses OUTRIDER and its internal C function, but if needed I could try to write a simpler cpp function. In the end, the C function does some matrix multiplications and some element-wise operations, which are parallelized with OpenMP, and returns a single value.


# load BiocParallel
library(BiocParallel)

# create example data
q <- 40
n <- 200
m <- 20000

b     <- abs(rnorm(m))
D     <- matrix(rnorm((q)*m), nrow=m)
k     <- matrix(rnbinom(n*m, 10, mu=400), nrow=n)
theta <- abs(rnorm(m))
mask  <- matrix(1, nrow=m, ncol=n)
sf    <- abs(rnorm(n, mean=1))
H     <- matrix(rnorm(q*n), ncol=q)

# Serial with 1 OpenMP thread works
RhpcBLASctl::omp_set_num_threads(1L)
BPPARAM <- SerialParam(progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Serial with 10 OpenMP threads works
RhpcBLASctl::omp_set_num_threads(10L)
BPPARAM <- SerialParam(progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Multicore (4 workers, 40 tasks) with 1 OpenMP thread works
RhpcBLASctl::omp_set_num_threads(1L)
BPPARAM <- MulticoreParam(workers=4, tasks=40, progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Multicore (4 workers, 40 tasks) with 10 OpenMP threads does not work
RhpcBLASctl::omp_set_num_threads(10L)
BPPARAM <- MulticoreParam(workers=4, tasks=40, progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})
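
The failing case starts 4 workers that may each open 10 OpenMP threads, i.e. up to 40 threads competing for the cores. A rough way to pick a per-worker thread count that avoids this is sketched below; the helper `threads_per_worker` is hypothetical and the core counts are only example numbers, not measurements from my machine.

```r
# Hypothetical helper: how many threads each worker may use so that
# workers * threads does not exceed the number of cores.
threads_per_worker <- function(workers, cores) {
    # Integer division, but never less than one thread per worker
    max(1L, as.integer(cores %/% workers))
}

workers <- 4L
total_if_10 <- workers * 10L               # 40 threads requested in total
threads_per_worker(workers, cores = 8L)    # 2
threads_per_worker(workers, cores = 2L)    # 1
```

The result could then be passed to RhpcBLASctl::omp_set_num_threads() before calling bplapply with MulticoreParam.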

And my R session is:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.20.1

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            htmlTable_1.13.3            XVector_0.26.0             
  [4] GenomicRanges_1.38.0        base64enc_0.1-3             rstudioapi_0.10            
  [7] bit64_0.9-7                 AnnotationDbi_1.48.0        codetools_0.2-16           
 [10] splines_3.6.2               PRROC_1.3.1                 geneplotter_1.64.0         
 [13] knitr_1.27                  Formula_1.2-3               jsonlite_1.6               
 [16] Rsamtools_2.2.1             RhpcBLASctl_0.20-17         annotate_1.64.0            
 [19] cluster_2.0.9               OUTRIDER_1.4.0              dbplyr_1.4.2               
 [22] png_0.1-7                   pheatmap_1.0.12             compiler_3.6.2             
 [25] httr_1.4.1                  backports_1.1.5             assertthat_0.2.1           
 [28] Matrix_1.2-18               lazyeval_0.2.2              acepack_1.4.1              
 [31] htmltools_0.4.0             prettyunits_1.1.1           tools_3.6.2                
 [34] gtable_0.3.0                glue_1.3.1                  GenomeInfoDbData_1.2.2     
 [37] dplyr_0.8.3                 rappdirs_0.3.1              Rcpp_1.0.3                 
 [40] Biobase_2.46.0              vctrs_0.2.2                 Biostrings_2.54.0          
 [43] gdata_2.18.0                rtracklayer_1.46.0          iterators_1.0.12           
 [46] xfun_0.12                   stringr_1.4.0               lifecycle_0.1.0            
 [49] gtools_3.8.1                XML_3.99-0.3                dendextend_1.13.2          
 [52] MASS_7.3-51.4               zlibbioc_1.32.0             scales_1.1.0               
 [55] TSP_1.1-8                   pcaMethods_1.78.0           hms_0.5.3                  
 [58] parallel_3.6.2              SummarizedExperiment_1.16.1 RColorBrewer_1.1-2         
 [61] BBmisc_1.11                 curl_4.3                    memoise_1.1.0              
 [64] heatmaply_1.0.0             gridExtra_2.3               ggplot2_3.2.1              
 [67] biomaRt_2.42.0              rpart_4.1-15                latticeExtra_0.6-29        
 [70] stringi_1.4.5               RSQLite_2.2.0               genefilter_1.68.0          
 [73] gclus_1.3.2                 S4Vectors_0.24.3            foreach_1.4.7              
 [76] checkmate_1.9.4             seriation_1.2-8             GenomicFeatures_1.38.1     
 [79] caTools_1.18.0              BiocGenerics_0.32.0         GenomeInfoDb_1.22.0        
 [82] rlang_0.4.3                 pkgconfig_2.0.3             matrixStats_0.55.0         
 [85] bitops_1.0-6                lattice_0.20-38             purrr_0.3.3                
 [88] GenomicAlignments_1.22.1    htmlwidgets_1.5.1           bit_1.1-15.1               
 [91] tidyselect_1.0.0            plyr_1.8.5                  magrittr_1.5               
 [94] DESeq2_1.26.0               R6_2.4.1                    IRanges_2.20.2             
 [97] gplots_3.0.1.2              Hmisc_4.3-0                 DelayedArray_0.12.2        
[100] DBI_1.1.0                   pillar_1.4.3                foreign_0.8-71             
[103] survival_2.44-1.1           RCurl_1.98-1.1              nnet_7.3-12                
[106] tibble_2.1.3                crayon_1.3.4                KernSmooth_2.23-16         
[109] BiocFileCache_1.10.2        plotly_4.9.1                viridis_0.5.1              
[112] jpeg_0.1-8.1                progress_1.2.2              locfit_1.5-9.1             
[115] grid_3.6.2                  data.table_1.12.8           blob_1.2.1                 
[118] digest_0.6.23               webshot_0.5.2               xtable_1.8-4               
[121] tidyr_1.0.2                 openssl_1.4.1               stats4_3.6.2               
[124] munsell_0.5.0               registry_0.5-1              viridisLite_0.3.0          
[127] askpass_1.1
