Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fitAR_map needs parallel version #13

Open
morrowcj opened this issue Dec 16, 2022 · 4 comments
Open

fitAR_map needs parallel version #13

morrowcj opened this issue Dec 16, 2022 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@morrowcj
Copy link
Owner

Problem

With large datasets, fitAR_map() (and perhaps fitCLS_map()) is exceptionally slow - it can take longer than the partitioned GLS, since it only runs in serial (1 thread). This is clearly an oversight.

Solution

This functionality lends itself to parallelization well. AR models are fit to each pixel independently. So, it makes sense to implement a parallel version of fitAR_map() via an argument ncores.

Early on in the development of remotePARTS, I gave examples about how to apply fitAR() to a full map with the raster package. But, since this package doesn't officially support rasters, an implementation that can handle matrices (and dataframes) is important.

@morrowcj morrowcj added the enhancement New feature or request label Dec 16, 2022
@morrowcj morrowcj self-assigned this Dec 16, 2022
@morrowcj
Copy link
Owner Author

morrowcj commented Dec 16, 2022

I've been successful with a script containing the following code that seems to work well:

iterator-function.R:

iblkrow <- function(a, chunks) {
  n <- nrow(a)
  i <- 1
  
  nextElem <- function() {
    if (chunks <= 0 || n <= 0) stop('StopIteration')
    m <- ceiling(n / chunks)
    r <- seq(i, length=m)
    i <<- i + m
    n <<- n - m
    chunks <<- chunks - 1
    a[r,, drop=FALSE]
  }  
  structure(list(nextElem=nextElem), class=c('iblkrow', 'iter'))
}
nextElem.iblkrow <- function(obj) obj$nextElem()

parallel-AR.R:

# ... setup
  library(doParallel)
  library(foreach)
  registerDoParallel(core_num)
  source("iterator-function.R")  # loads helper functions including the iterator iblkrow()
  
  # loop through blocks of the data (using helper function)
  AR_results = foreach(y = iblkrow(data_matrix, core_num),
                       crds = iblkrow(coord_matrix, core_num),
                       .packages = c("remotePARTS", "foreach"), .inorder = TRUE, .combine = rbind) %dopar% {
                         # loop through each row of the chunks
                         out = foreach(i=seq_len(nrow(y)), .combine = rbind, .inorder = TRUE) %do% {
                           t = seq_len(ncol(y))  # time
                           AR = fitAR(as.formula(as.vector(y[i, ]) ~ t))  # AR fit
                           # return a collection of AR variables of interest
                           c(coef = unname(AR$coefficients["t"]),
                             pval = unname(AR$pval["t"]),
                             resids = AR$residuals)
                         }
                         rownames(out) = NULL # remove rownames "t"
                         out # return the combined results
                       }

@PlekhanovaElena
Copy link

@morrowcj , thank you very much for ,making the parallel version - this is very helpful!
Is there something missing in the second row of the loop?
crds = iblkrow(coord_matrix, ...?

@morrowcj
Copy link
Owner Author

morrowcj commented Jan 17, 2023

You are correct, @PlekhanovaElena. I somehow lost an argument and the closing parenthesis. That line should read crds = iblkrow(coord_matrix, core_num),. I've edited my comment above to reflect this.

And just in case it wasn't clear, core_num is the total number of cores you'd like to use for the operation, data_matrix is a matrix containing the spatial time series data (rows are locations, columns are time steps) and coord_matrix is the coordinate matrix (rows are locations and columns are longitude and latitude, respectively.

@morrowcj
Copy link
Owner Author

The above code (or something similar) should eventually be implemented in the package. Hopefully I'll have time to get to this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants