-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fitAR_map needs parallel version #13
Comments
I've been successful with a script containing the following code that seems to work well: iterator-function.R: iblkrow <- function(a, chunks) {
n <- nrow(a)
i <- 1
nextElem <- function() {
if (chunks <= 0 || n <= 0) stop('StopIteration')
m <- ceiling(n / chunks)
r <- seq(i, length=m)
i <<- i + m
n <<- n - m
chunks <<- chunks - 1
a[r,, drop=FALSE]
}
structure(list(nextElem=nextElem), class=c('iblkrow', 'iter'))
}
nextElem.iblkrow <- function(obj) obj$nextElem() parallel-AR.R: # ... setup
library(doParallel)
library(foreach)
registerDoParallel(core_num)
source("iterator-function.R") # loads helper functions including the iterator iblkrow()
# loop through blocks of the data (using helper function)
AR_results = foreach(y = iblkrow(data_matrix, core_num),
crds = iblkrow(coord_matrix, core_num),
.packages = c("remotePARTS", "foreach"), .inorder = TRUE, .combine = rbind) %dopar% {
# loop through each row of the chunks
out = foreach(i=seq_len(nrow(y)), .combine = rbind, .inorder = TRUE) %do% {
t = seq_len(ncol(y)) # time
AR = fitAR(as.formula(as.vector(y[i, ]) ~ t)) # AR fit
# return a collection of AR variables of interest
c(coef = unname(AR$coefficients["t"]),
pval = unname(AR$pval["t"]),
resids = AR$residuals)
}
rownames(out) = NULL # remove rownames "t"
out # return the combined results
}
|
@morrowcj , thank you very much for ,making the parallel version - this is very helpful! |
You are correct, @PlekhanovaElena. I somehow lost an argument and the closing parenthesis. That line should read And just in case it wasn't clear, |
The above code (or something similar) should eventually be implemented in the package. Hopefully I'll have time to get to this soon. |
Problem
With large datasets,
fitAR_map()
(and perhapsfitCLS_map()
) is exceptionally slow - it can take longer than the partitioned GLS, since it only runs in serial (1 thread). This is clearly an oversight.Solution
This functionality lends itself to parallelization well. AR models are fit to each pixel independently. So, it makes sense to implement a parallel version of
fitAR_map()
via an argumentncores
.Early on in the development of
remotePARTS
, I gave examples about how to applyfitAR()
to a full map with theraster
package. But, since this package doesn't officially support rasters, an implementation that can handle matrices (and dataframes) is important.The text was updated successfully, but these errors were encountered: