-
At first glance (all I have time for at the moment), I think this sounds like something worth investigating. I suspect it will also make some of my tinkering on multi-membership models more straightforward because the sparsity pattern for those degrades even faster -- they're not even block diagonal in the (1,1) block. In general, then, this implies that we would have the sparse (1,1) block plus one dense block per blocking variable? I also need to think a bit about whether this change would be technically breaking.
-
See also your previous thoughts in #234.
-
To help with benchmarking, we could create a non-exported …
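As a hedged sketch of what such a benchmark might look like, assuming the existing `setθ!`, `updateL!`, and `objective` functions together with BenchmarkTools (the sleepstudy model is just a convenient example):

```julia
using BenchmarkTools, MixedModels

# Fit a small example model, then time a single objective evaluation
# at a fixed parameter value (sketch only; not the proposed helper).
m = fit(MixedModel,
        @formula(reaction ~ 1 + days + (1 + days | subj)),
        MixedModels.dataset(:sleepstudy))
θ = m.θ
@benchmark objective(updateL!(setθ!($m, $θ)))
```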
-
The important benchmark will be for models fit to subject-item data sets such as …
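A hedged sketch of fitting such a crossed subject-item model, using the `mrk17_exp1` data set mentioned below; note that the response and covariate names `rt` and `cond` here are placeholders, not the actual schema of that data set:

```julia
using MixedModels

# Hypothetical subject-item benchmark model. :rt and :cond are
# placeholder column names; :subj and :item are the two crossed
# grouping factors discussed in this thread.
df = MixedModels.dataset(:mrk17_exp1)
form = @formula(rt ~ 1 + cond + (1 + cond | subj) + (1 + cond | item))
m = fit(MixedModel, form, df)
```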
-
This has come up in the code refactoring of the `leverage` method for `LinearMixedModel`, but it could also apply more generally to the overall evaluation of the objective. At present, the symmetric matrix `A = [Z X y]'[Z X y]` is stored (lower triangle only) in blocked form. The rows and columns are divided into `length(reterms) + 1` blocks according to the grouping factors for the random effects (the last block is from `Xymat`, which contains the model matrix for the fixed-effects parameters and the response). The lower triangular Cholesky factor, `L`, of this system is similarly blocked.
For example, in a model for the sleepstudy data with vector-valued random effects for the 18 subjects, the block structure is a 36×36 block-diagonal (1,1) block (18 diagonal blocks of size 2×2) plus a dense block of 3 rows from `Xymat`; the sketch below shows one way to display it.
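A minimal way to inspect that blocking, assuming a recent MixedModels.jl with the `BlockDescription` helper and the bundled `sleepstudy` data:

```julia
using MixedModels

m = fit(MixedModel,
        @formula(reaction ~ 1 + days + (1 + days | subj)),
        MixedModels.dataset(:sleepstudy))
# Shows the rows and storage type of each block of L: a block-diagonal
# 36-row (1,1) block (18 subjects × 2 random effects) followed by the
# dense rows contributed by [X y].
BlockDescription(m)
```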
For a model fit with vector-valued random effects for item and for subject in the `mrk17_exp1` data set, by contrast, the blocking has three row blocks, of dimensions 1200, 438, and 33 (the last, again, from `Xymat`).

For the leverage calculation, we need to solve a linear system defined by `L` to get the leverage of each observation. The big savings in computation comes from the (1,1) block, which is block diagonal. The nature of the system involved results in all blocks except one having zeros on the right-hand side, so only that one block needs to be solved: we go from a dimension-1200 system to a dimension-5 system. After that, things get complicated and the full system involving the (2,1) and (2,2) blocks and the (3,1), (3,2), and (3,3) blocks needs to be solved. A toy illustration of the savings follows.
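As a hedged, self-contained illustration (toy data, not the package internals) of why a right-hand side that is zero outside one diagonal block collapses the solve to that block:

```julia
using LinearAlgebra

# Toy version of the (1,1) block: a block-diagonal lower triangular
# factor made of many small diagonal blocks, as in the subject-item model.
k, nblk = 5, 240                 # 240 blocks of size 5 → dimension 1200
Lblocks = [LowerTriangular(rand(k, k) + 2I) for _ in 1:nblk]

b = zeros(k)                     # RHS restricted to the j-th block;
b[1] = 1.0                       # zero in every other block
j = 17
x_j = Lblocks[j] \ b             # a dimension-5 solve instead of dimension-1200
```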
Essentially, as soon as one block in the first column of blocks is not sparse, everything needs to be treated as dense. The logic of the algorithm gets complicated for the blocks other than the first and the last. So my idea is to always reduce the problem to exactly three blocks by combining all the blocks after (1,1). The (2,1) block would then have dimension (438+33) by 1200 and the (2,2) block would be lower triangular of dimension (438+33) by (438+33). For the leverage function we don't need the last row (from the response), so the (2,1) and (2,2) blocks could be reduced to 470 rows. A sketch of the amalgamation is below.
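A minimal sketch of that amalgamation, with illustrative names rather than the package API: the old (2,1) and (3,1) blocks are stacked vertically, and the new (2,2) block is assembled as `[L22 0; L32 L33]`, which is again lower triangular.

```julia
using LinearAlgebra

# Combine the (2,1)/(3,1) blocks vertically and form the new (2,2)
# block as [L22 0; L32 L33]; the zero block sits above the diagonal,
# so the result stays lower triangular.
function combine_tail(L21, L22, L31, L32, L33)
    newL21 = vcat(L21, L31)      # (438+33) × 1200 in the example above
    z = zeros(size(L22, 1), size(L33, 2))
    newL22 = LowerTriangular([Matrix(L22) z; Matrix(L32) Matrix(L33)])
    return newL21, newL22
end

# e.g. with the dimensions from the example above:
# combine_tail(rand(438, 1200), LowerTriangular(rand(438, 438) + 2I),
#              rand(33, 1200), rand(33, 438), LowerTriangular(rand(33, 33) + 2I))
```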
It is likely that this reduction in the number of blocks would benefit the objective evaluation as well as the leverage calculation. After the first block, essentially all the operations in the update of `L` become dense matrix operations, which can benefit from OpenBLAS, MKL, and whatever the JuliaSIMD folks come up with. The only downside I can see is for nested grouping factors, where blocks like (2,1) are `BlockedSparse`, but those cases are relatively rare.