Non-positive definite matrix when fitting nested GLMM #792

Closed
dhalpern opened this issue Nov 12, 2024 · 14 comments · Fixed by #796
@dhalpern commented Nov 12, 2024

I am trying to understand some issues I've encountered when fitting models for a psychology experiment on memory with a lot of nested structure.
For background, the goal is to estimate the effect of position within a list on the probability of recall. Each subject in the experiment participates in multiple sessions with multiple lists. The following code creates an example dataset:

using MixedModels, DataFrames, Random
Random.seed!(0);
df = allcombinations(DataFrame, "subject" => 1:100, 
    "session" => 1:6, "list" => 1:25, "serialpos" => 1:12)
df[!, :recalled] = rand(nrow(df)) .> 0.5  # pure-noise response: no subject/session clustering

Then to fit the model I use the following code:

serialpos_contrasts = Dict(
    :serialpos => DummyCoding()
)

m = fit(MixedModel, form, df, Bernoulli();
    contrasts=serialpos_contrasts, progress=false);

I fit several models, substituting each formula below for form; some fit without problems and one does not. The following all fit fine:

(m1) recalled ~ serialpos + (1 | subject) + (1 | subject & session) + (1 | subject & session & list)
(m2) recalled ~ serialpos + zerocorr(serialpos | subject)
(m3) recalled ~ serialpos + (1 | subject) + (1 | serialpos & subject) + (1 | subject & session)

However, I run into issues when trying to fit the following model, with by-subject serial-position effects and random intercepts by session:

(m4) recalled ~ serialpos + zerocorr(serialpos | subject) + (1 | subject & session)

This results in the following error:

ERROR: LoadError: PosDefException: matrix is not positive definite; Factorization failed.

I realize this has been partially addressed before (e.g. #349 (comment)), but I'm not sure I fully understand why (m4) has an error -- (m1) should have more parameters, and (m3) should be nearly equivalent to (m4) (although I realize the design matrices differ slightly).
I thought perhaps the correlation structure might be an issue, so I tried fitting two models with fulldummy, which seems more straightforward.

(m5) recalled ~ serialpos + zerocorr(fulldummy(serialpos) | subject)
(m6) recalled ~ serialpos + zerocorr(fulldummy(serialpos) | subject) + (1 | subject & session)

(m5) fit fine and (m6) ran into the same positive-definiteness error.

Is there any good intuition for what is going on here?

@kliegl commented Nov 12, 2024

What happens if you include some clustering in the simulation? Or am I missing this in your code?

@dhalpern (Author)

No, there is no clustering in the current simulation (recalled is just a Bernoulli RV for every row in the dataframe). Just to clarify: you're suggesting creating a simulation where there is non-zero variability due to subject, session, etc.?

@kliegl commented Nov 12, 2024

Right. If I understand correctly, your data have no random-effect structure to recover. So these data do not require a GLMM; they could be fitted with a GLM, as in the sketch below.
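
(A minimal sketch of that GLM fit with GLM.jl, reusing the serialpos_contrasts dictionary defined above so that serialpos is treated as categorical:)

using GLM
glm(@formula(recalled ~ serialpos), df, Bernoulli(), LogitLink();
    contrasts=serialpos_contrasts)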

@dhalpern (Author)

That is correct -- it still seems strange to me that this error occurs only with very specific design matrices (given that none of the models has any true random-effect structure to recover), but I will try a simulation with clustering and report back!

@dhalpern (Author) commented Nov 12, 2024

I added the following code to the data generation prior to fitting the model:

sim_m = GeneralizedLinearMixedModel(form, df, Bernoulli(); contrasts=serialpos_contrasts)
df[!, :recalled] = simulate(MersenneTwister(42), sim_m)

My understanding is that the random-effect variances are always initialized to 1 and the fixed effects are initialized to small values near 0. In any case, I believe this should simulate data under the assumption that the model being fitted is correct.
However, when using this code, the model I called (m4) above (recalled ~ serialpos + zerocorr(serialpos | subject) + (1 | subject & session)) still resulted in the same non-positive-definite error.
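
As a sanity check on those defaults, a minimal sketch using m1, which constructs without error (θ and β are documented properties of the model object):

m1 = GeneralizedLinearMixedModel(
    @formula(recalled ~ serialpos + (1 | subject) + (1 | subject & session) + (1 | subject & session & list)),
    df, Bernoulli(); contrasts=serialpos_contrasts)
m1.θ  # relative covariance parameters, all 1.0 at construction
m1.β  # fixed effects, taken from an initial fixed-effects-only GLM fit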

@dhalpern (Author)

I should note that this error actually occurs during construction of the model object, prior to any model fitting. The following code is enough to produce the error:

using DataFrames, Random, MixedModels
Random.seed!(0);
df = allcombinations(DataFrame, "subject" => 1:25, 
    "session" => 1:6, "list" => 1:15, "serialpos" => 1:12)
df[!, :recalled] = rand(nrow(df))
df[!, :recalled] = df[!, :recalled] .> .5;
serialpos_contrasts = Dict(
    :serialpos => DummyCoding()
)
form = @formula(recalled ~ serialpos + zerocorr(serialpos | subject) + (1 | subject & session))
m = GeneralizedLinearMixedModel(form, df, Bernoulli(); contrasts=serialpos_contrasts)

Therefore, it cannot be about the relationship between the response variable and the covariates.

@kliegl commented Nov 12, 2024

The two formulae are acceptable for linear mixed models. I am out of my depth with GLMMs.

lmm =  fit(MixedModel, form, df; contrasts=serialpos_contrasts)

@palday (Member) commented Nov 12, 2024

Therefore, it cannot be about the relationship between the response variable and the covariates.

This isn't quite right -- the failing factorization occurs during model creation because the factorization is part of the internal representation of a GeneralizedLinearMixedModel. The initial factorization uses the initial parameter estimates, which may not be constraining enough to get reasonable estimates. More precisely, for a dataset where the ground truth has no variation between levels of the grouping variable (as in the examples up top), the default initial parameter values may be so bad that they are fundamentally incompatible with the data, and thus you get the error seen here.
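
(A toy illustration of the failure mode, separate from the MixedModels internals: a Cholesky factorization throws a PosDefException whenever its input is not numerically positive definite.)

using LinearAlgebra
A = [1.0 2.0; 2.0 1.0]  # symmetric but indefinite: eigenvalues are 3 and -1
cholesky(A)             # throws PosDefException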

Since @kliegl pointed out that the LMM fit works, it's worth highlighting an experimental feature we have that starts off with an LMM fit to get better initial starting values for the GLMM fit. You can try passing the keyword argument init_from_lmm=[:β, :θ] to use the LMM as a starting point, as sketched below.
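
Spelled out with the fit call from earlier in the thread, that would look something like:

m = fit(MixedModel, form, df, Bernoulli();
    contrasts=serialpos_contrasts,
    init_from_lmm=[:β, :θ],
    progress=false)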

@dhalpern (Author)

Thanks so much for all your help in understanding this so far @palday and @kliegl. @palday, thanks for the suggestion, but (and please correct me if I'm wrong) I'm not sure it will solve the problem, because the error occurs prior to fitting, in the constructor for GeneralizedLinearMixedModel, while init_from_lmm is a parameter for fit!. To clarify things, here is the traceback:

Stacktrace:
  [1] cholUnblocked!(A::Matrix{Float64}, ::Type{Val{:L}})
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/linalg/cholUnblocked.jl:34
  [2] updateL!(m::LinearMixedModel{Float64})
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/linearmixedmodel.jl:1323
  [3] reweight!(m::LinearMixedModel{Float64}, weights::Vector{Float64})
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/linearmixedmodel.jl:1033
  [4] deviance!(m::GeneralizedLinearMixedModel{Float64, Bernoulli{Float64}}, nAGQ::Int64)
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:144
  [5] GeneralizedLinearMixedModel(f::StatsModels.FormulaTerm{StatsModels.Term, Tuple{StatsModels.Term, StatsModels.FunctionTerm{typeof(zerocorr), Vector{StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, tbl::@NamedTuple{subject::Vector{Int64}, session::Vector{Int64}, list::Vector{Int64}, serialpos::Vector{Int64}, recalled::BitVector}, d::Bernoulli{Float64}, l::LogitLink; wts::Vector{Any}, offset::Vector{Any}, contrasts::Dict{Symbol, DummyCoding}, amalgamate::Bool)
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:472
  [6] GeneralizedLinearMixedModel
    @ ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:386 [inlined]
  [7] fit(::Type{GeneralizedLinearMixedModel}, f::StatsModels.FormulaTerm{StatsModels.Term, Tuple{StatsModels.Term, StatsModels.FunctionTerm{typeof(zerocorr), Vector{StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, tbl::@NamedTuple{subject::Vector{Int64}, session::Vector{Int64}, list::Vector{Int64}, serialpos::Vector{Int64}, recalled::BitVector}, d::Bernoulli{Float64}, l::LogitLink; wts::Vector{Any}, contrasts::Dict{Symbol, DummyCoding}, offset::Vector{Any}, amalgamate::Bool, kwargs::@Kwargs{init_from_lmm::Vector{Symbol}, progress::Bool})
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:187
  [8] fit(::Type{GeneralizedLinearMixedModel}, f::StatsModels.FormulaTerm{StatsModels.Term, Tuple{StatsModels.Term, StatsModels.FunctionTerm{typeof(zerocorr), Vector{StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, StatsModels.FunctionTerm{typeof(|), Vector{StatsModels.AbstractTerm}}}}, tbl::DataFrame, d::Bernoulli{Float64}, l::LogitLink; kwargs::@Kwargs{contrasts::Dict{Symbol, DummyCoding}, init_from_lmm::Vector{Symbol}, progress::Bool})
    @ MixedModels ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:172
  [9] #fit#100
    @ ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:201 [inlined]
 [10] fit
    @ ~/.julia/packages/MixedModels/sKseQ/src/generalizedlinearmixedmodel.jl:193 [inlined]

Specifically, the issue seems to be with the deviance calculation here.
At this point I'm wondering what exactly this deviance calculation does. The following code (essentially adapted from generalizedlinearmixedmodel.jl) runs and constructs a GLMM object without issue:

using DataFrames, MixedModels, Random, GLM, Tables

Random.seed!(0);
df = allcombinations(DataFrame, "subject" => 1:25, 
    "session" => 1:6, "list" => 1:15, "serialpos" => 1:12)
df[!, :recalled] = rand(nrow(df))
df[!, :recalled] = df[!, :recalled] .> .5

contrasts = Dict(
    :serialpos => DummyCoding()
)
f = @formula(recalled ~ serialpos + zerocorr(serialpos | subject) + (1 | subject & session))
wts = []
d = Bernoulli()
l = canonicallink(d)
tbl = Tables.columntable(df)
amalgamate = true
offset = []

LMM = LinearMixedModel(f, tbl; contrasts, wts, amalgamate)
y = copy(LMM.y)
constresponse = all(==(first(y)), y)
# the sqrtwts field must be the correct length and type but we don't know those
# until after the model is constructed if wt is empty.  Because a LinearMixedModel
# type is immutable, another one must be created.
if isempty(wts)
    LMM = LinearMixedModel(
        LMM.formula,
        LMM.reterms,
        LMM.Xymat,
        LMM.feterm,
        fill!(similar(y), 1),
        LMM.parmap,
        LMM.dims,
        LMM.A,
        LMM.L,
        LMM.optsum,
    )
end
X = MixedModels.fullrankx(LMM.feterm)
# if the response is constant, there's no point (and this may even fail)
# we allow this instead of simply failing so that a constant response can
# be used as the starting point to simulation where the response will be
# overwritten before fitting
constresponse || updateL!(LMM)
# fit a glm to the fixed-effects only
T = eltype(LMM.Xymat)
# newer versions of GLM (>1.8.0) have a kwarg dropcollinear=true
# which creates problems for the empty fixed-effects case during fitting
# so just don't allow fitting
# XXX unfortunately, this means we have double-rank deficiency detection
# TODO: construct GLM by hand so that we skip collinearity checks
# TODO: extend this so that we never fit a GLM when initializing from LMM
dofit = size(X, 2) != 0 # GLM.jl kwarg
gl = glm(X, y, d, l;
    wts=convert(Vector{T}, wts),
    dofit,
    offset=convert(Vector{T}, offset))
β = dofit ? coef(gl) : T[]
u = [fill(zero(eltype(y)), MixedModels.vsize(t), MixedModels.nlevs(t)) for t in LMM.reterms]
# vv is a template vector used to initialize fields for AGQ
# it is empty unless there is a single random-effects term
vv = length(u) == 1 ? vec(first(u)) : similar(y, 0)

res = GeneralizedLinearMixedModel{T,typeof(d)}(
    LMM,
    β,
    copy(β),
    LMM.θ,
    copy.(u),
    u,
    zero.(u),
    gl.rr,
    similar(y),
    oftype(y, wts),
    similar(vv),
    similar(vv),
    similar(vv),
    similar(vv),
)

Would there be an issue with calling fit! on the res object at this point? Or is the deviance call necessary?

@dhalpern (Author)

In the other direction, I've now attempted to produce an example where the data are generated with clustering and the model is "correct":

using DataFrames, Random, Distributions, LogExpFunctions, MixedModels
Random.seed!(0);

n_subjects = 100
n_sessions = 6
n_lists = 25
n_serialpos = 12
df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
    "session" => 1:n_sessions, "list" => 1:n_lists, "serialpos" => 1:n_serialpos)

serialpos_fe_df = DataFrame("serialpos" => 1:n_serialpos,
    "serialpos_fe" => randn(n_serialpos))
sub_re_df = DataFrame("subject" => 1:n_subjects, "sub_re" => randn(n_subjects))
sub_session_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "session" => 1:n_sessions)
sub_session_re_df[!, :sub_session_re] = randn(n_subjects * n_sessions)
sub_serialpos_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "serialpos" => 1:n_serialpos)
sub_serialpos_re_df[!, :sub_serialpos_re] = randn(n_subjects * n_serialpos)

df = innerjoin(df, serialpos_fe_df, on = :serialpos)
df = innerjoin(df, sub_re_df, on = :subject)
df = innerjoin(df, sub_session_re_df, on = [:subject, :session])
df = innerjoin(df, sub_serialpos_re_df, on = [:subject, :serialpos])

df[!, :linpred] = df[!, :serialpos_fe] + df[!, :sub_re] + df[!, :sub_session_re] + df[!, :sub_serialpos_re]
df[!, :predprob] = logistic.(df[!, :linpred])
df[!, :recalled] = rand.(Bernoulli.(df[!, :predprob]))

serialpos_contrasts = Dict(
    :serialpos => DummyCoding()
)

form = @formula(recalled ~ serialpos + zerocorr(fulldummy(serialpos) | subject) + (1 | subject & session))
m = GeneralizedLinearMixedModel(form, df, Bernoulli(); contrasts=serialpos_contrasts)

This resulted in the same positive-definite matrix error. That being said, I'm still somewhat of a Julia novice so please correct me if I've made any mistakes here!

@dhalpern (Author) commented Dec 4, 2024

Sorry to bother you again @kliegl and @palday, but I'd be very curious whether it is possible to fit this kind of model, since it seems relatively standard, and I'd like to know if I'm doing anything wrong. Imagine we have subjects who complete multiple sessions of an experiment. In each session, they complete a number of trials in one of two conditions and respond with a binary outcome. We want to know the effect of condition, allowing for variability in the condition effects by subject and also in the intercepts across sessions. I believe the following simulates the proposed random-effect structure:

using DataFrames, Random, Distributions, LogExpFunctions, MixedModels
Random.seed!(0);
n_subjects = 10
n_sessions = 2
n_trials_per_session = 2
n_conds = 2
df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
    "session" => 1:n_sessions, "trial" => 1:n_trials_per_session, "condition" => 1:n_conds)

cond_fe_df = DataFrame("condition" => 1:n_conds,
    "cond_fe" => randn(n_conds))
sub_re_df = DataFrame("subject" => 1:n_subjects, "sub_re" => randn(n_subjects))
sub_session_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "session" => 1:n_sessions)
sub_session_re_df[!, :sub_session_re] = randn(n_subjects * n_sessions)
sub_cond_1_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "condition" => 1)
sub_cond_1_re_df[!, :sub_cond_re] = randn(n_subjects)
sub_cond_2_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "condition" => 2)
sub_cond_2_re_df[!, :sub_cond_re] = randn(n_subjects)
sub_cond_re_df = vcat(sub_cond_1_re_df, sub_cond_2_re_df)

df = innerjoin(df, cond_fe_df, on = :condition)
df = innerjoin(df, sub_re_df, on = :subject)
df = innerjoin(df, sub_session_re_df, on = [:subject, :session])
df = innerjoin(df, sub_cond_re_df, on = [:subject, :condition])

df[!, :linpred] = df[!, :cond_fe] + df[!, :sub_re] + df[!, :sub_session_re] + df[!, :sub_cond_re]
df[!, :predprob] = logistic.(df[!, :linpred])
df[!, :outcome] = rand.(Bernoulli.(df[!, :predprob]))

contrasts = Dict(
    :condition => DummyCoding()
)

f = @formula(outcome ~ condition + zerocorr(1 + fulldummy(condition) | subject) + (1 | subject & session))
m = GeneralizedLinearMixedModel(f, df, Bernoulli(); contrasts=contrasts)

This results in a PosDefException. Bizarrely, if I set n_subjects to 9 (or fewer) there is no problem, but 10 or higher results in the exception.

As I expected, it is possible to fit this model in lme4 without issues. I'd be very curious to know what the differences are in initializing the model or if the lme4 version is incorrect!

library(lme4)
library(tidyverse)
set.seed(0)
n_subjects <- 10
n_sessions <- 2
n_trials_per_session <- 2
n_conds <- 2
df <- expand_grid(subject = 1:n_subjects, 
          session = 1:n_sessions,
          trial = 1:n_trials_per_session, 
          condition = 1:n_conds)

cond_fe_df <- tibble(condition = 1:n_conds,
                       cond_fe = rnorm(n_conds))
sub_re_df <- tibble(subject = 1:n_subjects, 
                       sub_re = rnorm(n_subjects))
sub_session_re_df <- expand_grid(subject = 1:n_subjects, 
                                 session = 1:n_sessions) %>%
  mutate(sub_session_re = rnorm(n_subjects * n_sessions))
sub_cond_1_re_df <- tibble(subject = 1:n_subjects, condition = 1) %>%
  mutate(sub_cond_re = rnorm(n_subjects))
sub_cond_2_re_df <- tibble(subject = 1:n_subjects, condition = 2) %>%
  mutate(sub_cond_re = rnorm(n_subjects))
sub_cond_re_df <- bind_rows(sub_cond_1_re_df, sub_cond_2_re_df)

df <- inner_join(df, cond_fe_df, by = join_by(condition)) %>%
  inner_join(sub_re_df, by = join_by(subject)) %>%
  inner_join(sub_session_re_df, by = join_by(subject, session)) %>%
  inner_join(sub_cond_re_df, by = join_by(subject, condition))

df <- df %>% mutate(
  linpred = cond_fe + sub_re + sub_session_re + sub_cond_re,
  predprob = plogis(linpred),
  outcome = rbinom(n(), size = 1, predprob))

f <- outcome ~ condition + (dummy(condition, 1) + dummy(condition, 2) || subject) + (1 | subject:session)
m <- glmer(f, data = df, family = binomial)

@palday (Member) commented Dec 10, 2024

As I expected, it is possible to fit this model in lme4 without issues. I'd be very curious to know what the differences are in initializing the model or if the lme4 version is incorrect!

I would be curious whether it works with exactly the same data -- setting the same seed across different software / software versions isn't enough to guarantee that you're generating exactly the same data.

Going back to the underlying problem: there may be a well-specified model for these data, but the initial parameter estimates just don't lead to a covariance matrix that is numerically positive definite. (Intuitively: we get an initial variance estimate that is negative, or at least negative after rounding, which is not allowed.) There may very well be a valid fit, but since the fitting process is iterative, we can't get there without being able to take at least one good step. So it comes down to finding a good set of initial parameter values. And that's a bit tricky.

I have a few ideas that I'll tinker with as I get time, but I can't promise any results. 😢

@dhalpern (Author)

Thank you so much @palday for looking into this -- I see you may have already come up with a solution. For what it's worth, the following uses the same simulated data in both R and Julia and results in a singular but functioning glmer fit, but a PosDefException when creating the object in MixedModels. I tried about 10 random seeds and this occurred for all of them, so it doesn't seem to be a matter of one unlucky seed.

using DataFrames, Random, Distributions, LogExpFunctions, MixedModels, RCall
Random.seed!(5);
n_subjects = 10
n_sessions = 2
n_trials_per_session = 2
n_conds = 2
df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
    "session" => 1:n_sessions, "trial" => 1:n_trials_per_session, "condition" => 1:n_conds)

cond_fe_df = DataFrame("condition" => 1:n_conds,
    "cond_fe" => randn(n_conds))
sub_re_df = DataFrame("subject" => 1:n_subjects, "sub_re" => randn(n_subjects))
sub_session_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "session" => 1:n_sessions)
sub_session_re_df[!, :sub_session_re] = randn(n_subjects * n_sessions)
sub_cond_1_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "condition" => 1)
sub_cond_1_re_df[!, :sub_cond_re] = randn(n_subjects)
sub_cond_2_re_df = allcombinations(DataFrame, "subject" => 1:n_subjects, 
        "condition" => 2)
sub_cond_2_re_df[!, :sub_cond_re] = randn(n_subjects)
sub_cond_re_df = vcat(sub_cond_1_re_df, sub_cond_2_re_df)

df = innerjoin(df, cond_fe_df, on = :condition)
df = innerjoin(df, sub_re_df, on = :subject)
df = innerjoin(df, sub_session_re_df, on = [:subject, :session])
df = innerjoin(df, sub_cond_re_df, on = [:subject, :condition])

df[!, :linpred] = df[!, :cond_fe] + df[!, :sub_re] + df[!, :sub_session_re] + df[!, :sub_cond_re]
df[!, :predprob] = logistic.(df[!, :linpred])
df[!, :outcome] = rand.(Bernoulli.(df[!, :predprob]))

contrasts = Dict(
    :condition => DummyCoding()
)

f = @formula(outcome ~ condition + zerocorr(1 + fulldummy(condition) | subject) + (1 | subject & session))
m = GeneralizedLinearMixedModel(f, df, Bernoulli(); contrasts=contrasts)

@rput df

R"""
library(lme4)
f <- outcome ~ condition + (dummy(condition, 1) + dummy(condition, 2) || subject) + (1 | subject:session)
m <- glmer(f, data = df, family = binomial)
"""

I really appreciate you working on a general solution! For my own education, is there a straightforward way to change the initial values when constructing a MixedModel object? I know there is the setθ! function, but that requires an already-constructed object.

@palday (Member) commented Dec 10, 2024

@dhalpern There isn't at the moment -- you can call the low-level constructor directly with a custom OptSummary, but that would be quite annoying.

With the fix in #796, there is no second attempt to call deviance!, so construction should still return a model object even after a PosDefException. Then you can manually modify the various fields of the optsum property of the returned model object, e.g. model.optsum.initial is the initial parameter vector. You can also modify things like optsum.ftol_rel to set convergence tolerances.
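
(A minimal sketch of that workflow, assuming the post-#796 behavior where construction survives the PosDefException; the replacement values here are purely illustrative, not recommendations:)

m = GeneralizedLinearMixedModel(f, df, Bernoulli(); contrasts=contrasts)
m.optsum.initial .= 0.1    # illustrative: start from small positive values instead of the defaults
m.optsum.ftol_rel = 1e-8   # tighten the relative tolerance on the objective
fit!(m)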
