What is the purpose of the 10-iteration failure when updating GLM vectors/Laplace approx in MixedModels.jl #780
If you can share data + code, even privately, that is always useful for debugging the particulars of your situation. Or even a simulated dataset with the problem-causing structure. 😄 As for the reason for the number 10 -- that's the number of halving steps, which means that you've reduced the step size to 2^-10 ≈ 0.000977 of the original step. Given that (penalized) iteratively re-weighted least squares typically converges very quickly, shrinking the step so small usually indicates a problem with the model that won't be addressed by additional halving steps. Have you tried fast=true?
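To make the mechanics concrete, here is a self-contained toy sketch in Julia of a step-halving loop of the kind that throws this error. The function name, the quadratic objective, and the starting values are all invented for illustration -- this is not the actual MixedModels.jl source -- but it shows the detail the original question hinges on: the objective is re-evaluated after every halving, because averaging u toward the last accepted value u0 changes the point at which it is computed.

# Toy step-halving loop (hypothetical names and objective, not the
# actual MixedModels.jl source). The proposed update `u` is pulled
# halfway back toward the last accepted value `u0` until the objective
# improves, giving up after `maxhalf` halvings.
function halve_to_improve!(u, u0, objective; maxhalf=10)
    obj0 = objective(u0)     # objective at the last accepted value
    obj = objective(u)       # objective at the proposed update
    nhalf = 0
    while obj > obj0         # the proposed step made things worse
        nhalf += 1
        nhalf > maxhalf && throw(ErrorException("number of averaging steps > $maxhalf"))
        u .= (u .+ u0) ./ 2  # average u toward u0 ...
        obj = objective(u)   # ... and re-evaluate: obj changes on every pass
    end
    return obj, nhalf
end

# A badly overshooting update of a toy quadratic "deviance" (minimized at
# u = ones) recovers after three halvings: returns (0.1875, 3).
halve_to_improve!(fill(10.0, 3), zeros(3), u -> sum(abs2, u .- 1.0))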
Thank you for your response and the clear description of the process. In a previous iteration I used fast=true, but I don't recall what prompted me to stop. Either I was not able to get the output returned to R (I am using JuliaCall in R, and RCall and Jellyme4 in Julia, to port data between programs), or I was receiving some nonsensical results when using fast=true (which could have been due to earlier model mis-specification as well). Datafile uploaded; sorry for the R code, but it should be interpretable:
I am relatively certain that the problem is due to rank deficiency in the fixed effects for elements of the random-effect structure. See the attached per-individual summary, which shows three problems for this species/dataset: 1) some individuals have only a very small number of records, 2) some only have values for 1 or 2 of the 9 factors comprising Analysisclass, and 3) most have no values for the reference class aRice_Wet_day (this is a group that prefers wetlands to rice). There is a lot that can go wrong there, so I am not surprised by a convergence failure; I was just struggling to interpret the error message. Many thanks
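A sketch of how those three checks might be run in Julia with DataFrames.jl, assuming the observations are in a DataFrame df with the column names used elsewhere in this thread:

using DataFrames

# 1) individuals with very few records
sort!(combine(groupby(df, :individual_local_identifier), nrow => :n), :n)

# 2) how many of the 9 Analysisclass levels each individual actually has
combine(groupby(df, :individual_local_identifier),
        :Analysisclass => (x -> length(unique(x))) => :nlevels)

# 3) how many records each individual has in the reference class
combine(groupby(df, :individual_local_identifier),
        :Analysisclass => (x -> count(==("aRice_Wet_day"), x)) => :nref)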
For completeness, here is a working R script and an updated data file that includes both 1's and 0's for the dependent variable (the file above only had case=1; my apologies if you tried to run a GLMM with that). The script included here wraps the model call in a tryCatch to provide a longer stack trace as well. I very much appreciate the related discussion in the Julia forum and am thankful to @palday and @dmbates for their comments and attention to my question.
I am able to fit the model by setting fast=true:

Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
case ~ 0 + Analysisclass + (1 | cropyear) + (1 | cropyear & individual_local_identifier)
Distribution: Bernoulli{Float64}
Link: LogitLink()
logLik deviance AIC AICc BIC
-52283.8716 104567.7433 104595.7433 104595.7530 104717.1024
Variance components:
Column Variance Std.Dev.
cropyear & individual_local_identifier (Intercept) 40.63670 6.37469
cropyear (Intercept) 0.00000 0.00000
Number of obs: 42981; levels of grouping factors: 78, 11
Fixed-effects parameters:
─────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
─────────────────────────────────────────────────────────────────────────────
Analysisclass: Nonhabitat_Dry_day -526.944 8.07256e109 -0.00 1.0000
Analysisclass: Nonhabitat_Dry_night -20.1344 1.07266 -18.77 <1e-77
Analysisclass: Nonhabitat_Wet_day -523.382 5.97722e109 -0.00 1.0000
Analysisclass: Nonhabitat_Wet_night -523.382 5.97722e109 -0.00 1.0000
Analysisclass: Wetland_Dry_day -14.4728 0.81342 -17.79 <1e-70
Analysisclass: Wetland_Dry_night -14.9585 0.817631 -18.29 <1e-74
Analysisclass: Wetland_Wet_day -11.5421 0.806793 -14.31 <1e-45
Analysisclass: Wetland_Wet_night -11.5783 0.8068 -14.35 <1e-45
Analysisclass: aRice_Dry_day -18.2457 0.852996 -21.39 <1e-99
Analysisclass: aRice_Dry_night -14.0891 0.807393 -17.45 <1e-67
Analysisclass: aRice_Wet_day -15.0857 0.818055 -18.44 <1e-75
Analysisclass: aRice_Wet_night -11.9868 0.807174 -14.85 <1e-49
─────────────────────────────────────────────────────────────────────────────

The model is singular -- the estimated variance for cropyear is zero. One thing that interests me is how the problematic estimates line up with the distribution of observations across the levels of Analysisclass:

12×3 DataFrame
Row │ Analysisclass n %
│ String31 Int64 Float64
─────┼──────────────────────────────────────
1 │ Nonhabitat_Wet_day 355 0.8
2 │ Nonhabitat_Wet_night 355 0.8
3 │ Wetland_Dry_night 951 2.2
4 │ Wetland_Dry_day 986 2.3
5 │ aRice_Wet_day 2624 6.1
6 │ aRice_Wet_night 3767 8.8
7 │ Wetland_Wet_night 4917 11.4
8 │ Wetland_Wet_day 5021 11.7
9 │ aRice_Dry_day 5098 11.9
10 │ aRice_Dry_night 5915 13.8
11 │ Nonhabitat_Dry_day 6417 14.9
  12 │ Nonhabitat_Dry_night        6575   15.3

Depending on your research question, it might make sense to split Analysisclass into its component factors (habitat, condition, and photoperiod) and model them as interacting effects; see the sketch below.
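A hypothetical sketch of that reparameterization, assuming each Analysisclass string such as "Wetland_Dry_day" splits cleanly into three parts (the habitat, condition, and photoperiod column names are invented here):

using DataFrames, MixedModels

# split "Wetland_Dry_day" etc. into three separate factor columns
transform!(data, :Analysisclass =>
    ByRow(x -> Tuple(split(x, "_"))) => [:habitat, :condition, :photoperiod])

# interaction-coded alternative to the dummy-coded Analysisclass model
form2 = @formula(case ~ 1 + habitat * condition * photoperiod +
                        (1 | cropyear) + (1 | cropyear & individual_local_identifier))
model2 = fit(MixedModel, form2, data, Bernoulli(); fast=true, progress=true)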
I tried that and even the fast fit fails. Since that's just a reparameterization of your current model, that tells me that the model is very sensitive to parameterization, which is often not a great sign for being able to fit the desired model to the data in question.
Thank you for looking that closely at things. This particular subset of the data is the most limited, and I expect you are correct that the data at hand are just not sufficient in this case. I've argued myself in circles about dummy coding the habitat x condition x photoperiod combinations versus including them as interacting effects as you suggest. The primary reason I settled on dummy coding is that some subsets have no use (case==1) at all for some variable combinations, leading to rank deficiency, which creates its own problems with convergence. I have found the models to be more stable when I dummy code, and I feel the results have rational interpretations, whereas interaction coding is harder to wrap my head around.
That makes sense! And one of the nice things about MixedModels.jl is that it can fit singular models. 😄
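(For what it's worth, MixedModels.jl exports an issingular function for checking this programmatically; on the fit above, where the cropyear variance is estimated as exactly zero, one would expect it to return true.)

julia> issingular(model)
true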
That sounds like a good reason to stick with your current contrast coding. 😄 I tried one more thing, using an experimental feature that uses a linear mixed model to initialize the GLMM (instead of using a GLM for the initialization):

julia> model = fit(MixedModel, m0form, data, Bernoulli();
wts=float.(data.weights),
contrasts= Dict(:Analysisclass => DummyCoding(; base="aRice_Wet_day")),
init_from_lmm=[:θ],
fast=false,
progress=true,
verbose=false)
Minimizing 900 Time: 0:01:12 (80.47 ms/it)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
case ~ 0 + Analysisclass + (1 | cropyear) + (1 | cropyear & individual_local_identifier)
Distribution: Bernoulli{Float64}
Link: LogitLink()
logLik deviance AIC AICc BIC
-52299.4230 104598.8461 104626.8461 104626.8559 104748.2053
Variance components:
Column Variance Std.Dev.
cropyear & individual_local_identifier (Intercept) 10.78505 3.28406
cropyear (Intercept) 0.02275 0.15082
Number of obs: 42981; levels of grouping factors: 78, 11
Fixed-effects parameters:
────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────────────────────────────────
Analysisclass: Nonhabitat_Dry_day -23.1087 4.72024 -4.90 <1e-06
Analysisclass: Nonhabitat_Dry_night -20.1036 1.11847 -17.97 <1e-71
Analysisclass: Nonhabitat_Wet_day -33.0436 2967.99 -0.01 0.9911
Analysisclass: Nonhabitat_Wet_night -17.5231 1.33543 -13.12 <1e-38
Analysisclass: Wetland_Dry_day -13.6633 0.439363 -31.10 <1e-99
Analysisclass: Wetland_Dry_night -14.1096 0.446209 -31.62 <1e-99
Analysisclass: Wetland_Wet_day -10.7518 0.427207 -25.17 <1e-99
Analysisclass: Wetland_Wet_night -10.7884 0.427222 -25.25 <1e-99
Analysisclass: aRice_Dry_day -17.6945 0.529162 -33.44 <1e-99
Analysisclass: aRice_Dry_night -13.2977 0.428339 -31.04 <1e-99
Analysisclass: aRice_Wet_day -14.2877 0.447925 -31.90 <1e-99
Analysisclass: aRice_Wet_night -11.1991 0.427925 -26.17 <1e-99
────────────────────────────────────────────────────────────────────────────

There are a few things to note here, and most of them can be boiled down to a single caveat emptor: the maximum likelihood estimate of this model seems to be poorly defined (because the likelihood surface is very flat), which is usually a sign that you're trying to fit a model that's more complex than your data can support. In this case, that extra complexity lies in the coefficients related to the very rare cases.
Wow, @palday, you are really going above and beyond. One aspect of this analysis that probably should be mentioned, if it is not already clear, is that this model is an animal "resource selection function": the rows where case=1 carry the covariates that are "used" at the observed locations, while rows where case=0 are "available" habitats, sampled so that the covariates are proportional to their occurrence. The absolute number of records used to define availability is subjective, so the absolute values of the parameter estimates aren't actually of interest; what matters is their relative values. I'd have to look closer to see how different those are, but at least the order of selection seems consistent between the models (and in line with what I would expect). But I agree, any interpretation is probably pretty minimal. If you don't mind me asking one follow-up question... when you say "the likelihood surface is very flat", what are you looking at that tells you that?
Ah, so the ideal way to do this check would be with profiling, which could then be used to plot the likelihood function at various values of the parameters. But the profiling code is comparatively new and still fails in a few cases (to be fair, those are often numerically challenging cases, like the models here), and I don't recall off the top of my head whether we've tested it on GLMMs. A second approach would be computing the Hessian or Fisher information, but that would almost certainly fail here because of the (near) singularity. So what I've actually done is piece together information from a few different sources.
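One way to see that flatness concretely for the quasi-separated cells is a toy calculation in plain Julia (the counts are hypothetical but mirror Nonhabitat_Wet_day, which has 355 observations and zero positive cases in the tables below): with no successes in a cell, the Bernoulli log-likelihood keeps creeping toward its supremum as the coefficient heads to -Inf, so there is no interior maximum and Wald standard errors blow up.

# toy: intercept-only Bernoulli log-likelihood for a cell with n = 355
# observations and y = 0 successes, written in the numerically stable
# canonical form y*beta - n*log(1 + exp(beta))
n, y = 355, 0
loglik(beta) = y * beta - n * log1p(exp(beta))

for beta in (-2.0, -5.0, -10.0, -20.0, -500.0)
    println("beta = ", beta, "   loglik = ", loglik(beta))
end
# the log-likelihood approaches 0 but never attains it: each further
# decrease in beta buys an ever smaller improvement, i.e. the surface
# is essentially flat in that direction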
That makes a lot of sense. Thank you for taking the time to explain the inner workings to me. It almost makes me feel like I know what I am doing. In all seriousness, I appreciate your observations and thoughts immensely. I am happy to consider this closed if you are.
Sounds fun. 😄
Is there a reason that all the individual_local_identifier values end in ".1"?
Following up on what @palday said about imbalance in the Analysisclass representation, here is the proportion of positive cases in each class:

julia> pr = leftjoin!(
combine(groupby(df, :Analysisclass), nrow => :n),
combine(groupby(subset(df, :case), :Analysisclass), nrow => :pos);
on=:Analysisclass,
)
pr.pos = coalesce.(pr.pos, 0) # replace missing values by zeros
prop = transform!(pr, [:pos, :n] => ByRow(/) => :prop)
12×4 DataFrame
Row │ Analysisclass n pos prop
│ String31 Int64 Int64 Float64
─────┼─────────────────────────────────────────────────
1 │ Wetland_Wet_night 4917 2910 0.591824
2 │ Wetland_Dry_night 951 56 0.0588854
3 │ Wetland_Wet_day 5021 3014 0.600279
4 │ Wetland_Dry_day 986 91 0.0922921
5 │ aRice_Dry_night 5915 830 0.140321
6 │ aRice_Dry_day 5098 13 0.00255002
7 │ aRice_Wet_night 3767 1197 0.317759
8 │ aRice_Wet_day 2624 54 0.0205793
9 │ Nonhabitat_Dry_night 6575 2 0.000304183
10 │ Nonhabitat_Dry_day 6417 0 0.0
11 │ Nonhabitat_Wet_day 355 0 0.0
  12 │ Nonhabitat_Wet_night              355      0  0.0
Thank you @dmbates, that is what I anticipated as well. Normally, when I have a scenario where a habitat is not used, I would group that class in with Nonhabitat, which would at least have SOME use since it is an amalgam of several things anyway. That approach failed here because it was Nonhabitat itself that had no use (despite it being painfully obvious, in hindsight, that this was likely).
My apologies, I did not see your first response. The ".1" indicates that the individual is the first deployment of the transmitter whose serial number precedes it (we can recover and redeploy transmitters). In this dataset, that "number" should be treated as a factor. There are other values in the larger dataset, derived from different transmitters, that contain characters, so the field should be recognized as a character type when fed into the model.
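A small sketch of forcing that interpretation on the Julia side, assuming the data are in a DataFrame df:

# store the identifier as strings so each value is treated as a factor
# level rather than parsed as a number (which would turn "123.1" into a Float64)
df.individual_local_identifier = string.(df.individual_local_identifier)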
I have a shorter script to check the counts and especially the missing cells:

julia> unstack(combine(groupby(data, [:Analysisclass, :case]), nrow => :n), :case, :n)
12×3 DataFrame
Row │ Analysisclass false true
│ String31 Int64? Int64?
─────┼───────────────────────────────────────
1 │ Wetland_Wet_night 2007 2910
2 │ Wetland_Dry_night 895 56
3 │ Wetland_Wet_day 2007 3014
4 │ Wetland_Dry_day 895 91
5 │ aRice_Dry_night 5085 830
6 │ aRice_Dry_day 5085 13
7 │ aRice_Wet_night 2570 1197
8 │ aRice_Wet_day 2570 54
9 │ Nonhabitat_Dry_night 6573 2
10 │ Nonhabitat_Dry_day 6417 missing
11 │ Nonhabitat_Wet_day 355 missing
  12 │ Nonhabitat_Wet_night                355  missing

Checking with unstack this way makes the empty cells show up directly as missing values.
Thank you once again for your response and insights. I suppose I am somewhat fortunate that this problem is not very widespread across the different species and regions I am modelling, which paradoxically makes it more difficult to determine how to deal with unused covariates when they do exist. There is always an oddball; usually it's me.
I am trying to deduce why my data set is producing the error thrown at line 614:
throw(ErrorException("number of averaging steps > 10"))
I suspect it is due to my random-effect structure (individuals nested in months) having sparse representation of the fixed factors in a Bernoulli GLMM.
Does the 10 'mean' anything, or is it just a suitable number of tries to be certain that the new values for u produce a deviance that is greater than the previously estimated deviance?
I don't understand the Julia language very well, so I likely misunderstand how the code functions. But inside the while-loop starting at line 610, I don't see how obj varies across the 10 iterations of nhalf. I see where each value of u is averaged and mapped to beta, but not where they are reassigned; it looks like obj would be the same for each iteration.
Unfortunately, I am running my Julia model via JuliaCall, so I don't think I get the verbose output to verify.