From 8edc0aca23d78dd57e40c60fec0115b319a78944 Mon Sep 17 00:00:00 2001
From: Mousum
Date: Sun, 17 Apr 2022 10:46:55 +0530
Subject: [PATCH 01/23] Added description of a few more methods applied to the fitted models. Also exported aic, aicc, bic so that these methods can be accessed from GLM module directly

---
 docs/src/index.md | 21 ++++++++++++++++++---
 src/GLM.jl        |  5 +++--
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index d21d69ee..573db8c1 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -147,16 +147,31 @@ F-test: 2 models fitted on 50 observations
 ## Methods applied to fitted models

 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
+- `adjr2`: adjusted R² for a linear model
+- `bic`: Bayesian Information Criterion, defined as ``-2 \\log L + k \\log n``, with ``L``
+the likelihood of the model, ``k`` is the number of consumed degrees of freedom
 - `coef`: extract the estimates of the coefficients in the model
+- `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
+- `dof`: return the number of degrees of freedom consumed in the model, including
+when applicable the intercept and the distribution's dispersion parameter
 - `dof_residual`: degrees of freedom for residuals, when meaningful
+- `fitted`: return the fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
+- `aic`: Akaike's Information Criterion, defined as ``-2 \\log L + 2k``, with ``L`` the likelihood of the model, and `k` it the number of consumed degrees of freedom
+- `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
 - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
-- `r2`: R² of a linear model or pseudo-R² of a generalized linear model
-- `stderror`: standard errors of the coefficients
-- `vcov`: estimated variance-covariance matrix of the coefficient estimates
+- `loglikelihood`: return the log-likelihood of the model
+- `modelmatrix`: return the design matrix
+- `nobs`: return the number of rows, or sum of the weights when prior weights are specified
+- `nulldeviance`: return the deviance of the linear model which includs the intercept only
+- `nullloglikelihood`: return the log-likelihood of the null model corresponding to the fitted linear model
 - `predict` : obtain predicted values of the dependent variable from the fitted model
+- `r2`: R² of a linear model
 - `residuals`: get the vector of residuals from the fitted model
+- `response`: return the model response (a.k.a the dependent variable)
+- `stderror`: standard errors of the coefficients
+- `vcov`: estimated variance-covariance matrix of the coefficient estimates

 Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
 in practice one typically uses `LogLink`.
diff --git a/src/GLM.jl b/src/GLM.jl
index 3f7d10b3..11d35f78 100644
--- a/src/GLM.jl
+++ b/src/GLM.jl
@@ -12,14 +12,15 @@ module GLM
     import Statistics: cor
     import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
                       loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
-                      fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue
+                      fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue,
+                      aic, aicc, bic
     import StatsFuns: xlogy
     import SpecialFunctions: erfc, erfcinv, digamma, trigamma
     import StatsModels: hasintercept
     export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
            loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
            fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-           cooksdistance, hasintercept
+           cooksdistance, hasintercept, aic, aicc, bic

     export
         # types

From 57cf413629c7d11145cb26b88479e2fff33ce4dc Mon Sep 17 00:00:00 2001
From: Mousum
Date: Sun, 17 Apr 2022 12:22:26 +0530
Subject: [PATCH 02/23] Added a set of examples to show how to use methods on fitted models

---
 docs/src/index.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/docs/src/index.md b/docs/src/index.md
index 573db8c1..c114493c 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -152,6 +152,7 @@ Many of the methods provided by this package have names similar to those in [R](
 the likelihood of the model, ``k`` is the number of consumed degrees of freedom
 - `coef`: extract the estimates of the coefficients in the model
 - `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
+- `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
 - `dof`: return the number of degrees of freedom consumed in the model, including
 when applicable the intercept and the distribution's dispersion parameter
@@ -173,9 +174,84 @@ when applicable the intercept and the distribution's dispersion parameter
 - `stderror`: standard errors of the coefficients
 - `vcov`: estimated variance-covariance matrix of the coefficient estimates

+
 Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
 in practice one typically uses `LogLink`.

+```jldoctest methods
+julia> using GLM, DataFrames
+julia> data = DataFrame(X=[1,2,3], y=[2,4,7])
+julia> test_data = DataFrame(X=[4])
+julia> mdl = lm(@formula(y ~ X), data)
+julia> r2(mdl)
+0.9868421052631579
+
+julia> adjr2(mdl)
+0.9736842105263157
+
+julia> bic(mdl)
+3.1383527915438716
+
+julia> coef(mdl)
+2-element Vector{Float64}:
+ -0.6666666666666728
+  2.500000000000003
+
+julia> confint(mdl, level=0.90)
+2×2 Matrix{Float64}:
+ -4.60398   3.27065
+  0.677377  4.32262
+
+julia> deviance(mdl)
+0.16666666666666666
+
+julia> dof(mdl)
+3
+
+julia> dof_residual(mdl)
+1.0
+
+julia> aic(mdl)
+5.8425159255395425
+
+julia> aicc(mdl)
+-18.157484074460456
+
+julia> loglikelihood(mdl)
+0.07874203723022877
+
+julia> nullloglikelihood(mdl)
+-6.417357973199268
+```
+`predict` method returns predicted values of response variable from covariate values `newX`.
+If you ommit `newX` then it return fitted response values.
+
+```jldoctest methods
+julia> predict(mdl)
+3-element Vector{Float64}:
+ 1.8333333333333304
+ 4.333333333333333
+ 6.833333333333336
+
+julia> predict(mdl, test_data)
+1-element Vector{Union{Missing, Float64}}:
+ 9.33333333333334
+
+julia> stderror(mdl)
+2-element Vector{Float64}:
+ 0.6236095644623237
+ 0.2886751345948129
+```
+`cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
+
+```jldoctest methods
+julia> cooksdistance(mdl)
+3-element Vector{Float64}:
+ 2.500000000000079
+ 0.2499999999999991
+ 2.499999999999919
+```
+
 ## Separation of response object and predictor object

 The general approach in this code is to separate functionality related

From f2ac0ec99969dfe90ec15372fc15faadc911bdce Mon Sep 17 00:00:00 2001
From: Mousum
Date: Mon, 25 Apr 2022 16:47:58 +0530
Subject: [PATCH 03/23] updated documentation for menthods applied to fitted model

---
 docs/src/index.md | 42 +++++++++++++++++++++++-------------------
 src/GLM.jl        |  2 +-
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index c114493c..4a1a1240 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -147,7 +147,9 @@ F-test: 2 models fitted on 50 observations
 ## Methods applied to fitted models

 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
-- `adjr2`: adjusted R² for a linear model
+- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
+- `aic`: Akaike's Information Criterion, defined as ``-2 \\log L + 2k``, with ``L`` the likelihood of the model, and `k` it the number of consumed degrees of freedom
+- `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
 - `bic`: Bayesian Information Criterion, defined as ``-2 \\log L + k \\log n``, with ``L``
 the likelihood of the model, ``k`` is the number of consumed degrees of freedom
 - `coef`: extract the estimates of the coefficients in the model
@@ -159,8 +161,6 @@ when applicable the intercept and the distribution's dispersion parameter
 - `dof_residual`: degrees of freedom for residuals, when meaningful
 - `fitted`: return the fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
-- `aic`: Akaike's Information Criterion, defined as ``-2 \\log L + 2k``, with ``L`` the likelihood of the model, and `k` it the number of consumed degrees of freedom
-- `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
 - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
 - `loglikelihood`: return the log-likelihood of the model
 - `modelmatrix`: return the design matrix
@@ -168,7 +168,7 @@ when applicable the intercept and the distribution's dispersion parameter
 - `nulldeviance`: return the deviance of the linear model which includs the intercept only
 - `nullloglikelihood`: return the log-likelihood of the null model corresponding to the fitted linear model
 - `predict` : obtain predicted values of the dependent variable from the fitted model
-- `r2`: R² of a linear model
+- `r2`: R² of a linear model (an alias for `r²`)
 - `residuals`: get the vector of residuals from the fitted model
 - `response`: return the model response (a.k.a the dependent variable)
 - `stderror`: standard errors of the coefficients
@@ -179,23 +179,29 @@ Note that the canonical link for negative binomial regression is `NegativeBinomi
 in practice one typically uses `LogLink`.

 ```jldoctest methods
-julia> using GLM, DataFrames
-julia> data = DataFrame(X=[1,2,3], y=[2,4,7])
-julia> test_data = DataFrame(X=[4])
-julia> mdl = lm(@formula(y ~ X), data)
-julia> r2(mdl)
-0.9868421052631579
+julia> using GLM, DataFrames;

-julia> adjr2(mdl)
-0.9736842105263157
+julia> data = DataFrame(X=[1,2,3], y=[2,4,7]);

-julia> bic(mdl)
-3.1383527915438716
+julia> test_data = DataFrame(X=[4]);
+
+julia> mdl = lm(@formula(y ~ X), data);

 julia> coef(mdl)
 2-element Vector{Float64}:
  -0.6666666666666728
   2.500000000000003
+
+julia> stderror(mdl)
+2-element Vector{Float64}:
+ 0.6236095644623237
+ 0.2886751345948129
+
+julia> r2(mdl)
+0.9868421052631579
+
+julia> adjr2(mdl)
+0.9736842105263157

 julia> confint(mdl, level=0.90)
 2×2 Matrix{Float64}:
@@ -217,6 +223,9 @@ julia> aic(mdl)
 julia> aicc(mdl)
 -18.157484074460456

+julia> bic(mdl)
+3.1383527915438716
+
 julia> loglikelihood(mdl)
 0.07874203723022877

@@ -236,11 +245,6 @@ julia> predict(mdl)
 julia> predict(mdl, test_data)
 1-element Vector{Union{Missing, Float64}}:
  9.33333333333334
-
-julia> stderror(mdl)
-2-element Vector{Float64}:
- 0.6236095644623237
- 0.2886751345948129
 ```
 `cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
diff --git a/src/GLM.jl b/src/GLM.jl
index 11d35f78..71f07b9b 100644
--- a/src/GLM.jl
+++ b/src/GLM.jl
@@ -20,7 +20,7 @@ module GLM
     export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
            loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
            fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-           cooksdistance, hasintercept, aic, aicc, bic
+           cooksdistance, hasintercept, aic, aicc, bic, dispersion

     export
         # types

From 5ec088a3f3582d374bd997ac795f015c5653bb06 Mon Sep 17 00:00:00 2001
From: Mousum
Date: Tue, 26 Apr 2022 08:44:50 +0530
Subject: [PATCH 04/23] added a few more examples

---
 docs/src/index.md | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 4a1a1240..fddf315a 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -35,7 +35,7 @@ functions are
 Binomial (LogitLink)
 Gamma (InverseLink)
 InverseGaussian (InverseSquareLink)
-NegativeBinomial (LogLink)
+NegativeBinomial (NegativeBinomialLink, often used with LogLink)
 Normal (IdentityLink)
 Poisson (LogLink)

@@ -156,6 +156,7 @@ the likelihood of the model, ``k`` is the number of consumed degrees of freedom
 - `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
 - `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
+- `dispersion`: return the estimated dispersion (or scale) parameter for a model's distribution
 - `dof`: return the number of degrees of freedom consumed in the model, including
 when applicable the intercept and the distribution's dispersion parameter
 - `dof_residual`: degrees of freedom for residuals, when meaningful
@@ -197,17 +198,17 @@ julia> stderror(mdl)
  0.6236095644623237
  0.2886751345948129

+julia> confint(mdl, level=0.90)
+2×2 Matrix{Float64}:
+ -4.60398   3.27065
+  0.677377  4.32262
+
 julia> r2(mdl)
 0.9868421052631579

 julia> adjr2(mdl)
 0.9736842105263157

-julia> confint(mdl, level=0.90)
-2×2 Matrix{Float64}:
- -4.60398   3.27065
-  0.677377  4.32262
-
 julia> deviance(mdl)
 0.16666666666666666

 julia> dof(mdl)
 3
@@ -226,14 +227,22 @@ julia> aicc(mdl)
 julia> bic(mdl)
 3.1383527915438716

+julia> dispersion(mdl.model)
+0.408248290463863
+
 julia> loglikelihood(mdl)
 0.07874203723022877

 julia> nullloglikelihood(mdl)
 -6.417357973199268
+
+julia> vcov(mdl)
+2×2 Matrix{Float64}:
+  0.388889  -0.166667
+ -0.166667   0.0833333
 ```
 `predict` method returns predicted values of response variable from covariate values `newX`.
-If you ommit `newX` then it return fitted response values.
+If you ommit `newX` then it return fitted response values. You will find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API docuemnt.

 ```jldoctest methods
 julia> predict(mdl)
@@ -242,6 +251,9 @@ julia> predict(mdl)
  4.333333333333333
  6.833333333333336

+julia> fitted(mdl) ≈ predict(mdl)
+true
+
 julia> predict(mdl, test_data)
 1-element Vector{Union{Missing, Float64}}:
  9.33333333333334

From ac9143efe237fd4d0ab7a8d56f5e5b8067639b20 Mon Sep 17 00:00:00 2001
From: Mousum
Date: Tue, 26 Apr 2022 09:48:14 +0530
Subject: [PATCH 05/23] rounding off to 8 decimal places to avoid floating points issues

---
 docs/src/index.md | 78 +++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index fddf315a..9fbc16cc 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -188,29 +188,29 @@ julia> test_data = DataFrame(X=[4]);

 julia> mdl = lm(@formula(y ~ X), data);

-julia> coef(mdl)
+julia> round.(coef(mdl); digits=8)
 2-element Vector{Float64}:
- -0.6666666666666728
-  2.500000000000003
+ -0.66666667
+  2.5

-julia> stderror(mdl)
+julia> round.(stderror(mdl); digits=8)
 2-element Vector{Float64}:
- 0.6236095644623237
- 0.2886751345948129
+ 0.62360956
+ 0.28867513

-julia> confint(mdl, level=0.90)
+julia> round.(confint(mdl); digits=8)
 2×2 Matrix{Float64}:
- -4.60398   3.27065
-  0.677377  4.32262
+ -8.59038  7.25704
+ -1.16797  6.16797

-julia> r2(mdl)
-0.9868421052631579
+julia> round(r2(mdl); digits=8)
+0.98684211

-julia> adjr2(mdl)
-0.9736842105263157
+julia> round(adjr2(mdl); digits=8)
+0.97368421

-julia> deviance(mdl)
-0.16666666666666666
+julia> round(deviance(mdl); digits=8)
+0.16666667

 julia> dof(mdl)
 3
@@ -218,25 +218,25 @@ julia> dof(mdl)
 julia> dof_residual(mdl)
 1.0

-julia> aic(mdl)
-5.8425159255395425
+julia> round(aic(mdl); digits=8)
+5.84251593

-julia> aicc(mdl)
--18.157484074460456
+julia> round(aicc(mdl); digits=8)
+-18.15748407

-julia> bic(mdl)
-3.1383527915438716
+julia> round(bic(mdl); digits=8)
+3.13835279

-julia> dispersion(mdl.model)
-0.408248290463863
+julia> round(dispersion(mdl.model); digits=8)
+0.40824829

-julia> loglikelihood(mdl)
-0.07874203723022877
+julia> round(loglikelihood(mdl); digits=8)
+0.07874204

-julia> nullloglikelihood(mdl)
--6.417357973199268
+julia> round(nullloglikelihood(mdl); digits=8)
+-6.41735797

-julia> vcov(mdl)
+julia> round.(vcov(mdl); digits=8)
 2×2 Matrix{Float64}:
   0.388889  -0.166667
  -0.166667   0.0833333
@@ -245,27 +245,27 @@ julia> vcov(mdl)
 If you ommit `newX` then it return fitted response values. You will find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API docuemnt.

 ```jldoctest methods
-julia> predict(mdl)
+julia> round.(predict(mdl); digits=8)
 3-element Vector{Float64}:
- 1.8333333333333304
- 4.333333333333333
- 6.833333333333336
+ 1.83333333
+ 4.33333333
+ 6.83333333

 julia> fitted(mdl) ≈ predict(mdl)
 true

-julia> predict(mdl, test_data)
-1-element Vector{Union{Missing, Float64}}:
- 9.33333333333334
+julia> round.(predict(mdl, test_data); digits=8)
+1-element Vector{Float64}:
+ 9.33333333
 ```
 `cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.

 ```jldoctest methods
-julia> cooksdistance(mdl)
+julia> round.(cooksdistance(mdl); digits=8)
 3-element Vector{Float64}:
- 2.500000000000079
- 0.2499999999999991
- 2.499999999999919
+ 2.5
+ 0.25
+ 2.5
 ```

 ## Separation of response object and predictor object

From e0398b215d313f0842c0e0af49556f875d175117 Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:19:48 +0530
Subject: [PATCH 06/23] Update docs/src/index.md

Corrected a type error, and changed `includs` to `includes`.

Co-authored-by: Alex Arslan
---
 docs/src/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 9fbc16cc..3610d85d 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -166,7 +166,7 @@ when applicable the intercept and the distribution's dispersion parameter
 - `loglikelihood`: return the log-likelihood of the model
 - `modelmatrix`: return the design matrix
 - `nobs`: return the number of rows, or sum of the weights when prior weights are specified
-- `nulldeviance`: return the deviance of the linear model which includs the intercept only
+- `nulldeviance`: return the deviance of the linear model which includes the intercept only
 - `nullloglikelihood`: return the log-likelihood of the null model corresponding to the fitted linear model
 - `predict` : obtain predicted values of the dependent variable from the fitted model
 - `r2`: R² of a linear model (an alias for `r²`)

From 6f9dc791696b8cd6302a510294948838b5921d09 Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:29:55 +0530
Subject: [PATCH 07/23] Update docs/src/index.md

removed the definition of `bic`.

Co-authored-by: Dave Kleinschmidt
---
 docs/src/index.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 3610d85d..ec75396e 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -150,8 +150,7 @@ Many of the methods provided by this package have names similar to those in [R](
 - `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
 - `aic`: Akaike's Information Criterion, defined as ``-2 \\log L + 2k``, with ``L`` the likelihood of the model, and `k` it the number of consumed degrees of freedom
 - `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
-- `bic`: Bayesian Information Criterion, defined as ``-2 \\log L + k \\log n``, with ``L``
-the likelihood of the model, ``k`` is the number of consumed degrees of freedom
+- `bic`: Bayesian Information Criterion
 - `coef`: extract the estimates of the coefficients in the model
 - `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
 - `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.

From 2f540624d5cec53d26aae11c5d876a992c1bb9bc Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:30:19 +0530
Subject: [PATCH 08/23] Update docs/src/index.md

removed the definition of `dof`.

Co-authored-by: Dave Kleinschmidt
---
 docs/src/index.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index ec75396e..06826813 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -156,8 +156,7 @@ Many of the methods provided by this package have names similar to those in [R](
 - `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
 - `dispersion`: return the estimated dispersion (or scale) parameter for a model's distribution
-- `dof`: return the number of degrees of freedom consumed in the model, including
-when applicable the intercept and the distribution's dispersion parameter
+- `dof`: return the number of degrees of freedom consumed in the model
 - `dof_residual`: degrees of freedom for residuals, when meaningful
 - `fitted`: return the fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)

From 86ffe001e13bd8de7ee1e48c7ac5f8a935707cac Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:33:51 +0530
Subject: [PATCH 09/23] Update docs/src/index.md

Co-authored-by: Milan Bouchet-Valat
---
 docs/src/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 06826813..880f6b18 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -165,7 +165,7 @@ Many of the methods provided by this package have names similar to those in [R](
 - `modelmatrix`: return the design matrix
 - `nobs`: return the number of rows, or sum of the weights when prior weights are specified
 - `nulldeviance`: return the deviance of the linear model which includes the intercept only
-- `nullloglikelihood`: return the log-likelihood of the null model corresponding to the fitted linear model
+- `nullloglikelihood`: return the log-likelihood of the linear model which includes the intercept only
 - `predict` : obtain predicted values of the dependent variable from the fitted model
 - `r2`: R² of a linear model (an alias for `r²`)
 - `residuals`: get the vector of residuals from the fitted model

From f7c2e3a59798cdbd8ef0e928ba0dbe61c5988b65 Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:35:15 +0530
Subject: [PATCH 10/23] Update docs/src/index.md

removed the definition of `aic` from here.

Co-authored-by: Milan Bouchet-Valat
---
 docs/src/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 880f6b18..0502e520 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -148,7 +148,7 @@ F-test: 2 models fitted on 50 observations

 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
 - `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
-- `aic`: Akaike's Information Criterion, defined as ``-2 \\log L + 2k``, with ``L`` the likelihood of the model, and `k` it the number of consumed degrees of freedom
+- `aic`: Akaike's Information Criterion
 - `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
 - `bic`: Bayesian Information Criterion
 - `coef`: extract the estimates of the coefficients in the model

From ad8654b0fef4b086d2422bb93d3a549caf6fe214 Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Mon, 23 May 2022 07:35:29 +0530
Subject: [PATCH 11/23] Update docs/src/index.md

Co-authored-by: Milan Bouchet-Valat
---
 docs/src/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 0502e520..79628d8d 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -149,7 +149,7 @@ F-test: 2 models fitted on 50 observations
 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
 - `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
 - `aic`: Akaike's Information Criterion
-- `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
+- `aicc`: corrected Akaike's Information Criterion for small sample sizes
 - `bic`: Bayesian Information Criterion
 - `coef`: extract the estimates of the coefficients in the model
 - `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)

From 23c8d8a7035376d5e306f3757768be1de8fde6f3 Mon Sep 17 00:00:00 2001
From: Alex Arslan
Date: Mon, 23 May 2022 17:22:31 -0700
Subject: [PATCH 12/23] Remove "return" as requested

---
 docs/src/index.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 79628d8d..58fa9979 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -147,29 +147,29 @@ F-test: 2 models fitted on 50 observations
 ## Methods applied to fitted models

 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
-- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
+- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
 - `aic`: Akaike's Information Criterion
 - `aicc`: corrected Akaike's Information Criterion for small sample sizes
 - `bic`: Bayesian Information Criterion
-- `coef`: extract the estimates of the coefficients in the model
-- `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
-- `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
+- `coef`: estimates of the coefficients in the model
+- `confint`: confidence intervals for coefficients
+- `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
-- `dispersion`: return the estimated dispersion (or scale) parameter for a model's distribution
+- `dispersion`: estimated dispersion (or scale) parameter for a model's distribution
 - `dof`: return the number of degrees of freedom consumed in the model
 - `dof_residual`: degrees of freedom for residuals, when meaningful
 - `fitted`: return the fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
 - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
-- `loglikelihood`: return the log-likelihood of the model
-- `modelmatrix`: return the design matrix
-- `nobs`: return the number of rows, or sum of the weights when prior weights are specified
-- `nulldeviance`: return the deviance of the linear model which includes the intercept only
-- `nullloglikelihood`: return the log-likelihood of the linear model which includes the intercept only
-- `predict` : obtain predicted values of the dependent variable from the fitted model
+- `loglikelihood`: log-likelihood of the model
+- `modelmatrix`: design matrix
+- `nobs`: number of rows, or sum of the weights when prior weights are specified
+- `nulldeviance`: deviance of the linear model which includes the intercept only
+- `nullloglikelihood`: log-likelihood of the linear model which includes the intercept only
+- `predict`: obtain predicted values of the dependent variable from the fitted model
 - `r2`: R² of a linear model (an alias for `r²`)
-- `residuals`: get the vector of residuals from the fitted model
-- `response`: return the model response (a.k.a the dependent variable)
+- `residuals`: vector of residuals from the fitted model
+- `response`: model response (a.k.a the dependent variable)
 - `stderror`: standard errors of the coefficients
 - `vcov`: estimated variance-covariance matrix of the coefficient estimates


From 7c8621a639f661abd742ecbfac3747011424e5aa Mon Sep 17 00:00:00 2001
From: Mousum
Date: Fri, 17 Jun 2022 15:59:05 +0530
Subject: [PATCH 13/23] updated the code based on suggestions

---
 docs/src/examples.md | 43 +++++++++++++++++++++++++++++++
 docs/src/index.md    | 61 ++++----------------------------------------
 2 files changed, 48 insertions(+), 56 deletions(-)

diff --git a/docs/src/examples.md b/docs/src/examples.md
index 689a12ca..b943f582 100644
--- a/docs/src/examples.md
+++ b/docs/src/examples.md
@@ -42,6 +42,49 @@ julia> round.(predict(ols), digits=5)
  1.83333
  4.33333
  6.83333
+
+julia> round.(confint(ols); digits=5)
+2×2 Matrix{Float64}:
+ -8.59038  7.25704
+ -1.16797  6.16797
+
+julia> round(r2(ols); digits=5)
+0.98684
+
+julia> round(adjr2(ols); digits=5)
+0.97368
+
+julia> round(deviance(ols); digits=5)
+0.16667
+
+julia> dof(ols)
+3
+
+julia> dof_residual(ols)
+1.0
+
+julia> round(aic(ols); digits=5)
+5.84252
+
+julia> round(aicc(ols); digits=5)
+-18.15748
+
+julia> round(bic(ols); digits=5)
+3.13835
+
+julia> round(dispersion(ols.model); digits=5)
+0.40825
+
+julia> round(loglikelihood(ols); digits=5)
+0.07874
+
+julia> round(nullloglikelihood(ols); digits=5)
+-6.41736
+
+julia> round.(vcov(ols); digits=5)
+2×2 Matrix{Float64}:
+  0.38889  -0.16667
+ -0.16667   0.08333
 ```

 ## Probit regression
diff --git a/docs/src/index.md b/docs/src/index.md
index 58fa9979..1c1353bc 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -155,10 +155,10 @@ Many of the methods provided by this package have names similar to those in [R](
 - `confint`: confidence intervals for coefficients
 - `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
-- `dispersion`: estimated dispersion (or scale) parameter for a model's distribution
-- `dof`: return the number of degrees of freedom consumed in the model
+- `dispersion`: dispersion (or scale) parameter for a model's distribution
+- `dof`: the number of degrees of freedom consumed in the model
 - `dof_residual`: degrees of freedom for residuals, when meaningful
-- `fitted`: return the fitted values of the model
+- `fitted`: the fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
 - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
 - `loglikelihood`: log-likelihood of the model
@@ -171,7 +171,7 @@ Many of the methods provided by this package have names similar to those in [R](
 - `residuals`: vector of residuals from the fitted model
 - `response`: model response (a.k.a the dependent variable)
 - `stderror`: standard errors of the coefficients
-- `vcov`: estimated variance-covariance matrix of the coefficient estimates
+- `vcov`: variance-covariance matrix of the coefficient estimates


 Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
@@ -190,68 +190,17 @@ julia> round.(coef(mdl); digits=8)
 2-element Vector{Float64}:
  -0.66666667
   2.5
-
-julia> round.(stderror(mdl); digits=8)
-2-element Vector{Float64}:
- 0.62360956
- 0.28867513
-
-julia> round.(confint(mdl); digits=8)
-2×2 Matrix{Float64}:
- -8.59038  7.25704
- -1.16797  6.16797
-
+ 
 julia> round(r2(mdl); digits=8)
 0.98684211

-julia> round(adjr2(mdl); digits=8)
-0.97368421
-
-julia> round(deviance(mdl); digits=8)
-0.16666667
-
-julia> dof(mdl)
-3
-
-julia> dof_residual(mdl)
-1.0
-
 julia> round(aic(mdl); digits=8)
 5.84251593
-
-julia> round(aicc(mdl); digits=8)
--18.15748407
-
-julia> round(bic(mdl); digits=8)
-3.13835279
-
-julia> round(dispersion(mdl.model); digits=8)
-0.40824829
-
-julia> round(loglikelihood(mdl); digits=8)
-0.07874204
-
-julia> round(nullloglikelihood(mdl); digits=8)
--6.41735797
-
-julia> round.(vcov(mdl); digits=8)
-2×2 Matrix{Float64}:
-  0.388889  -0.166667
- -0.166667   0.0833333
 ```
 `predict` method returns predicted values of response variable from covariate values `newX`.
 If you ommit `newX` then it return fitted response values. You will find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API docuemnt.

 ```jldoctest methods
-julia> round.(predict(mdl); digits=8)
-3-element Vector{Float64}:
- 1.83333333
- 4.33333333
- 6.83333333
-
-julia> fitted(mdl) ≈ predict(mdl)
-true
-
 julia> round.(predict(mdl, test_data); digits=8)
 1-element Vector{Float64}:
  9.33333333

From 499ed93ee295cb6fe7cbefc78205c1fa3b6e448f Mon Sep 17 00:00:00 2001
From: mousum-github <44145580+mousum-github@users.noreply.github.com>
Date: Sun, 3 Jul 2022 09:52:48 +0530
Subject: [PATCH 14/23] Update docs/src/index.md

Co-authored-by: Milan Bouchet-Valat
---
 docs/src/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/index.md b/docs/src/index.md
index 1c1353bc..59eb1910 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -153,7 +153,7 @@ Many of the methods provided by this package have names similar to those in [R](
 - `bic`: Bayesian Information Criterion
 - `coef`: estimates of the coefficients in the model
 - `confint`: confidence intervals for coefficients
-- `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
+- `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation - `deviance`: measure of the model fit, weighted residual sum of squares for lm's - `dispersion`: dispersion (or scale) parameter for a model's distribution - `dof`: the number of degrees of freedom consumed in the model From f6721f2bdc97a3b3554df66db1241d98de43d5c1 Mon Sep 17 00:00:00 2001 From: mousum-github <44145580+mousum-github@users.noreply.github.com> Date: Sun, 3 Jul 2022 09:53:04 +0530 Subject: [PATCH 15/23] Update docs/src/index.md Co-authored-by: Milan Bouchet-Valat --- docs/src/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/index.md b/docs/src/index.md index 59eb1910..94bd03f6 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -164,8 +164,8 @@ Many of the methods provided by this package have names similar to those in [R]( - `loglikelihood`: log-likelihood of the model - `modelmatrix`: design matrix - `nobs`: number of rows, or sum of the weights when prior weights are specified -- `nulldeviance`: deviance of the linear model which includes the intercept only -- `nullloglikelihood`: log-likelihood of the linear model which includes the intercept only +- `nulldeviance`: deviance of the model with all predictors removed +- `nullloglikelihood`: log-likelihood of the model with all predictors removed - `predict`: obtain predicted values of the dependent variable from the fitted model - `r2`: R² of a linear model (an alias for `r²`) - `residuals`: vector of residuals from the fitted model From f786e31e3366728420c66d8906e5b031071db166 Mon Sep 17 00:00:00 2001 From: mousum-github <44145580+mousum-github@users.noreply.github.com> Date: Sun, 3 Jul 2022 09:53:25 +0530 Subject: [PATCH 16/23] Update docs/src/index.md Co-authored-by: Milan Bouchet-Valat --- docs/src/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/index.md b/docs/src/index.md index 94bd03f6..b93232a1 100644 --- 
a/docs/src/index.md +++ b/docs/src/index.md @@ -156,7 +156,7 @@ Many of the methods provided by this package have names similar to those in [R]( - `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation - `deviance`: measure of the model fit, weighted residual sum of squares for lm's - `dispersion`: dispersion (or scale) parameter for a model's distribution -- `dof`: the number of degrees of freedom consumed in the model +- `dof`: number of degrees of freedom consumed in the model - `dof_residual`: degrees of freedom for residuals, when meaningful - `fitted`: the fitted values of the model - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`) From 3da9bb8779694b13aa3264bb6ff53e2e551b5b6a Mon Sep 17 00:00:00 2001 From: mousum-github <44145580+mousum-github@users.noreply.github.com> Date: Sun, 3 Jul 2022 09:54:17 +0530 Subject: [PATCH 17/23] Update docs/src/index.md Co-authored-by: Milan Bouchet-Valat --- docs/src/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/index.md b/docs/src/index.md index b93232a1..0a8f4100 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -166,7 +166,7 @@ Many of the methods provided by this package have names similar to those in [R]( - `nobs`: number of rows, or sum of the weights when prior weights are specified - `nulldeviance`: deviance of the model with all predictors removed - `nullloglikelihood`: log-likelihood of the model with all predictors removed -- `predict`: obtain predicted values of the dependent variable from the fitted model +- `predict`: predicted values of the dependent variable from the fitted model - `r2`: R² of a linear model (an alias for `r²`) - `residuals`: vector of residuals from the fitted model - `response`: model response (a.k.a the dependent variable) From 1c528dbf3f8366d3a13edb40e2b4910adb676a5e Mon Sep 17 00:00:00 2001 From: mousum-github 
<44145580+mousum-github@users.noreply.github.com> Date: Sun, 3 Jul 2022 09:55:40 +0530 Subject: [PATCH 18/23] Update docs/src/index.md Co-authored-by: Milan Bouchet-Valat --- docs/src/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/index.md b/docs/src/index.md index 0a8f4100..f90dc4f4 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -158,7 +158,7 @@ Many of the methods provided by this package have names similar to those in [R]( - `dispersion`: dispersion (or scale) parameter for a model's distribution - `dof`: number of degrees of freedom consumed in the model - `dof_residual`: degrees of freedom for residuals, when meaningful -- `fitted`: the fitted values of the model +- `fitted`: fitted values of the model - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`) - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`) - `loglikelihood`: log-likelihood of the model From b4c408f269958a68324d0ebf1eacf5660b1a0f65 Mon Sep 17 00:00:00 2001 From: Mousum Date: Mon, 11 Jul 2022 12:53:51 +0530 Subject: [PATCH 19/23] removed direct exporting AIC, BIC, AICC etc. --- docs/src/index.md | 2 +- src/GLM.jl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/index.md b/docs/src/index.md index f90dc4f4..d819e6cf 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -178,7 +178,7 @@ Note that the canonical link for negative binomial regression is `NegativeBinomi in practice one typically uses `LogLink`. 
```jldoctest methods -julia> using GLM, DataFrames; +julia> using GLM, DataFrames, StatsBase; julia> data = DataFrame(X=[1,2,3], y=[2,4,7]); diff --git a/src/GLM.jl b/src/GLM.jl index 71f07b9b..e0ed3a2c 100644 --- a/src/GLM.jl +++ b/src/GLM.jl @@ -20,7 +20,7 @@ module GLM export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict, fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr², - cooksdistance, hasintercept, aic, aicc, bic, dispersion + cooksdistance, hasintercept export # types From 4e348209ff48fe35aea136b93763a672d2334235 Mon Sep 17 00:00:00 2001 From: Mousum Date: Mon, 11 Jul 2022 13:03:58 +0530 Subject: [PATCH 20/23] added importing `StatsBase` in some example --- docs/src/examples.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/examples.md b/docs/src/examples.md index b943f582..bcee9cc8 100644 --- a/docs/src/examples.md +++ b/docs/src/examples.md @@ -8,7 +8,7 @@ end ## Linear regression ```jldoctest -julia> using DataFrames, GLM +julia> using DataFrames, GLM, StatsBase julia> data = DataFrame(X=[1,2,3], Y=[2,4,7]) 3×2 DataFrame From 611ade9bcbb38e3aab113cad8c73d9e965d0b639 Mon Sep 17 00:00:00 2001 From: Mousum Date: Mon, 11 Jul 2022 13:14:22 +0530 Subject: [PATCH 21/23] removed direct exporting AIC, BIC, AICC etc. 
--- src/GLM.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/GLM.jl b/src/GLM.jl index e0ed3a2c..abcb7c23 100644 --- a/src/GLM.jl +++ b/src/GLM.jl @@ -20,7 +20,7 @@ module GLM export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict, fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr², - cooksdistance, hasintercept + cooksdistance, hasintercept, dispersion export # types From 4b162c82cedd63bfd5cd30eea82c664140a4d382 Mon Sep 17 00:00:00 2001 From: Alex Arslan Date: Tue, 12 Jul 2022 08:37:06 -0700 Subject: [PATCH 22/23] Fix a couple of typos, move a variable definition Also use at-ref for referencing docstrings. --- docs/src/index.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/docs/src/index.md b/docs/src/index.md index d819e6cf..e68165fd 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -178,13 +178,11 @@ Note that the canonical link for negative binomial regression is `NegativeBinomi in practice one typically uses `LogLink`. ```jldoctest methods -julia> using GLM, DataFrames, StatsBase; +julia> using GLM, DataFrames, StatsBase julia> data = DataFrame(X=[1,2,3], y=[2,4,7]); -julia> test_data = DataFrame(X=[4]); - -julia> mdl = lm(@formula(y ~ X), data); +julia> mdl = lm(@formula(y ~ X), data); julia> round.(coef(mdl); digits=8) 2-element Vector{Float64}: @@ -197,15 +195,20 @@ julia> round(r2(mdl); digits=8) julia> round(aic(mdl); digits=8) 5.84251593 ``` -`predict` method returns predicted values of response variable from covariate values `newX`. -If you ommit `newX` then it return fitted response values. You will find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API docuemnt. + +The [`predict`](@ref) method returns predicted values of response variable from covariate values in an input `newX`. 
+If `newX` is omitted then the fitted response values from the model are returned. ```jldoctest methods +julia> test_data = DataFrame(X=[4]); + julia> round.(predict(mdl, test_data); digits=8) 1-element Vector{Float64}: 9.33333333 ``` -`cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights. + +The [`cooksdistance`](@ref) method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation used to fit a linear model, giving an estimate of the influence of each data point. +Note that it's currently only implemented for linear models without weights. ```jldoctest methods julia> round.(cooksdistance(mdl); digits=8) From d7d384dfc0d5c75059907c0d31b890fdc034842f Mon Sep 17 00:00:00 2001 From: Mousum Dutta <44145580+mousum-github@users.noreply.github.com> Date: Wed, 13 Jul 2022 17:31:17 +0530 Subject: [PATCH 23/23] Update src/GLM.jl The suggestion is committed. Co-authored-by: Milan Bouchet-Valat --- src/GLM.jl | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/GLM.jl b/src/GLM.jl index abcb7c23..a2ed68aa 100644 --- a/src/GLM.jl +++ b/src/GLM.jl @@ -12,8 +12,7 @@ module GLM import Statistics: cor import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict, - fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue, - aic, aicc, bic + fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue import StatsFuns: xlogy import SpecialFunctions: erfc, erfcinv, digamma, trigamma import StatsModels: hasintercept