diff --git a/docs/src/examples.md b/docs/src/examples.md
index 689a12ca..bcee9cc8 100644
--- a/docs/src/examples.md
+++ b/docs/src/examples.md
@@ -8,7 +8,7 @@ end
 ## Linear regression
 
 ```jldoctest
-julia> using DataFrames, GLM
+julia> using DataFrames, GLM, StatsBase
 
 julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
 3×2 DataFrame
@@ -42,6 +42,49 @@ julia> round.(predict(ols), digits=5)
  1.83333
  4.33333
  6.83333
+
+julia> round.(confint(ols); digits=5)
+2×2 Matrix{Float64}:
+ -8.59038  7.25704
+ -1.16797  6.16797
+
+julia> round(r2(ols); digits=5)
+0.98684
+
+julia> round(adjr2(ols); digits=5)
+0.97368
+
+julia> round(deviance(ols); digits=5)
+0.16667
+
+julia> dof(ols)
+3
+
+julia> dof_residual(ols)
+1.0
+
+julia> round(aic(ols); digits=5)
+5.84252
+
+julia> round(aicc(ols); digits=5)
+-18.15748
+
+julia> round(bic(ols); digits=5)
+3.13835
+
+julia> round(dispersion(ols.model); digits=5)
+0.40825
+
+julia> round(loglikelihood(ols); digits=5)
+0.07874
+
+julia> round(nullloglikelihood(ols); digits=5)
+-6.41736
+
+julia> round.(vcov(ols); digits=5)
+2×2 Matrix{Float64}:
+  0.38889  -0.16667
+ -0.16667   0.08333
 ```
 
 ## Probit regression
diff --git a/docs/src/index.md b/docs/src/index.md
index d21d69ee..e68165fd 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -35,7 +35,7 @@ functions are
     Binomial (LogitLink)
     Gamma (InverseLink)
     InverseGaussian (InverseSquareLink)
-    NegativeBinomial (LogLink)
+    NegativeBinomial (NegativeBinomialLink, often used with LogLink)
     Normal (IdentityLink)
     Poisson (LogLink)
 
@@ -147,20 +147,77 @@ F-test: 2 models fitted on 50 observations
 ## Methods applied to fitted models
 
 Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
-- `coef`: extract the estimates of the coefficients in the model
+- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
+- `aic`: Akaike's Information Criterion
+- `aicc`: corrected Akaike's Information Criterion for small sample sizes
+- `bic`: Bayesian Information Criterion
+- `coef`: estimates of the coefficients in the model
+- `confint`: confidence intervals for coefficients
+- `cooksdistance`: [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation
 - `deviance`: measure of the model fit, weighted residual sum of squares for lm's
+- `dispersion`: dispersion (or scale) parameter for a model's distribution
+- `dof`: number of degrees of freedom consumed in the model
 - `dof_residual`: degrees of freedom for residuals, when meaningful
+- `fitted`: fitted values of the model
 - `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
 - `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
-- `r2`: R² of a linear model or pseudo-R² of a generalized linear model
+- `loglikelihood`: log-likelihood of the model
+- `modelmatrix`: design matrix
+- `nobs`: number of rows, or sum of the weights when prior weights are specified
+- `nulldeviance`: deviance of the model with all predictors removed
+- `nullloglikelihood`: log-likelihood of the model with all predictors removed
+- `predict`: predicted values of the dependent variable from the fitted model
+- `r2`: R² of a linear model (an alias for `r²`)
+- `residuals`: vector of residuals from the fitted model
+- `response`: model response (a.k.a. the dependent variable)
 - `stderror`: standard errors of the coefficients
-- `vcov`: estimated variance-covariance matrix of the coefficient estimates
-- `predict` : obtain predicted values of the dependent variable from the fitted model
-- `residuals`: get the vector of residuals from the fitted model
+- `vcov`: variance-covariance matrix of the coefficient estimates
+
 Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but in practice one typically uses `LogLink`.
 
+```jldoctest methods
+julia> using GLM, DataFrames, StatsBase
+
+julia> data = DataFrame(X=[1,2,3], y=[2,4,7]);
+
+julia> mdl = lm(@formula(y ~ X), data);
+
+julia> round.(coef(mdl); digits=8)
+2-element Vector{Float64}:
+ -0.66666667
+  2.5
+
+julia> round(r2(mdl); digits=8)
+0.98684211
+
+julia> round(aic(mdl); digits=8)
+5.84251593
+```
+
+The [`predict`](@ref) method returns predicted values of the response variable from covariate values in an input `newX`.
+If `newX` is omitted then the fitted response values from the model are returned.
+
+```jldoctest methods
+julia> test_data = DataFrame(X=[4]);
+
+julia> round.(predict(mdl, test_data); digits=8)
+1-element Vector{Float64}:
+ 9.33333333
+```
+
+The [`cooksdistance`](@ref) method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation used to fit a linear model, giving an estimate of the influence of each data point.
+Note that it's currently only implemented for linear models without weights.
+
+```jldoctest methods
+julia> round.(cooksdistance(mdl); digits=8)
+3-element Vector{Float64}:
+ 2.5
+ 0.25
+ 2.5
+```
+
 ## Separation of response object and predictor object
 
 The general approach in this code is to separate functionality related
diff --git a/src/GLM.jl b/src/GLM.jl
index 3f7d10b3..a2ed68aa 100644
--- a/src/GLM.jl
+++ b/src/GLM.jl
@@ -19,7 +19,7 @@ module GLM
 export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
     loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
     fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-    cooksdistance, hasintercept
+    cooksdistance, hasintercept, dispersion
 
 export # types
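
The last hunk exports `dispersion`, which the new `examples.md` doctest calls as `dispersion(ols.model)`. For an ordinary least-squares model this is the estimated residual standard deviation, the square root of `deviance` divided by `dof_residual` (0.40825 in the example above). A minimal sketch of that relationship, reusing the same toy data (illustrative only, not part of the patch):

```julia
# Relate dispersion to deviance/dof_residual for the linear model used in examples.md.
using DataFrames, GLM

data = DataFrame(X=[1, 2, 3], Y=[2, 4, 7])
ols = lm(@formula(Y ~ X), data)

σ = sqrt(deviance(ols) / dof_residual(ols))  # ≈ 0.40825
GLM.dispersion(ols.model) ≈ σ                # true; unqualified `dispersion` works once exported
```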
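
A note on the information criteria reported in the `examples.md` hunk: with only three observations and `dof(ols) == 3`, the small-sample correction term in `aicc` has a negative denominator (n - k - 1 = -1), which is why `aicc` comes out below `aic`. The values are mutually consistent with the reported `loglikelihood`, as this sketch (illustrative only) shows using the standard StatsBase definitions:

```julia
# Reproduce the aic/aicc/bic values in examples.md from loglikelihood (ℓ), dof (k) and nobs (n).
ℓ = 0.07874   # loglikelihood(ols), rounded
k = 3         # dof(ols)
n = 3         # nobs(ols)

aic_val  = -2ℓ + 2k                              # ≈ 5.84252
aicc_val = aic_val + 2k * (k + 1) / (n - k - 1)  # ≈ -18.15748 (degenerate: n - k - 1 = -1)
bic_val  = -2ℓ + k * log(n)                      # ≈ 3.1384 (3.13835 with the unrounded ℓ)
```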
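
On the `NegativeBinomial (NegativeBinomialLink, often used with LogLink)` change in `index.md`: the canonical link carries the distribution's shape parameter θ, which is one reason `LogLink` is the usual choice in practice. A minimal sketch (made-up data, not part of the patch), assuming the standard `glm`/`canonicallink` API:

```julia
# Illustrative only: the canonical link for NegativeBinomial(θ) is NegativeBinomialLink(θ),
# but negative binomial regressions are typically fit with LogLink.
using DataFrames, Distributions, GLM

df = DataFrame(x = 1:10, counts = [2, 1, 3, 2, 4, 6, 5, 8, 7, 10])  # made-up counts

GLM.canonicallink(NegativeBinomial(2.0))  # NegativeBinomialLink carrying θ = 2.0

# What one typically does in practice (θ fixed; use `negbin` to estimate θ instead):
nbreg = glm(@formula(counts ~ x), df, NegativeBinomial(2.0), LogLink())
```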