Readability improvement #482

Merged · 23 commits · Jul 13, 2022
115 changes: 111 additions & 4 deletions docs/src/index.md
@@ -35,7 +35,7 @@ functions are
Binomial (LogitLink)
Gamma (InverseLink)
InverseGaussian (InverseSquareLink)
-NegativeBinomial (LogLink)
+NegativeBinomial (NegativeBinomialLink, often used with LogLink)
Normal (IdentityLink)
Poisson (LogLink)

@@ -147,20 +147,127 @@ F-test: 2 models fitted on 50 observations
## Methods applied to fitted models

Many of the methods provided by this package have names similar to those in [R](http://www.r-project.org).
- `adjr2`: adjusted R² for a linear model (an alias for `adjr²`)
- `aic`: Akaike's Information Criterion, defined as ``-2 \log L + 2k``, where ``L`` is the likelihood of the model and ``k`` is the number of consumed degrees of freedom
- `aicc`: corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989)
- `bic`: Bayesian Information Criterion, defined as ``-2 \log L + k \log n``, where ``L`` is the likelihood of the model, ``k`` is the number of consumed degrees of freedom, and ``n`` is the number of observations
- `coef`: extract the estimates of the coefficients in the model
- `confint`: compute confidence intervals for coefficients, with confidence level `level` (by default 95%)
- `cooksdistance`: compute [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in linear model `obj`, giving an estimate of the influence of each data point. Currently only implemented for linear models without weights.
- `deviance`: measure of the model fit; for linear models, the weighted residual sum of squares
- `dispersion`: return the estimated dispersion (or scale) parameter for a model's distribution
- `dof`: return the number of degrees of freedom consumed in the model, including
when applicable the intercept and the distribution's dispersion parameter
- `dof_residual`: degrees of freedom for residuals, when meaningful
- `fitted`: return the fitted values of the model
- `glm`: fit a generalized linear model (an alias for `fit(GeneralizedLinearModel, ...)`)
- `lm`: fit a linear model (an alias for `fit(LinearModel, ...)`)
- `loglikelihood`: return the log-likelihood of the model
- `modelmatrix`: return the design matrix
- `nobs`: return the number of rows, or sum of the weights when prior weights are specified
- `nulldeviance`: return the deviance of the linear model which includes only the intercept
- `nullloglikelihood`: return the log-likelihood of the null model corresponding to the fitted linear model
- `predict`: obtain predicted values of the dependent variable from the fitted model
- `r2`: R² of a linear model or pseudo-R² of a generalized linear model (an alias for `r²`)
- `residuals`: get the vector of residuals from the fitted model
- `response`: return the model response (a.k.a. the dependent variable)
- `stderror`: standard errors of the coefficients
- `vcov`: estimated variance-covariance matrix of the coefficient estimates


Note that the canonical link for negative binomial regression is `NegativeBinomialLink`, but
in practice one typically uses `LogLink`.
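A negative binomial fit with the conventional log link can be sketched as follows (a minimal sketch with made-up count data purely for illustration; the shape parameter `2.0` is arbitrary):

```julia
using GLM, DataFrames

# Hypothetical count data, for illustration only
counts = DataFrame(x=[1.0, 2.0, 3.0, 4.0, 5.0], y=[2, 3, 6, 7, 12])

# NegativeBinomial(θ) fixes the shape parameter at θ; LogLink() is the
# link typically used in practice, while the canonical
# NegativeBinomialLink is also accepted here
nbmdl = glm(@formula(y ~ x), counts, NegativeBinomial(2.0), LogLink())

coef(nbmdl)  # intercept and slope on the log scale
```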

Review discussion on this example:

> **Member:** This is quite a long example to show on the home page. Also, given how simple most of these functions are, I'm not sure it's super useful to show all of them. How about adding some of these to the "Linear Model" example section instead?
>
> **Collaborator Author:** Okay.
>
> **Collaborator Author:** Except for `coef`, `r2`, `aic`, and prediction, the others have been moved to the Linear Regression example.
>
> **Member:** I'd rather move everything there to keep the home page simple. Actually, we should probably also rework the existing examples, as it's not super logical to illustrate passing contrasts before even showing how to fit a model... We could move the contents to other pages and improve them.
>
> **Collaborator Author:** Shall we start a different PR for something like "Reorganise GLM documentation", or continue updating this PR only? My thought behind keeping `r2`, `aic`, and prediction along with model fitting at the beginning is that this is the functionality most consumers of linear models are looking for. Looking forward to your thoughts.
>
> **Member:** Yeah, I think it's reasonable to go ahead adding the example here and reorganize the documentation separately.

```jldoctest methods

julia> using GLM, DataFrames;

julia> data = DataFrame(X=[1,2,3], y=[2,4,7]);

julia> test_data = DataFrame(X=[4]);

julia> mdl = lm(@formula(y ~ X), data);

julia> round.(coef(mdl); digits=8)
2-element Vector{Float64}:
-0.66666667
2.5

julia> round.(stderror(mdl); digits=8)
2-element Vector{Float64}:
0.62360956
0.28867513

julia> round.(confint(mdl); digits=8)
2×2 Matrix{Float64}:
-8.59038 7.25704
-1.16797 6.16797

julia> round(r2(mdl); digits=8)
0.98684211

julia> round(adjr2(mdl); digits=8)
0.97368421

julia> round(deviance(mdl); digits=8)
0.16666667

julia> dof(mdl)
3

julia> dof_residual(mdl)
1.0

julia> round(aic(mdl); digits=8)
5.84251593

julia> round(aicc(mdl); digits=8)
-18.15748407

julia> round(bic(mdl); digits=8)
3.13835279

julia> round(dispersion(mdl.model); digits=8)
0.40824829

julia> round(loglikelihood(mdl); digits=8)
0.07874204

julia> round(nullloglikelihood(mdl); digits=8)
-6.41735797

julia> round.(vcov(mdl); digits=8)
2×2 Matrix{Float64}:
0.388889 -0.166667
-0.166667 0.0833333
```
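The information criteria reported above follow directly from the log-likelihood and the consumed degrees of freedom. A quick sketch checking these identities, reusing the same toy model (not part of the doctest):

```julia
using GLM, DataFrames

data = DataFrame(X=[1, 2, 3], y=[2, 4, 7])
mdl = lm(@formula(y ~ X), data)

L = loglikelihood(mdl)  # log-likelihood of the fitted model
k = dof(mdl)            # 3: two coefficients plus the dispersion parameter
n = nobs(mdl)           # 3 observations

@assert aic(mdl) ≈ -2L + 2k
@assert bic(mdl) ≈ -2L + k * log(n)
# Small-sample correction; with n - k - 1 < 0 in this tiny example the
# corrected value is not meaningful, but the identity still holds
@assert aicc(mdl) ≈ aic(mdl) + 2k * (k + 1) / (n - k - 1)
```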
The `predict` method returns predicted values of the response variable for the covariate values `newX`. If you omit `newX`, it returns the fitted response values. You can find more about [predict](https://juliastats.org/GLM.jl/stable/api/#StatsBase.predict) in the API documentation.

```jldoctest methods
julia> round.(predict(mdl); digits=8)
3-element Vector{Float64}:
1.83333333
4.33333333
6.83333333

julia> fitted(mdl) ≈ predict(mdl)
true

julia> round.(predict(mdl, test_data); digits=8)
1-element Vector{Float64}:
9.33333333
```
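For a linear model, `predict` on new data is simply the new design matrix times the estimated coefficients. A minimal sketch, reusing the toy data from above (not part of the doctest):

```julia
using GLM, DataFrames

data = DataFrame(X=[1, 2, 3], y=[2, 4, 7])
mdl = lm(@formula(y ~ X), data)

# Design matrix row for X = 4: an intercept entry plus the covariate value
newX = [1.0 4.0]
manual = newX * coef(mdl)  # ≈ [9.3333]

manual ≈ predict(mdl, DataFrame(X=[4]))  # true
```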
The `cooksdistance` method computes [Cook's distance](https://en.wikipedia.org/wiki/Cook%27s_distance) for each observation in a linear model, giving an estimate of the influence of each data point. It is currently only implemented for linear models without weights.

```jldoctest methods
julia> round.(cooksdistance(mdl); digits=8)
3-element Vector{Float64}:
2.5
0.25
2.5
```
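These values can be reproduced by hand from the hat matrix. A minimal sketch that computes Cook's distance directly, without using GLM.jl's implementation:

```julia
using GLM, DataFrames, LinearAlgebra

data = DataFrame(X=[1, 2, 3], y=[2, 4, 7])
mdl = lm(@formula(y ~ X), data)

X = modelmatrix(mdl)            # design matrix, including the intercept column
h = diag(X * inv(X' * X) * X')  # leverages (diagonal of the hat matrix)
r = residuals(mdl)
p = length(coef(mdl))           # number of estimated coefficients
s2 = deviance(mdl) / dof_residual(mdl)  # error variance estimate

# Cook's distance: D_i = r_i^2 / (p * s2) * h_i / (1 - h_i)^2
D = (r .^ 2 ./ (p * s2)) .* h ./ (1 .- h) .^ 2

D ≈ cooksdistance(mdl)  # true
```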

## Separation of response object and predictor object

The general approach in this code is to separate functionality related
5 changes: 3 additions & 2 deletions src/GLM.jl
@@ -12,14 +12,15 @@ module GLM
import Statistics: cor
import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
-fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue
+fitted, fit, model_response, response, modelmatrix, r2, r², adjr2, adjr², PValue,
+aic, aicc, bic
import StatsFuns: xlogy
import SpecialFunctions: erfc, erfcinv, digamma, trigamma
import StatsModels: hasintercept
export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-cooksdistance, hasintercept
+cooksdistance, hasintercept, aic, aicc, bic, dispersion

export
# types