
Commit ad23c42

committed
Updated docs with examples. Renamed unsed util functions. Bumped version.
1 parent 89549e0 commit ad23c42

15 files changed: +145 −31 lines

README.md (+2 −1)

@@ -54,7 +54,7 @@ Some users have issues installing `pymer4` on recent versions of macOS. This is
     export CC= pathYouCopiedInQuotes
     export CFLAGS="-W"
     ```
-7. Finally install `rpy2` using the new compiler you just installed: `pip install rpy2==2.8.6`
+7. Finally install `rpy2` using the new compiler you just installed: `pip install rpy2==2.8.5`
 8. Now you should be able to `pip install pymer4`:)

 #### Change-log

@@ -64,6 +64,7 @@ Some users have issues installing `pymer4` on recent versions of macOS. This is
 - Test statistic inference for `Lmer` models can now be performed via non-parametric permutation tests that shuffle observations within clusters
 - `Lmer.fit(factors={})` arguments now support custom arbitrary contrasts
 - New forest plots for visualizing model estimates and confidence intervals via the `Lmer.plot_summary()` method
+- More comprehensive documentation with examples of new features

 **0.4.0**
 - Added `post_hoc` tests to `Lmer` models
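The change-log's "permutation tests that shuffle observations within clusters" can be sketched in plain numpy. This is only an illustration of the shuffling scheme, not pymer4's actual implementation, and the function name here is hypothetical:

```python
import numpy as np

def shuffle_within_clusters(y, clusters, rng):
    """Permute observations only within their own cluster,
    preserving the random-effects grouping structure."""
    out = np.asarray(y, dtype=float).copy()
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        out[idx] = rng.permutation(out[idx])
    return out

rng = np.random.default_rng(0)
clusters = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
shuffled = shuffle_within_clusters(y, clusters, rng)
# values stay inside their cluster: {1, 2, 3} in cluster 0, {10, 20, 30} in cluster 1
```

Recomputing the test statistic on many such shuffles yields a null distribution that respects the clustering.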

docs/categorical.rst (+22 −5)

@@ -1,6 +1,5 @@
 Categorical Predictors
 =======================
-
 The syntax for handling categorical predictors is **different** between standard regression models and multi-level models in :code:`pymer4`. This is because formula parsing is passed to R for multi-level models, but handled by Python for standard regression models. Please note the differences below.

 Standard Regression Models

@@ -33,7 +32,7 @@ Multi-level Models
 1. Dummy-coded factor levels (treatment contrasts) in which each model term is the difference between a factor level and a selected reference level
 2. Orthogonal polynomial contrasts in which each model term is a polynomial contrast across factor levels (e.g. linear, quadratic, cubic, etc)
-3. Custom contrasts for each level of a factor, which should be provide in the manner by R.
+3. Custom contrasts for each level of a factor, which should be provided in the manner expected by R.

 To make re-parameterizing models easier, factor codings are passed as an argument to a model's :code:`fit` method. This obviates the need for adjusting data-frame properties as in R.

@@ -73,13 +72,26 @@ This will treat the order of list items as the order of factor levels for the *l
         ordered = True
     })

-R-flavored custom contrasts
+Custom contrasts
 ---------------------------
-:code:`Lmer()` models can also take custom factor settings based on how they are expected by R. This *differs* from typical user intuition because the :code:`contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) is one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For more on this specification see `this reference <https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>`_ and `this reference <https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>`_. If this is confusing see below for an alternate method.
+:code:`Lmer()` models can also take custom factor contrasts specified in the way R expects them. Remember that there can be at most k-1 model terms representing any k-level factor without over-parameterizing the model. If all that's desired is a specific contrast between factor levels (rather than all possible k-1 contrasts), you have 2 options:
+
+1) Specify a custom contrast using the syntax below (also available in the :code:`model.fit()` method help). This guarantees that one of your model terms reflects your desired contrast, and R will generate a set of orthogonal contrasts for the rest of your model terms.
+
+**Example**
+Compare level '1.0' in 'IV3' to the mean of levels '0.5' and '1.5':
+
+.. code-block:: python
+
+    model.fit(factors={
+        'IV3': {'1.0': 1, '0.5': -.5, '1.5': -.5}
+    })
+
+2) Fit a model with *only* the desired contrast rather than a full set of k-1 contrasts. Contrary to how statistics is usually taught, you don't ever *have to* include a full set of k-1 contrasts for a k-level factor! Follow the directions below to do this (section: "Simpler" custom contrasts). The upside is that you won't need to rely on R to compute anything for you (aside from the model fit), and you will have a model with exactly as many terms as contrasts you desire, giving you complete control. The downside is that :code:`model.post_hoc()` methods will not be able to do pairwise comparisons, as they rely on R's internal representation of factor levels.
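The docs' claim that a k-level factor supports at most k-1 model terms without over-parameterization can be verified numerically: an intercept plus a full set of k indicator columns gives a rank-deficient design matrix, while intercept plus k-1 columns has full column rank. A small numpy illustration, independent of pymer4:

```python
import numpy as np

levels = np.repeat([0, 1, 2], 4)          # a 3-level factor, 4 obs per level
dummies = np.eye(3)[levels]               # one indicator column per level
intercept = np.ones((levels.size, 1))

full = np.hstack([intercept, dummies])            # intercept + k columns
reduced = np.hstack([intercept, dummies[:, 1:]])  # intercept + k-1 columns

# the k dummy columns sum to the intercept, so `full` is rank-deficient
rank_full = np.linalg.matrix_rank(full)       # 3, but full has 4 columns
rank_reduced = np.linalg.matrix_rank(reduced) # 3, equal to its column count
```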

 (Simpler) Custom contrasts
 --------------------------
-Unlike the methods above, testing specific parameterizations without relying on factor coding is often easier done by creating new columns in a dataframe with specific coding schemes. These new columns can be utilized within models to test specific hypotheses. *Note: this is also a useful approach if you don't want to use patsy's formula langauge with standard regression models as noted above*.
+Testing specific parameterizations without relying on R's factor coding is often easier done by creating new columns in a dataframe with specific coding schemes. These new columns can be utilized within models to test specific hypotheses. However, the downside of this approach is that the :code:`model.post_hoc()` method will no longer be able to estimate simple effects, because it will not be able to group factor levels automatically. *Note: this is also a useful approach if you don't want to use patsy's formula language with standard regression models as noted above*.

 This is trivial using pandas map and assign methods. Here we'll only build a linear contrast across factor levels (0.5 < 1.0 < 1.5), without all exhaustive higher level polynomial terms:
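The diff skips over the actual pandas snippet, but the column it builds can be sketched as follows. This is an assumed reconstruction: the column name `IV3_custom_lin` comes from the model formula shown below in the docs, while the specific weights -1, 0, 1 are simply one valid linear coding:

```python
import pandas as pd

# toy stand-in for the docs' dataframe, which has an IV3 column
df = pd.DataFrame({"IV3": [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]})

# map each factor level to a linear contrast weight (0.5 < 1.0 < 1.5)
df = df.assign(IV3_custom_lin=df["IV3"].map({0.5: -1, 1.0: 0, 1.5: 1}))
```

The new column can then be entered into a formula directly, e.g. `'DV ~ IV3_custom_lin + (IV3_custom_lin|Group)'`.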

@@ -101,3 +113,8 @@ Now we can estimate this model without the need to use the :code:`factor` argume
     model = Lmer('DV ~ IV3_custom_lin + (IV3_custom_lin|Group)', data=df)
     model.fit()
+
+How contrasts in R work
+-----------------------
+*This is just for folks curious about how contrasts in R work*.
+Specifying multiple custom contrasts in R has always been a point of confusion amongst users. This is because the :code:`contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) as one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For a longer explanation with examples see `this reference <https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>`_ and `this reference <https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>`_. For these situations pymer4 offers a few utility functions in :code:`pymer4.utils` to convert between these matrix types if desired: :code:`R2con` and :code:`con2R`.
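The inverse relationship described above can be sketched with numpy. This mirrors what a weights-to-coding conversion such as :code:`con2R` has to do conceptually; it is an illustration of the math, not pymer4's actual implementation:

```python
import numpy as np

k = 3
# desired contrast weights, one row per contrast (each row sums to zero):
# contrast 1: level A vs. the mean of B and C; contrast 2: B vs. C
weights = np.array([[1.0, -0.5, -0.5],
                    [0.0,  1.0, -1.0]])

# prepend an intercept row, invert, then drop the intercept column:
# the result is the coding matrix that R's contrasts() actually expects
hypothesis = np.vstack([np.full(k, 1.0 / k), weights])
coding = np.linalg.inv(hypothesis)[:, 1:]
```

Going the other way (coding matrix back to interpretable weights, as :code:`R2con` would) is the same operation in reverse: re-attach an intercept column, invert, and read off the rows.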

docs/index.rst (+14 −3)

@@ -5,7 +5,7 @@

 Pymer4
 ======
-Love multi-level-modeling using `lme4 <https://cran.r-project.org/web/packages/lme4/index.html>`_ in R, but prefer to work in the scientific Python ecosystem? This package has got you covered! It's a small convenience package wrapping the basic functionality of lme4 for compatibility with python. (*Currently this only include linear and logistic multi-level models*)
+Love multi-level-modeling using `lme4 <https://cran.r-project.org/web/packages/lme4/index.html>`_ in R, but prefer to work in the scientific Python ecosystem? This package has got you covered! It's a small convenience package wrapping the basic functionality of lme4 for compatibility with Python.

 This package's main purpose is to provide a clean interface that hides the back-and-forth code required when moving between R and Python. In other words a user can work completely in Python, never having to deal with R, but get (most) of lme4's goodness. Behind the scenes this package simply uses `rpy2 <https://rpy2.readthedocs.io/en/version_2.8.x/>`_ to pass objects between languages, compute what's needed, parse everything, and convert to Python types (e.g. numpy arrays, pandas dataframes, etc).

@@ -29,18 +29,22 @@ This package has some extra goodies to make life a bit easier, namely:
 - Easy post-hoc tests with multiple-comparisons correction via `lsmeans <https://cran.r-project.org/web/packages/lsmeans/index.html>`_
 - Easy model predictions on new data
 - Easy generation of new data from a fitted model
+- Optional p-value computation via within-cluster permutation testing (experimental)

 - For standard linear models (i.e. :code:`lm()`)

   - Automatic inclusion of confidence intervals in model output
   - Easy computation of empirically bootstrapped 95% confidence intervals
-  - Easy computation of cluster-robust, heteroscedasticity-robust or auto-correlation-robust,'sandwich estimators' for standard errors (*note: these are not the same as auto-regressive models*)
+  - Easy computation of cluster-robust, heteroscedasticity-robust, or auto-correlation-robust 'sandwich estimators' for standard errors (*note: these are not the same as auto-regressive models*)
   - Permutation tests on model parameters

 - Data simulation

   - Highly customizable functions for generating data useful for standard regression models and multi-level models

+- Data visualization
+
+  - Convenience methods for plotting model estimates, including random-effects terms

 Installation
 ------------

@@ -84,7 +88,7 @@ Some users have issues installing ``pymer4`` on recent versions of macOS. This i
     export CC= pathYouCopiedInQuotes
     export CFLAGS="-W"

-7. Finally install ``rpy2`` using the new compiler you just installed: ``pip install rpy2==2.8.6``
+7. Finally install ``rpy2`` using the new compiler you just installed: ``pip install rpy2==2.8.5``
 8. Now you should be able to ``pip install pymer4``:)

@@ -101,6 +105,13 @@ Categorical Predictors

     categorical

+Post-hoc Comparisons
+--------------------
+.. toctree::
+    :maxdepth: 2
+
+    post_hoc
+
 Simulating Data
 ---------------
 .. toctree::

docs/post_hoc.rst (+50 −2)

@@ -1,10 +1,58 @@
 ANOVA tables and post-hoc comparisons
 =====================================
-
 ANOVA tables and orthogonal contrasts
 -------------------------------------
 Because ANOVA is just regression, :code:`pymer4` makes it easy to estimate ANOVA tables with F-results using the :code:`.anova()` method on a fitted model. By default this will compute a Type-III SS table given the coding scheme provided when the model was initially fit. Based on the distribution of data across factor levels and the specific factor-coding used, this may produce invalid Type-III SS computations. For this reason the :code:`.anova()` method has a :code:`force-orthogonal=True` option that will reparameterize and refit the model using orthogonal polynomial contrasts prior to computing an ANOVA table.

 Marginal estimates and post-hoc comparisons
 -------------------------------------------
-:code:`pymer4` leverages the :code:`lsmeans` package in order to compute marginal estimates and pair-wise comparisons of models that contain categorical terms and/or interactions. This can be performed by using the :code:`.post_hoc()` method on fitted models with various examples in the method help. Currently post-hoc comparisons are not possible from :code:`Lm()` models, only from :code:`Lmer()` models.
+:code:`pymer4` leverages the :code:`lsmeans` package in order to compute marginal estimates and pair-wise comparisons of models that contain categorical terms and/or interactions. This can be performed by using the :code:`.post_hoc()` method on fitted models, with some examples below and in the method help. However, in order for this method to work appropriately, factors must be specified using the :code:`factors` argument of the :code:`model.fit()` method. Currently post-hoc comparisons are not possible from :code:`Lm()` models, only from :code:`Lmer()` models.
+
+Examples
+--------
+Let's say that you're running a model which has two factors, A and B, with 3 levels each (i.e. a 3x3 ANOVA). Below are some examples of how you could compute different types of post-hoc comparisons between factor levels.
+
+**Example 1**
+Compare each level of A to each other level of A, within each level of B:
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars='A', grouping_vars='B')
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts1.png
+
+The mean of each "cell" of the ANOVA is available in the :code:`marginal_estimates` output:
+
+.. image:: ../misc/marginal1.png
+
+**Example 2**
+Compare each unique A,B "cell" to every other A,B "cell":
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars=['A', 'B'])
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts2.png
+
+Now let's say you run a model that has an interaction between a continuous variable C and two factors, A and B, with 3 levels each.
+
+**Example 3**
+Compare slopes of C at each level of A within each level of B:
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars='C', grouping_vars=['A', 'B'])
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts3.png

docs/usage.rst (+12 −2)

@@ -51,7 +51,7 @@ Fitting a standard regression model like using :code:`lm()` in R, simply entails

 Estimate a multi-level model
 ----------------------------
-Fitting a multi-level model works similarily.
+Fitting a multi-level model works similarly.

 .. code-block:: python

@@ -63,6 +63,16 @@ Fitting a multi-level model works similarily.

 .. image:: ../misc/simple_summary_lmm.png

+Visualize model summary
+-----------------------
+We can visualize the coefficients with 95% confidence intervals and cluster-level estimates overlaid.
+
+.. code-block:: python
+
+    model.plot_summary()
+
+.. image:: ../misc/plot_summary_lmm.png
+
 Inspect random effects
 ----------------------
 We can look at the cluster level parameters easily as they are stored within the model object itself.

@@ -75,7 +85,7 @@ We can look at the cluster level parameters easily as they are stored within the

 Each row here is a unique intercept and slope, which vary because we parameterized our random effects that way above.

-We can also plot these values with respect to the population parameters above.
+We can also plot these values with respect to the population parameters, in a slightly different way than the forest plot above.

 .. code-block:: python

misc/contrasts1.png (binary, 42.5 KB)

misc/contrasts2.png (binary, 135 KB)

misc/contrasts3.png (binary, 38.3 KB)

misc/marginal1.png (binary, 34.5 KB)

misc/plot_summary_lmm.png (binary, 16.1 KB)

pymer4/models.py (+16 −4)

@@ -95,9 +95,6 @@ def _make_factors(self,factor_dict, ordered=False):
         """
         Covert specific columns to R-style factors. Default scheme is dummy coding where reference is 1st level provided. Alternative is orthogonal polynomial contrasts. User can also specific custom contrasts.

-        Todo:
-        1) Automatically convert user's desired contrast -> R's contrast matrix format for them: https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md
-
         Args:
             factor_dict: (dict) dictionary with column names specified as keys, and lists of unique values to treat as factor levels
             ordered: (bool) whether to interpret factor_dict values as dummy-coded (1st list item is reference level) or as polynomial contrasts (linear contrast specified by ordered of list items)

@@ -200,7 +197,7 @@ def fit(self,conf_int='Wald',factors=None,permute=None,ordered=False,summarize=T

         Args:
             conf_int (str): which method to compute confidence intervals; 'profile', 'Wald' (default), or 'boot' (parametric bootstrap)
-            factors (dict): col names (keys) and contrast codes (vals) for factor variables. To use dummy coding or simply poylnomial coding provide the variable name as key and a list of its unique values as values e.g. {'Col1':['A','B','C']}. First level is always treated as reference in this scheme. For custom contrasts use nested dictionary with custom values, e.g. {'Col1':{'A': -1,'B':-1,'C':2}}
+            factors (dict): Keys should be column names in data to treat as factors. Values should either be a list containing unique variable levels if dummy-coding or polynomial coding is desired, or another dictionary with unique variable levels as keys and desired contrast values (as specified in R!) as values. See examples below
             permute (int): if non-zero, computes parameter significance tests by permuting test stastics rather than parametrically. Permutation is done by shuffling observations within clusters to respect random effects structure of data.
             ordered (bool): whether factors should be treated as ordered polynomial contrasts; this will parameterize a model with K-1 orthogonal polynomial regressors beginning with a linear contrast based on the factor order provided; default is False
             summarize (bool): whether to print a model summary after fitting; default is True

@@ -210,6 +207,21 @@

         Returns:
             DataFrame: R style summary() table

+        Examples:
+            The following examples demonstrate how to treat variables as categorical factors.
+
+            Dummy-coding: Treat Col1 as a factor with 3 levels: A, B, C. Use dummy-coding with A as the reference level. The model intercept will be the mean of A, and parameters will be B-A and C-A.
+
+            >>> model.fit(factors={"Col1": ['A', 'B', 'C']})
+
+            Orthogonal polynomials: Treat Col1 as a factor with 3 levels: A, B, C. Estimate a linear contrast of C > B > A. The model intercept will be the grand mean of all levels, and parameters will be the linear contrast and an orthogonal polynomial contrast (auto-computed).
+
+            >>> model.fit(factors={"Col1": ['A', 'B', 'C']}, ordered=True)
+
+            Custom contrast: Treat Col1 as a factor with 3 levels: A, B, C. Compare A to the mean of B and C. The model intercept will be the grand mean of all levels, and parameters will be the desired contrast as well as an automatically determined orthogonal contrast.
+
+            >>> model.fit(factors={"Col1": {'A': 1, 'B': -.5, 'C': -.5}})
+
         """

         # Save params for future calls
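The docstring's claim for dummy coding — the intercept equals the reference level's mean, and the slopes equal differences from it — can be checked with ordinary least squares on toy data. This is plain numpy, independent of pymer4:

```python
import numpy as np

# noise-free toy data with known cell means for levels A, B, C
means = {"A": 2.0, "B": 5.0, "C": 9.0}
levels = np.repeat(["A", "B", "C"], 10)
y = np.array([means[lvl] for lvl in levels])

# treatment (dummy) coding with A as the reference level
X = np.column_stack([
    np.ones(levels.size),
    (levels == "B").astype(float),
    (levels == "C").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta -> [mean(A), B - A, C - A] = [2.0, 3.0, 7.0]
```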

pymer4/simulate.py (+3 −3)

@@ -2,7 +2,7 @@
 import numpy as np
 import pandas as pd
 from scipy.spatial.distance import squareform
-from pymer4.utils import discrete_inverse_logit, _isPSD, _nearestPSD
+from pymer4.utils import discrete_inverse_logit, isPSD, nearestPSD

 __all__ = ['easy_multivariate_normal',
            'simulate_lm',

@@ -198,12 +198,12 @@ def easy_multivariate_normal(num_obs, num_features, corrs, mu = 0.0, sigma = 1.0
     else:
         raise ValueError("Correlations must be num_features x num_feature, flattend numpy array/list or scalar")

-    if not _isPSD(corrs):
+    if not isPSD(corrs):
         if forcePSD:
             # Tell user their correlations are being recomputed if they didnt ask to save them as they might not realize
             if not return_new_corrs:
                 print("Correlation matrix is not positive semi-definite. Solved for new correlation matrix.")
-            _corrs = np.array(_nearestPSD(corrs, nit))
+            _corrs = np.array(nearestPSD(corrs, nit))

         else:
             raise ValueError("Correlation matrix is not positive semi-definite. Pymer4 will not generate inaccurate multivariate data. Use the forcePD argument to automatically solve for the closest desired correlation matrix.")
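The renamed helpers `isPSD` and `nearestPSD` gate data generation on the correlation matrix being positive semi-definite. The core idea of such a check can be sketched as follows; this is an illustration of the eigenvalue criterion, not pymer4's actual implementation:

```python
import numpy as np

def is_psd_sketch(mat, tol=1e-8):
    # a symmetric matrix is PSD iff all its eigenvalues are (numerically) non-negative
    return bool(np.all(np.linalg.eigvalsh(mat) >= -tol))

valid = np.array([[1.0, 0.5],
                  [0.5, 1.0]])    # eigenvalues 0.5 and 1.5: PSD
invalid = np.array([[1.0, 2.0],
                    [2.0, 1.0]])  # eigenvalues -1 and 3: not PSD
```

A "nearest PSD" repair typically clips negative eigenvalues to zero and reconstructs the matrix, which is why the code warns that correlations may be recomputed.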

pymer4/tests/test_models.py (+3 −0)

@@ -109,6 +109,9 @@ def test_anova():
     data = pd.read_csv(os.path.join(get_resource_path(),'sample_data.csv'))
     data['DV_l2'] = np.random.randint(0,4,data.shape[0])
     model = Lmer('DV ~ IV3*DV_l2 + (IV3|Group)',data=data)
+    model.fit(summarize=False)
+    out = model.anova()
+    assert out.shape == (3,7)

 def test_poisson_lmm():
     np.random.seed(1)
