
Commit ad23c42

committed
Updated docs with examples. Renamed unsed util functions. Bumped version.
1 parent 89549e0 commit ad23c42

15 files changed: +145 −31 lines

README.md (+2 −1)

@@ -54,7 +54,7 @@ Some users have issues installing `pymer4` on recent versions of macOS. This is
     export CC= pathYouCopiedInQuotes
     export CFLAGS="-W"
     ```
-7. Finally install `rpy2` using the new compiler you just installed: `pip install rpy2==2.8.6`
+7. Finally install `rpy2` using the new compiler you just installed: `pip install rpy2==2.8.5`
 8. Now you should be able to `pip install pymer4`:)

 #### Change-log

@@ -64,6 +64,7 @@ Some users have issues installing `pymer4` on recent versions of macOS. This is
 - Test statistic inference for `Lmer` models can now be performed via non-parametric permutation tests that shuffle observations within clusters
 - `Lmer.fit(factors={})` arguments now support custom arbitrary contrasts
 - New forest plots for visualizing model estimates and confidence intervals via the `Lmer.plot_summary()` method
+- More comprehensive documentation with examples of new features

 **0.4.0**
 - Added `post_hoc` tests to `Lmer` models
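The change-log's "permutation tests that shuffle observations within clusters" can be sketched in plain numpy. This is only an illustration of the shuffling scheme, not pymer4's actual implementation, and the function name here is hypothetical:

```python
import numpy as np

def shuffle_within_clusters(y, clusters, rng):
    """Permute observations only within their own cluster,
    preserving the random-effects grouping structure."""
    out = np.asarray(y, dtype=float).copy()
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        out[idx] = rng.permutation(out[idx])
    return out

rng = np.random.default_rng(0)
clusters = np.array([0, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
shuffled = shuffle_within_clusters(y, clusters, rng)
# values stay inside their cluster: {1, 2, 3} in cluster 0, {10, 20, 30} in cluster 1
```

Recomputing the test statistic on many such shuffles yields a null distribution that respects the clustering.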

docs/categorical.rst (+22 −5)

@@ -1,6 +1,5 @@
 Categorical Predictors
 =======================
-
 The syntax for handling categorical predictors is **different** between standard regression models and multi-level models in :code:`pymer4`. This is because formula parsing is passed to R for multi-level models, but handled by Python for standard regression models. Please note the differences below.

 Standard Regression Models

@@ -33,7 +32,7 @@ Multi-level Models
 1. Dummy-coded factor levels (treatment contrasts) in which each model term is the difference between a factor level and a selected reference level
 2. Orthogonal polynomial contrasts in which each model term is a polynomial contrast across factor levels (e.g. linear, quadratic, cubic, etc)
-3. Custom contrasts for each level of a factor, which should be provide in the manner by R.
+3. Custom contrasts for each level of a factor, which should be provided in the manner expected by R.

 To make re-parameterizing models easier, factor codings are passed as an argument to a model's :code:`fit` method. This obviates the need for adjusting data-frame properties as in R.

@@ -73,13 +72,26 @@ This will treat the order of list items as the order of factor levels for the *l
         ordered = True
     })

-R-flavored custom contrasts
+Custom contrasts
 ---------------------------
-:code:`Lmer()` models can also take custom factor settings based on how they are expected by R. This *differs* from typical user intuition because the :code:`contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) is one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For more on this specification see `this reference <https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>`_ and `this reference <https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>`_. If this is confusing see below for an alternate method.
+:code:`Lmer()` models can also take custom factor contrasts specified in the way R expects them. Remember that there can be at most k-1 model terms representing any k-level factor without over-parameterizing the model. If all that's desired is a specific contrast between factor levels (rather than all possible k-1 contrasts), you have 2 options:
+
+1) Specify a custom contrast using the syntax below (also available in the :code:`model.fit()` method help). This guarantees that one of your model terms reflects your desired contrast, and R will generate a set of orthogonal contrasts for the rest of your model terms.
+
+**Example**
+Compare level '1.0' in 'IV3' to the mean of levels '0.5' and '1.5':
+
+.. code-block:: python
+
+    model.fit(factors={
+        'IV3': {'1.0': 1, '0.5': -.5, '1.5': -.5}
+    })
+
+2) Fit a model with *only* the desired contrast rather than a full set of k-1 contrasts. Contrary to how statistics is usually taught, you don't ever *have to* include a full set of k-1 contrasts for a k-level factor! Follow the directions below to do this (section: "Simpler" custom contrasts). The upside is that you won't need to rely on R to compute anything for you (aside from the model fit), and you will have a model with exactly as many terms as contrasts you desire, giving you complete control. The downside is that :code:`model.post_hoc()` methods will not be able to do pairwise comparisons, as they rely on R's internal representation of factor levels.
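The docs' claim that a k-level factor supports at most k-1 model terms without over-parameterization can be verified numerically: an intercept plus a full set of k indicator columns gives a rank-deficient design matrix, while intercept plus k-1 columns has full column rank. A small numpy illustration, independent of pymer4:

```python
import numpy as np

levels = np.repeat([0, 1, 2], 4)          # a 3-level factor, 4 obs per level
dummies = np.eye(3)[levels]               # one indicator column per level
intercept = np.ones((levels.size, 1))

full = np.hstack([intercept, dummies])            # intercept + k columns
reduced = np.hstack([intercept, dummies[:, 1:]])  # intercept + k-1 columns

# the k dummy columns sum to the intercept, so `full` is rank-deficient
rank_full = np.linalg.matrix_rank(full)       # 3, but full has 4 columns
rank_reduced = np.linalg.matrix_rank(reduced) # 3, equal to its column count
```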

 (Simpler) Custom contrasts
 --------------------------
-Unlike the methods above, testing specific parameterizations without relying on factor coding is often easier done by creating new columns in a dataframe with specific coding schemes. These new columns can be utilized within models to test specific hypotheses. *Note: this is also a useful approach if you don't want to use patsy's formula langauge with standard regression models as noted above*.
+Testing specific parameterizations without relying on R's factor coding is often easier done by creating new columns in a dataframe with specific coding schemes. These new columns can be utilized within models to test specific hypotheses. However, the downside of this approach is that the :code:`model.post_hoc()` method will no longer be able to estimate simple effects, because it will not be able to group factor levels automatically. *Note: this is also a useful approach if you don't want to use patsy's formula language with standard regression models as noted above*.

 This is trivial using pandas map and assign methods. Here we'll only build a linear contrast across factor levels (0.5 < 1.0 < 1.5), without all exhaustive higher level polynomial terms:
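The diff skips over the actual pandas snippet, but the column it builds can be sketched as follows. This is an assumed reconstruction: the column name `IV3_custom_lin` comes from the model formula shown below in the docs, while the specific weights -1, 0, 1 are simply one valid linear coding:

```python
import pandas as pd

# toy stand-in for the docs' dataframe, which has an IV3 column
df = pd.DataFrame({"IV3": [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]})

# map each factor level to a linear contrast weight (0.5 < 1.0 < 1.5)
df = df.assign(IV3_custom_lin=df["IV3"].map({0.5: -1, 1.0: 0, 1.5: 1}))
```

The new column can then be entered into a formula directly, e.g. `'DV ~ IV3_custom_lin + (IV3_custom_lin|Group)'`.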

@@ -101,3 +113,8 @@ Now we can estimate this model without the need to use the :code:`factor` argume
     model = Lmer('DV ~ IV3_custom_lin + (IV3_custom_lin|Group)', data=df)
     model.fit()
+
+How contrasts in R work
+-----------------------
+*This is just for folks curious about how contrasts in R work*.
+Specifying multiple custom contrasts in R has always been a point of confusion amongst users. This is because the :code:`contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) as one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For a longer explanation with examples see `this reference <https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>`_ and `this reference <https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>`_. For these situations pymer4 offers a few utility functions in :code:`pymer4.utils` to convert between these matrix types if desired: :code:`R2con` and :code:`con2R`.
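The inverse relationship described above can be sketched with numpy. This mirrors what a weights-to-coding conversion such as :code:`con2R` has to do conceptually; it is an illustration of the math, not pymer4's actual implementation:

```python
import numpy as np

k = 3
# desired contrast weights, one row per contrast (each row sums to zero):
# contrast 1: level A vs. the mean of B and C; contrast 2: B vs. C
weights = np.array([[1.0, -0.5, -0.5],
                    [0.0,  1.0, -1.0]])

# prepend an intercept row, invert, then drop the intercept column:
# the result is the coding matrix that R's contrasts() actually expects
hypothesis = np.vstack([np.full(k, 1.0 / k), weights])
coding = np.linalg.inv(hypothesis)[:, 1:]
```

Going the other way (coding matrix back to interpretable weights, as :code:`R2con` would) is the same operation in reverse: re-attach an intercept column, invert, and read off the rows.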

docs/index.rst (+14 −3)

@@ -5,7 +5,7 @@

 Pymer4
 ======
-Love multi-level-modeling using `lme4 <https://cran.r-project.org/web/packages/lme4/index.html>`_ in R, but prefer to work in the scientific Python ecosystem? This package has got you covered! It's a small convenience package wrapping the basic functionality of lme4 for compatibility with python. (*Currently this only include linear and logistic multi-level models*)
+Love multi-level-modeling using `lme4 <https://cran.r-project.org/web/packages/lme4/index.html>`_ in R, but prefer to work in the scientific Python ecosystem? This package has got you covered! It's a small convenience package wrapping the basic functionality of lme4 for compatibility with Python.

 This package's main purpose is to provide a clean interface that hides the back-and-forth code required when moving between R and Python. In other words a user can work completely in Python, never having to deal with R, but get (most) of lme4's goodness. Behind the scenes this package simply uses `rpy2 <https://rpy2.readthedocs.io/en/version_2.8.x/>`_ to pass objects between languages, compute what's needed, parse everything, and convert to Python types (e.g. numpy arrays, pandas dataframes, etc).

@@ -29,18 +29,22 @@ This package has some extra goodies to make life a bit easier, namely:
 - Easy post-hoc tests with multiple-comparisons correction via `lsmeans <https://cran.r-project.org/web/packages/lsmeans/index.html>`_
 - Easy model predictions on new data
 - Easy generation of new data from a fitted model
+- Optional p-value computation via within-cluster permutation testing (experimental)

 - For standard linear models (i.e. :code:`lm()`)

   - Automatic inclusion of confidence intervals in model output
   - Easy computation of empirically bootstrapped 95% confidence intervals
-  - Easy computation of cluster-robust, heteroscedasticity-robust or auto-correlation-robust,'sandwich estimators' for standard errors (*note: these are not the same as auto-regressive models*)
+  - Easy computation of cluster-robust, heteroscedasticity-robust, or auto-correlation-robust 'sandwich estimators' for standard errors (*note: these are not the same as auto-regressive models*)
   - Permutation tests on model parameters

 - Data simulation

   - Highly customizable functions for generating data useful for standard regression models and multi-level models

+- Data visualization
+
+  - Convenience methods for plotting model estimates, including random-effects terms

 Installation
 ------------

@@ -84,7 +88,7 @@ Some users have issues installing ``pymer4`` on recent versions of macOS. This i
     export CC= pathYouCopiedInQuotes
     export CFLAGS="-W"

-7. Finally install ``rpy2`` using the new compiler you just installed: ``pip install rpy2==2.8.6``
+7. Finally install ``rpy2`` using the new compiler you just installed: ``pip install rpy2==2.8.5``
 8. Now you should be able to ``pip install pymer4``:)

@@ -101,6 +105,13 @@ Categorical Predictors

     categorical

+Post-hoc Comparisons
+--------------------
+.. toctree::
+    :maxdepth: 2
+
+    post_hoc
+
 Simulating Data
 ---------------
 .. toctree::

docs/post_hoc.rst (+50 −2)

@@ -1,10 +1,58 @@
 ANOVA tables and post-hoc comparisons
 =====================================
-
 ANOVA tables and orthogonal contrasts
 -------------------------------------
 Because ANOVA is just regression, :code:`pymer4` makes it easy to estimate ANOVA tables with F-results using the :code:`.anova()` method on a fitted model. By default this will compute a Type-III SS table given the coding scheme provided when the model was initially fit. Based on the distribution of data across factor levels and the specific factor-coding used, this may produce invalid Type-III SS computations. For this reason the :code:`.anova()` method has a :code:`force-orthogonal=True` option that will reparameterize and refit the model using orthogonal polynomial contrasts prior to computing an ANOVA table.

 Marginal estimates and post-hoc comparisons
 -------------------------------------------
-:code:`pymer4` leverages the :code:`lsmeans` package in order to compute marginal estimates and pair-wise comparisons of models that contain categorical terms and/or interactions. This can be performed by using the :code:`.post_hoc()` method on fitted models with various examples in the method help. Currently post-hoc comparisons are not possible from :code:`Lm()` models, only from :code:`Lmer()` models.
+:code:`pymer4` leverages the :code:`lsmeans` package in order to compute marginal estimates and pair-wise comparisons of models that contain categorical terms and/or interactions. This can be performed by using the :code:`.post_hoc()` method on fitted models, with some examples below and in the method help. However, in order for this method to work appropriately, factors must be specified using the :code:`factors` argument of the :code:`model.fit()` method. Currently post-hoc comparisons are not possible from :code:`Lm()` models, only from :code:`Lmer()` models.
+
+Examples
+--------
+Let's say that you're running a model which has two factors, A and B, with 3 levels each (i.e. a 3x3 ANOVA). Below are some examples of how you could compute different types of post-hoc comparisons between factor levels.
+
+**Example 1**
+Compare each level of A to each other level of A, within each level of B:
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars='A', grouping_vars='B')
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts1.png
+
+The mean of each "cell" of the ANOVA is available in the :code:`marginal_estimates` output:
+
+.. image:: ../misc/marginal1.png
+
+**Example 2**
+Compare each unique A,B "cell" to every other A,B "cell":
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars=['A', 'B'])
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts2.png
+
+Now let's say you run a model that has an interaction between a continuous variable C and two factors, A and B, with 3 levels each.
+
+**Example 3**
+Compare slopes of C at each level of A within each level of B:
+
+.. code-block:: python
+
+    # Specify post-hoc comparisons
+    marginal_estimates, contrasts = model.post_hoc(marginal_vars='C', grouping_vars=['A', 'B'])
+
+    # print them out
+    contrasts
+
+.. image:: ../misc/contrasts3.png

docs/usage.rst (+12 −2)

@@ -51,7 +51,7 @@ Fitting a standard regression model like using :code:`lm()` in R, simply entails

 Estimate a multi-level model
 ----------------------------
-Fitting a multi-level model works similarily.
+Fitting a multi-level model works similarly.

 .. code-block:: python

@@ -63,6 +63,16 @@ Fitting a multi-level model works similarily.

 .. image:: ../misc/simple_summary_lmm.png

+Visualize model summary
+-----------------------
+We can visualize the coefficients with 95% confidence intervals and cluster-level estimates overlaid.
+
+.. code-block:: python
+
+    model.plot_summary()
+
+.. image:: ../misc/plot_summary_lmm.png
+
 Inspect random effects
 ----------------------
 We can look at the cluster level parameters easily as they are stored within the model object itself.

@@ -75,7 +85,7 @@ We can look at the cluster level parameters easily as they are stored within the

 Each row here is a unique intercept and slope, which vary because we parameterized our random effects that way above.

-We can also plot these values with respect to the population parameters above.
+We can also plot these values with respect to the population parameters, in a slightly different way than the forest plot above.

 .. code-block:: python

misc/contrasts1.png (binary, 42.5 KB)

misc/contrasts2.png (binary, 135 KB)

misc/contrasts3.png (binary, 38.3 KB)

misc/marginal1.png (binary, 34.5 KB)

misc/plot_summary_lmm.png (binary, 16.1 KB)

pymer4/models.py (+16 −4)

@@ -95,9 +95,6 @@ def _make_factors(self,factor_dict, ordered=False):
         """
         Covert specific columns to R-style factors. Default scheme is dummy coding where reference is 1st level provided. Alternative is orthogonal polynomial contrasts. User can also specific custom contrasts.

-        Todo:
-        1) Automatically convert user's desired contrast -> R's contrast matrix format for them: https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md
-
         Args:
             factor_dict: (dict) dictionary with column names specified as keys, and lists of unique values to treat as factor levels
             ordered: (bool) whether to interpret factor_dict values as dummy-coded (1st list item is reference level) or as polynomial contrasts (linear contrast specified by ordered of list items)

@@ -200,7 +197,7 @@ def fit(self,conf_int='Wald',factors=None,permute=None,ordered=False,summarize=T

         Args:
             conf_int (str): which method to compute confidence intervals; 'profile', 'Wald' (default), or 'boot' (parametric bootstrap)
-            factors (dict): col names (keys) and contrast codes (vals) for factor variables. To use dummy coding or simply poylnomial coding provide the variable name as key and a list of its unique values as values e.g. {'Col1':['A','B','C']}. First level is always treated as reference in this scheme. For custom contrasts use nested dictionary with custom values, e.g. {'Col1':{'A': -1,'B':-1,'C':2}}
+            factors (dict): Keys should be column names in data to treat as factors. Values should either be a list containing unique variable levels if dummy-coding or polynomial coding is desired, or another dictionary with unique variable levels as keys and desired contrast values (as specified in R!) as values. See examples below
             permute (int): if non-zero, computes parameter significance tests by permuting test stastics rather than parametrically. Permutation is done by shuffling observations within clusters to respect random effects structure of data.
             ordered (bool): whether factors should be treated as ordered polynomial contrasts; this will parameterize a model with K-1 orthogonal polynomial regressors beginning with a linear contrast based on the factor order provided; default is False
             summarize (bool): whether to print a model summary after fitting; default is True

@@ -210,6 +207,21 @@

         Returns:
             DataFrame: R style summary() table

+        Examples:
+            The following examples demonstrate how to treat variables as categorical factors.
+
+            Dummy-coding: Treat Col1 as a factor with 3 levels: A, B, C. Use dummy-coding with A as the reference level. The model intercept will be the mean of A, and parameters will be B-A and C-A.
+
+            >>> model.fit(factors={"Col1": ['A', 'B', 'C']})
+
+            Orthogonal polynomials: Treat Col1 as a factor with 3 levels: A, B, C. Estimate a linear contrast of C > B > A. The model intercept will be the grand mean of all levels, and parameters will be the linear contrast and an orthogonal polynomial contrast (auto-computed).
+
+            >>> model.fit(factors={"Col1": ['A', 'B', 'C']}, ordered=True)
+
+            Custom contrast: Treat Col1 as a factor with 3 levels: A, B, C. Compare A to the mean of B and C. The model intercept will be the grand mean of all levels, and parameters will be the desired contrast as well as an automatically determined orthogonal contrast.
+
+            >>> model.fit(factors={"Col1": {'A': 1, 'B': -.5, 'C': -.5}})
+
         """

         # Save params for future calls
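The docstring's claim for dummy coding — the intercept equals the reference level's mean, and the slopes equal differences from it — can be checked with ordinary least squares on toy data. This is plain numpy, independent of pymer4:

```python
import numpy as np

# noise-free toy data with known cell means for levels A, B, C
means = {"A": 2.0, "B": 5.0, "C": 9.0}
levels = np.repeat(["A", "B", "C"], 10)
y = np.array([means[lvl] for lvl in levels])

# treatment (dummy) coding with A as the reference level
X = np.column_stack([
    np.ones(levels.size),
    (levels == "B").astype(float),
    (levels == "C").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta -> [mean(A), B - A, C - A] = [2.0, 3.0, 7.0]
```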

pymer4/simulate.py (+3 −3)

@@ -2,7 +2,7 @@
 import numpy as np
 import pandas as pd
 from scipy.spatial.distance import squareform
-from pymer4.utils import discrete_inverse_logit, _isPSD, _nearestPSD
+from pymer4.utils import discrete_inverse_logit, isPSD, nearestPSD

 __all__ = ['easy_multivariate_normal',
            'simulate_lm',

@@ -198,12 +198,12 @@ def easy_multivariate_normal(num_obs, num_features, corrs, mu = 0.0, sigma = 1.0
     else:
         raise ValueError("Correlations must be num_features x num_feature, flattend numpy array/list or scalar")

-    if not _isPSD(corrs):
+    if not isPSD(corrs):
         if forcePSD:
             # Tell user their correlations are being recomputed if they didnt ask to save them as they might not realize
             if not return_new_corrs:
                 print("Correlation matrix is not positive semi-definite. Solved for new correlation matrix.")
-            _corrs = np.array(_nearestPSD(corrs, nit))
+            _corrs = np.array(nearestPSD(corrs, nit))

         else:
             raise ValueError("Correlation matrix is not positive semi-definite. Pymer4 will not generate inaccurate multivariate data. Use the forcePD argument to automatically solve for the closest desired correlation matrix.")
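The renamed helpers `isPSD` and `nearestPSD` gate data generation on the correlation matrix being positive semi-definite. The core idea of such a check can be sketched as follows; this is an illustration of the eigenvalue criterion, not pymer4's actual implementation:

```python
import numpy as np

def is_psd_sketch(mat, tol=1e-8):
    # a symmetric matrix is PSD iff all its eigenvalues are (numerically) non-negative
    return bool(np.all(np.linalg.eigvalsh(mat) >= -tol))

valid = np.array([[1.0, 0.5],
                  [0.5, 1.0]])    # eigenvalues 0.5 and 1.5: PSD
invalid = np.array([[1.0, 2.0],
                    [2.0, 1.0]])  # eigenvalues -1 and 3: not PSD
```

A "nearest PSD" repair typically clips negative eigenvalues to zero and reconstructs the matrix, which is why the code warns that correlations may be recomputed.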

pymer4/tests/test_models.py (+3 −0)

@@ -109,6 +109,9 @@ def test_anova():
     data = pd.read_csv(os.path.join(get_resource_path(),'sample_data.csv'))
     data['DV_l2'] = np.random.randint(0,4,data.shape[0])
     model = Lmer('DV ~ IV3*DV_l2 + (IV3|Group)',data=data)
+    model.fit(summarize=False)
+    out = model.anova()
+    assert out.shape == (3,7)

 def test_poisson_lmm():
     np.random.seed(1)
