Skip to content

Latest commit

 

History

History
96 lines (72 loc) · 3.91 KB

README.md

File metadata and controls

96 lines (72 loc) · 3.91 KB

COPTeRR

  
Compute-Optimized Permutation Testing for Ridge Regression
                                                   ______.........--=T=--.........______
                                                      .             |:|
                                                :-. //           /""""""-.
                                                ': '-._____..--""(""""""()`---.__
                                                /:   _..__   ''  ":""""'[] |""`\\
                                                ': :'     `-.     _:._     '"""" :
                                                 ::          '--=:____:.___....-"
                                                                O"       O"

COPTeRR is a Python package for efficiently computing permutation tests on ridge regression model weights, performance, and other metrics, without re-fitting the model for each permutation. This enables rapid estimation of null distributions (and thus, p-values) for any weights-derived model characteristic.

Simple Example

First, let's make some basic demo data.

import numpy as np
from tqdm import tqdm

n_timepoints_train, n_timepoints_test = 500, 100
n_features, n_targets = 10, 3

# Create features
X_train = np.random.rand(n_timepoints_train, n_features)
X_test = np.random.rand(n_timepoints_test, n_features)
true_weights = np.random.randn(n_features, n_targets)

# Create targets
Y_train = X_train @ true_weights
Y_test = X_test @ true_weights

# Add different amounts of noise
Y_train[:,0] += 1  * np.random.randn(n_timepoints_train)
Y_train[:,1] += 3  * np.random.randn(n_timepoints_train)
Y_train[:,2] += 15 * np.random.randn(n_timepoints_train)

Y_test[:,0] += 1  * np.random.randn(n_timepoints_test)
Y_test[:,1] += 3  * np.random.randn(n_timepoints_test)
Y_test[:,2] += 15 * np.random.randn(n_timepoints_test)

COPTeRR computes permutation tests on already-fit ridge regression models. You can perform this initial fit using a variety of packages, but we recommend Himalaya.

from himalaya.ridge import RidgeCV

model = RidgeCV(alphas=np.logspace(-2, 5, 8))
model.fit(X_train, Y_train)
model_performance = model.score(X_test, Y_test)
print("R2 Score Per Target: ", model_performance)

This performance seems pretty good, but how can we be sure that it's significant? We can determine this by generating a null distribution of chance values using COPTeRR!

from copterr import PermuteWeights

permuter = PermuteWeights(X_train, Y_train, model.best_alphas_)
permuter.prepare()

himalaya_weights = model.coef_
copterr_weights = permuter.fit_true_weights()
print('Weights Equivalent:', np.allclose(himalaya_weights, copterr_weights, atol=1e-5))

Now, let's perform a bunch of permutations, and compute p-values for the model's performance.

from copterr.utils import compute_p_values

perm_performance = []
for permutation in tqdm(range(10000)):
    perm_weights = permuter.fit_permutation(permutation=True)
    perm_r2 = permuter.score(X_test, Y_test, permutation=True)
    perm_performance.append(perm_r2.numpy())

p_values = compute_p_values(model_performance, perm_performance)
print("P-Values:", p_values)

Feel free to vary the noise_amount variable and check out the effects--lower amounts of noise should yield smaller p-values for our model's performance. Also note that model R2 is just a single example--anything you compute from your model's weights can be tested in this manner!

Check out the tutorials folder for more advanced tips and use cases!

Setup/Installation

Copterr is dependency-light, and can be installed by downloading this repository and installing via pip:

git clone https://github.com/piecesofmindlab/copterr.git
cd copterr
pip install .

Use of GPU acceleration requires installation of PyTorch with CUDA support.