diff --git a/README.md b/README.md
index 901ed2f..839ef68 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 ## A Hierarchical Resampling Package for Python
 
-Version 1.1
+Version 1.1.1
 
 hierarch is a package for hierarchical resampling (bootstrapping, permutation) of datasets in Python. Because for loops are ultimately intrinsic to cluster-aware resampling, hierarch uses Numba to accelerate many of its key functions.
 
diff --git a/docs/user/confidence.rst b/docs/user/confidence.rst
index 22af507..1d25975 100644
--- a/docs/user/confidence.rst
+++ b/docs/user/confidence.rst
@@ -65,7 +65,7 @@ confidence interval. ::
 
     from hierarch.stats import confidence_interval
 
-    ha.stats.confidence_interval(
+    confidence_interval(
         data,
         treatment_col=0,
         compare='means',
@@ -84,7 +84,7 @@ Because ha.stats.confidence_interval is based on a hypothesis test, it requires
 the same input parameters as hypothesis_test. However,
 the new **interval** parameter determines the width of the interval. ::
 
-    ha.stats.confidence_interval(
+    confidence_interval(
         data,
         treatment_col=0,
         compare='means',
@@ -96,7 +96,7 @@ the new **interval** parameter determines the width of the interval. ::
 
     (-0.9086402840632387, 0.25123067872990457)
 
-    ha.stats.confidence_interval(
+    confidence_interval(
         data,
         treatment_col=0,
         compare='means',
@@ -141,7 +141,8 @@ this value. You can test this with the following code. ::
 
     for i in range(loops):
         data = sim.generate()
-        lower, upper = ha.stats.confidence_interval(data, 0, interval=95, bootstraps=100, permutations='all')
+        lower, upper = confidence_interval(data, 0, interval=95,
+                                           bootstraps=100, permutations='all')
         if lower <= true_difference <= upper:
             coverage += 1
 
@@ -223,7 +224,7 @@ for **compare** when computing a confidence interval. ::
 
     from hierarch.stats import confidence_interval
 
-    ha.stats.confidence_interval(
+    confidence_interval(
         data,
         treatment_col=0,
         compare='corr',
@@ -260,7 +261,8 @@ set up a simulation as above to check the coverage of the 95% confidence interva
 
     for i in range(loops):
         data = datagen.generate()
-        lower, upper = ha.stats.confidence_interval(data, 0, interval=95, bootstraps=100, permutations='all')
+        lower, upper = confidence_interval(data, 0, interval=95,
+                                           bootstraps=100, permutations='all')
         if lower <= true_difference <= upper:
             coverage += 1
 
@@ -279,7 +281,8 @@ interest. ::
 
     for i in range(loops):
         data = datagen.generate()
-        lower, upper = ha.stats.confidence_interval(data, 0, interval=99, bootstraps=100, permutations='all')
+        lower, upper = confidence_interval(data, 0, interval=99,
+                                           bootstraps=100, permutations='all')
         if lower <= true_difference <= upper:
             coverage += 1
 
diff --git a/docs/user/hypothesis.rst b/docs/user/hypothesis.rst
index b800050..2326366 100644
--- a/docs/user/hypothesis.rst
+++ b/docs/user/hypothesis.rst
@@ -74,7 +74,9 @@ column - in this case, "Condition." Indexing starts at 0, so you input
 treatment_col=0. In this case, there are only 6c3 = 20 ways to permute the
 treatment labels, so you should specify "all" permutations be used. ::
 
-    p_val = ha.stats.hypothesis_test(data, treatment_col=0, compare='means',
+    from hierarch.stats import hypothesis_test
+
+    p_val = hypothesis_test(data, treatment_col=0, compare='means',
                                      bootstraps=500, permutations='all',
                                      random_state=1)
 
@@ -84,20 +86,23 @@ treatment labels, so you should specify "all" permutations be used. ::
 
 There are a number of parameters that can be used to modify hypothesis_test. ::
 
-    ha.stats.hypothesis_test(data_array,
-                             treatment_col,
-                             compare="means",
-                             skip=None,
-                             bootstraps=100,
-                             permutations=1000,
-                             kind='weights',
-                             return_null=False,
-                             random_state=None)
+    hypothesis_test(data_array,
+                    treatment_col,
+                    compare="means",
+                    skip=None,
+                    bootstraps=100,
+                    permutations=1000,
+                    kind='weights',
+                    return_null=False,
+                    random_state=None)
 
 **compare**: The default "means" assumes that you are testing for a difference in means, so it uses the Welch t-statistic.
 "corr" uses a studentized covariance based test statistic which gives the same result as the Welch t-statistic
 for two-sample datasets, but can be used on datasets with any number of related treatment groups. For flexibility, hypothesis_test can
-also take a test statistic function as an argument.
+also take a test statistic function as an argument.
+
+**alternative** : "two-sided" or "less" or "greater" specifies the alternative hypothesis. "two-sided" conducts
+a two-tailed test, while "less" or "greater" conduct the appropriate one-tailed test.
 
 **skip**: indicates the indices of columns that should be skipped in the bootstrapping
 procedure.
@@ -228,6 +233,8 @@ treatment 2 represents a slight difference and treatment 4 represents a large di
 There are six total comparisons that can be made, which can be performed automatically
 using multi_sample_test as follows. ::
 
+    from hierarch.stats import multi_sample_test
+
     multi_sample_test(data, treatment_col=0, hypotheses="all",
                       correction=None, bootstraps=1000,
                       permutations="all", random_state=111)
diff --git a/docs/user/overview.rst b/docs/user/overview.rst
index 1b55068..7a38038 100644
--- a/docs/user/overview.rst
+++ b/docs/user/overview.rst
@@ -51,5 +51,7 @@ Here is the sort of data that hierarch is designed to perform hypothesis tests o
 
 The code to perform a hierarchical permutation t-test on this dataset looks like::
 
-    hierarch.stats.hypothesis_test(data, treatment_col=0,
-                                   bootstraps=1000, permutations='all')
\ No newline at end of file
+    from hierarch.stats import hypothesis_test
+
+    hypothesis_test(data, treatment_col=0,
+                    bootstraps=1000, permutations='all')
\ No newline at end of file
diff --git a/docs/user/power.rst b/docs/user/power.rst
index 97d873c..fe1f8c3 100644
--- a/docs/user/power.rst
+++ b/docs/user/power.rst
@@ -91,11 +91,13 @@ permutations (though this is overkill in the 2, 3, 3 case) on each of 100
 simulated datasets and prints the fraction of them that return a significant
 result, assuming a p-value cutoff of 0.05. ::
 
+    from hierarch.stats import hypothesis_test
+
     pvalues = []
     loops = 100
     for i in range(loops):
         data = sim.generate()
-        pvalues.append(ha.stats.hypothesis_test(data, 0, bootstraps=500, permutations=100))
+        pvalues.append(hypothesis_test(data, 0, bootstraps=500, permutations=100))
 
     print(np.less(pvalues, 0.05).sum() / loops)
 
@@ -111,7 +113,7 @@ you determine the column 1 sample size that achieves at least 80% power. ::
     loops = 100
     for i in range(loops):
         data = sim.generate()
-        pvalues.append(ha.stats.hypothesis_test(data, 0, bootstraps=500, permutations=100))
+        pvalues.append(hypothesis_test(data, 0, bootstraps=500, permutations=100))
 
     print(np.less(pvalues, 0.05).sum() / loops)
 
@@ -134,7 +136,7 @@ achieved with an experimental design that makes more column 2 measurements. ::
     loops = 100
     for i in range(loops):
         data = sim.generate()
-        pvalues.append(ha.stats.hypothesis_test(data, 0, bootstraps=500, permutations=100))
+        pvalues.append(hypothesis_test(data, 0, bootstraps=500, permutations=100))
 
     print(np.less(pvalues, 0.05).sum() / loops)
 
@@ -154,7 +156,7 @@ only 30 column 2 samples. ::
     loops = 100
     for i in range(loops):
         data = sim.generate()
-        pvalues.append(ha.stats.hypothesis_test(data, 0, bootstraps=500, permutations=100))
+        pvalues.append(hypothesis_test(data, 0, bootstraps=500, permutations=100))
 
     print(np.less(pvalues, 0.05).sum() / loops)
 
@@ -180,7 +182,7 @@ the error for an event that happens 5% probability is +/- 2%, but at
     loops = 1000
     for i in range(loops):
         data = sim.generate()
-        pvalues.append(ha.stats.hypothesis_test(data, 0, bootstraps=500, permutations=100))
+        pvalues.append(hypothesis_test(data, 0, bootstraps=500, permutations=100))
 
     print(np.less(pvalues, 0.05).sum() / loops)
 
diff --git a/setup.py b/setup.py
index 89f559d..9f3ca4a 100644
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@
 
 setuptools.setup(
     name="hierarch",
-    version="1.1.0",
+    version="1.1.1",
     author="Rishi Kulkarni",
     author_email="rkulk@stanford.edu",
     description="Hierarchical hypothesis testing library",