
Commit

fix stale version
yiqunchen committed Nov 26, 2023
1 parent ffdcb90 commit 6295a37
Showing 7 changed files with 43 additions and 42 deletions.
24 changes: 11 additions & 13 deletions docs/articles/Tutorials.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

22 changes: 12 additions & 10 deletions docs/articles/Tutorials_hier.html


Binary file not shown.
29 changes: 15 additions & 14 deletions docs/articles/technical_details.html


2 changes: 1 addition & 1 deletion docs/pkgdown.yml
@@ -5,7 +5,7 @@ articles:
Tutorials: Tutorials.html
Tutorials_hier: Tutorials_hier.html
technical_details: technical_details.html
-last_built: 2023-11-26T01:22Z
+last_built: 2023-11-26T01:39Z
urls:
reference: https://yiqunchen.github.io/CADET/reference
article: https://yiqunchen.github.io/CADET/articles
6 changes: 3 additions & 3 deletions vignettes/Tutorials.Rmd
@@ -23,7 +23,7 @@ library(CADET)
library(ggplot2)
```

-We first generate data according to $\mathbf{X} \sim {MN}_{n\times q}(\boldsymbol{\mu}, \textbf{I}_n, \sigma^2 \textbf{I}_q)$ with $n=150,q=2,\sigma=1,$ and
+We first generate data according to $\mathbf{X} \sim MN_{n\times q}(\boldsymbol{\mu}, \textbf{I}_n, \sigma^2 \textbf{I}_q)$ with $n=150,q=2,\sigma=1,$ and
\begin{align}
\label{eq:power_model}
\boldsymbol{\mu}_1 =\ldots = \boldsymbol{\mu}_{50} = \begin{bmatrix}
@@ -35,7 +35,7 @@ We first generate data according to $\mathbf{X} \sim {MN}_{n\times q}(\boldsymbo
\delta/2 \\ 0_{q-1}
\end{bmatrix}.
\end{align}
-Here, we can think of ${C}_1 = \{1,\ldots,50\},{C}_2 = \{51,\ldots,100\},{C}_3 = \{101,\ldots,150\}$ as the "true clusters".
+Here, we can think of $C_1 = \{1,\ldots,50\},C_2 = \{51,\ldots,100\},C_3 = \{101,\ldots,150\}$ as the "true clusters".
In the figure below, we display one such simulation $\mathbf{x}\in\mathbb{R}^{150\times 2}$ with $\delta=10$.
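The data-generating mechanism above (a matrix normal with identity row covariance, equivalent to independent rows $\mathbf{x}_i \sim N(\boldsymbol{\mu}_i, \sigma^2 \textbf{I}_q)$) can be sketched as follows. This is an illustrative NumPy version, not the vignette's R code, and since the full mean specification is collapsed in this diff, the cluster means below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(123)
n, q, sigma, delta = 150, 2, 1.0, 10.0

# Placeholder cluster means (the exact values are collapsed in the
# diff above): cluster 1 at -delta/2, cluster 2 at 0, cluster 3 at
# +delta/2 in the first feature.
mu = np.zeros((n, q))
mu[:50, 0] = -delta / 2
mu[100:, 0] = delta / 2

# MN_{n x q}(mu, I_n, sigma^2 I_q) sampling reduces to adding
# independent N(0, sigma^2) noise to each entry of mu.
X = mu + sigma * rng.standard_normal((n, q))
```

With $\delta=10$ the three groups are well separated in the first feature, which is why k-means recovers the true clusters in this regime.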

```{r fig.align="center", fig.height = 5, fig.width = 5}
@@ -98,7 +98,7 @@ cl_inference_demo <- kmeans_inference_1f(X, k=3, cluster_1, cluster_2,
summary(cl_inference_demo)
```

-In the summary, we have the empirical difference in means of the second feature between the two clusters, i.e.,$\sum_{i\in {\hat{{G}}}}\mathbf{x}_{i,2}/|\hat{{G}}| - \sum_{i\in \hat{G}'}\mathbf{x}_{i,2}/|\hat{G}'|$ (`test_stats`), the naive p-value based on a z-test (`p_naive`), and the selective $p$-value (`p_selective`). In this case, the test based on $p_{\text{selective}}$ can reject this null hypothesis that the blue and pink clusters have the same mean in the first feature ($p_{2,\text{selective}}<0.001$).
+In the summary, we have the empirical difference in means of the second feature between the two clusters, i.e.,$\sum_{i\in \hat{G}}\mathbf{x}_{i,2}/|\hat{{G}}| - \sum_{i\in \hat{G}'}\mathbf{x}_{i,2}/|\hat{G}'|$ (`test_stats`), the naive p-value based on a z-test (`p_naive`), and the selective $p$-value (`p_selective`). In this case, the test based on $p_{\text{selective}}$ can reject this null hypothesis that the blue and pink clusters have the same mean in the first feature ($p_{2,\text{selective}}<0.001$).
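For intuition, the naive z-test referenced here can be sketched in a few lines. This is a pure-Python illustration under known $\sigma$ (function and variable names are ours); it does NOT reproduce `p_selective`, which additionally conditions on the clustering event:

```python
import math

def naive_diff_in_means_z(x_g, x_gp, sigma=1.0):
    """Naive z-test for a difference in means of one feature between
    two clusters, ignoring that the clusters were estimated from the
    same data -- the double-dipping that inflates type I error."""
    stat = sum(x_g) / len(x_g) - sum(x_gp) / len(x_gp)
    se = sigma * math.sqrt(1 / len(x_g) + 1 / len(x_gp))
    z = stat / se
    # Two-sided p-value from the standard normal CDF via erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return stat, p

# Toy usage: two small, clearly separated groups of feature values.
stat, p = naive_diff_in_means_z([4.8, 5.1, 5.3], [0.1, -0.2, 0.4])
```

Because this test treats the clusters as fixed in advance, its p-values are anti-conservative when the clusters were chosen by looking at the data, which is the problem the selective p-value corrects.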

### Inference for k-means clustering when the null hypothesis holds

2 changes: 1 addition & 1 deletion vignettes/technical_details.Rmd
@@ -19,7 +19,7 @@ knitr::opts_chunk$set(
<center>

![](../man/figures/fig_1.png){width=90%}
-<figcaption>Figure 1: We simulated one dataset according to ${MN}_{100\times 10}(\mu, \textbf{I}_{100}, \Sigma)$, where $\mu_i = (1,0_9)^T$ for $i=1,\ldots, 50$ and $\mu_i = (0_9,1)^T$ for $i=51,\ldots, 100$, and $\Sigma_{ij} = 1\{i=j\}+0.4\cdot 1\{i\neq j\}$. *(a)*: Empirical distribution of feature 2 based on the simulated data set. In this case, all observations have the same mean for feature 2. *(b)*: We apply k-means clustering to obtain two clusters and plot the empirical distribution of feature 2 stratified by the clusters. *(c)*: Quantile-quantile plot of naive z-test (black) our proposed p-values (orange) applied to the simulated data sets for testing the null hypotheses for a difference in means for features 2--8. </figcaption>
+<figcaption>Figure 1: We simulated one dataset according to $MN_{100\times 10}(\mu, \textbf{I}_{100}, \Sigma)$, where $\mu_i = (1,0_9)^T$ for $i=1,\ldots, 50$ and $\mu_i = (0_9,1)^T$ for $i=51,\ldots, 100$, and $\Sigma_{ij} = 1\{i=j\}+0.4\cdot 1\{i\neq j\}$. *(a)*: Empirical distribution of feature 2 based on the simulated data set. In this case, all observations have the same mean for feature 2. *(b)*: We apply k-means clustering to obtain two clusters and plot the empirical distribution of feature 2 stratified by the clusters. *(c)*: Quantile-quantile plot of naive z-test (black) our proposed p-values (orange) applied to the simulated data sets for testing the null hypotheses for a difference in means for features 2--8. </figcaption>
</center>
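The compound-symmetric covariance in the caption, $\Sigma_{ij} = 1\{i=j\}+0.4\cdot 1\{i\neq j\}$, is equivalent to $(1-\rho)\textbf{I} + \rho \textbf{J}$ with $\rho = 0.4$. A small NumPy sketch (illustrative only, not the package's simulation code):

```python
import numpy as np

q, rho = 10, 0.4
# Sigma_ij = 1{i=j} + rho * 1{i != j}  <=>  (1 - rho) * I + rho * J,
# a compound-symmetric (exchangeable) correlation matrix.
Sigma = (1 - rho) * np.eye(q) + rho * np.ones((q, q))
```

This matrix is positive definite for $-1/(q-1) < \rho < 1$, so it is a valid feature covariance here.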


