Skip to content
This repository has been archived by the owner on Jan 12, 2020. It is now read-only.

simulate Q matrix with repeated runs from dirichlet distribution #21

Open
sa-lee opened this issue Jun 17, 2016 · 2 comments
Open

simulate Q matrix with repeated runs from dirichlet distribution #21

sa-lee opened this issue Jun 17, 2016 · 2 comments
Assignees

Comments

@sa-lee
Copy link
Owner

sa-lee commented Jun 17, 2016

see http://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf and
https://en.wikipedia.org/wiki/Dirichlet_distribution

k <- 20
r <- 20
n_samples <- 100
# generate Q matrix by sampling from diricihlet
# keep alpha hyper param constant (same mean and variance for each K)
Q_1 <- MCMCpack::rdirichlet(n_samples, rep(1, k))
# permute labels by shuffling and adding gaussian noise noise over number of runs
permute_and_add_noise <- function(Q_mat, r) {
  logit <- function(x) { log(x / (1-x)) }
  perturb <- function(Q_mat) { 
    # add gaussian noise to each row of matrix
    q_random <- apply(Q_mat, 1, 
                      function(y) 1 / (1 + exp(-rnorm(length(y), 
                                                      mean = logit(y), 
                                                      sd = 0.01))))
    # normalise rows 
    q_normalised <- q_random / rowSums(q_random)
    # shuffle labels 
    q_normalised[, sample(seq_len(ncol(q_normalised)))]
  }

  replicate(r - 1, perturb(Q_mat))
}

permute_and_add_noise(Q_1, r)

Will have to fix labelling at some point but it is a start.

@sa-lee
Copy link
Owner Author

sa-lee commented Jul 27, 2016

@gtonkinhill i think this might be reasonable for testing different label switching algs. should look into Stephens, 2000 as well.

sa-lee pushed a commit that referenced this issue Aug 2, 2016
@sa-lee
Copy link
Owner Author

sa-lee commented Aug 4, 2016

As discussed the other day - these are the underlying problems here:

  1. Introducing noise to the Q-estimates (which the code in previous commit does do.)
  2. Introducing noise in the label switches
  3. Changing Dirichlet params to introduce multiple modes.

The suggestion by Melanie to solve 2. is to pass a confusion matrix as input for a sample being mislabelled. i.e. say we have a Q-matrix with correct labels then in each replicate we add noise to the Q and swap labels with some probability given by the confusion matrix.

Alternatively
The label.switching package has a function called permute.mcmc that I think does what we want.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants