Simplify LDA input parameterization #143

gokceneraslan · 2018-11-11T04:21:15Z

I tried to simplify the LDA input representation by using an M x V matrix of word frequencies, where M is the number of documents and V is the number of words in the vocabulary. Instead of iterating over every word of every document, the model now iterates over each element of the M x V matrix.
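To make the change in input encoding concrete, here is a minimal Python sketch of collapsing token-level data into an M x V count matrix. The names `w` and `doc` (1-based word and document ids per token) and the helper `tokens_to_counts` are assumptions for illustration, not taken from the PR.

```python
def tokens_to_counts(w, doc, M, V):
    """Build an M x V count matrix from 1-based per-token word and document ids."""
    counts = [[0] * V for _ in range(M)]
    for word_id, doc_id in zip(w, doc):
        # Shift the 1-based ids (Stan convention) to 0-based Python indices.
        counts[doc_id - 1][word_id - 1] += 1
    return counts

# Hypothetical toy corpus: 2 documents, 3 vocabulary words, 5 tokens.
w = [1, 2, 2, 3, 1]
doc = [1, 1, 1, 2, 2]
counts = tokens_to_counts(w, doc, M=2, V=3)
```

Each row of `counts` is one document and each column one vocabulary word, so word order within a document is discarded. That loses nothing for a bag-of-words model like LDA.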

bob-carpenter · 2018-11-17T21:19:56Z

Thanks for submitting. I've been out for a while, so haven't been able to review this, but I'll get to it ASAP.

bob-carpenter

Please just add new models rather than replacing the existing ones.

The new implementations have very different memory properties (which will only be better in some dense cases).
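A rough way to see the memory trade-off is to count stored integers under each encoding. The sketch below assumes the token-level model stores two ids per token (word and document) and the count-based model stores a full M x V matrix. The corpus sizes are made up for illustration.

```python
def storage_entries(M, V, N):
    """Integers stored by each encoding: dense M x V counts vs. two ids per token."""
    dense = M * V
    token_list = 2 * N
    return dense, token_list

# Hypothetical sparse corpus: 1,000 documents, 50,000-word vocabulary, 200 tokens each.
sparse_case = storage_entries(M=1000, V=50000, N=1000 * 200)

# Hypothetical dense corpus: 1,000 documents, 100-word vocabulary, 2,000 tokens each.
dense_case = storage_entries(M=1000, V=100, N=1000 * 2000)
```

With a large vocabulary and short documents the dense matrix is far larger than the token list. The count matrix only wins when documents are long relative to the vocabulary. The same comparison applies to the number of loop iterations in the model block.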

bob-carpenter · 2018-11-17T21:25:46Z

misc/cluster/lda/corr-lda.stan

+      if (count > 0) {
+        for (k in 1:K) {
+          gamma[k] = (log(theta[i,k]) + log(phi[k,j]))*count;
+        }


Rather than looping to define gamma, a one-liner will do it:

gamma = count * (log(theta[i, ]) + log(phi[, j]));

The loop works for the count == 0 case (though a bit inefficiently), but presumably there aren't any zero-length documents in well-formed data sets, so I'd just replace this whole loop with:

for (i in 1:M) for (j in 1:V) target += log_sum_exp(count * (log(theta[i, ]) + log(phi[, j])));

increment_log_prob is deprecated---this model has been around for a while without being updated.
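Before merging a count-based version, it may be worth checking numerically that it gives the same log likelihood as the per-token model. The Python sketch below uses made-up values for one document and one word, and compares the per-token sum against two placements of `count`. The values and variable names are assumptions for illustration only.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

theta_i = [0.7, 0.3]   # hypothetical topic proportions for document i
phi_j = [0.1, 0.4]     # hypothetical probability of word j under each topic
count = 3              # word j appears three times in document i

# Per-topic log joint for a single token of word j in document i.
gamma = [math.log(t) + math.log(p) for t, p in zip(theta_i, phi_j)]

# Token-level model: one log_sum_exp term per occurrence of the word.
per_token = sum(log_sum_exp(gamma) for _ in range(count))

# Count-based variants: count outside vs. inside the log_sum_exp.
count_outside = count * log_sum_exp(gamma)
count_inside = log_sum_exp([count * g for g in gamma])
```

In this toy case `count_outside` matches `per_token`, while `count_inside` does not. This suggests that if the goal is to reproduce the original likelihood, the multiplication by `count` should be applied to the result of `log_sum_exp` rather than to `gamma`.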

bob-carpenter · 2018-11-17T21:28:26Z

misc/cluster/lda/corr-lda.stan

  for (m in 1:K) {
-    Sigma[m,m] <- sigma[m] * sigma[m] * Omega[m,m];


Thanks for replacing <- --- that would be a good change for the original model, as well as replacing increment_log_prob with target +=.

square(sigma[m]) will be a bit more efficient, as would be sigma[m]^2.

bob-carpenter · 2018-11-17T21:31:00Z

Oh, and I'd suggest adding suffixes to existing model names like _counts to indicate you're taking sufficient stats rather than the raw data.

gokceneraslan · 2018-11-17T23:04:27Z

> Oh, and I'd suggest adding suffixes to existing model names like _counts to indicate you're taking sufficient stats rather than the raw data.

You mean adding _counts to the new model? It's the one that uses counts.

bob-carpenter · 2018-11-18T22:48:29Z

Anything to distinguish the way in which data is coded in the two approaches. So yes, I meant keeping <foo>.stan as is and adding <foo>_counts.stan or something similar for the sufficient stats version.


Simplify LDA input parameterization.

aa1d0b2

bob-carpenter requested changes Nov 17, 2018
