Subsequence Probability #149

dedcode · 2016-01-17T19:11:43Z

Hi,
I am using char-rnn to sample only a small number of characters (e.g., -length 20) given some seed text.
Is there a way to compute the probability with which a subsequence was generated, relative to all the other options at each character?
My goal is to compute a confidence score on a generated word.
Thanks !

FragLegs · 2016-01-24T16:20:07Z

Take a look at my pull request: #151

vinhqdang · 2016-07-24T11:02:42Z

Hi,

Thanks for your answer @FragLegs, but I am not sure how I should use your code.

Let's say I have the seed text "abcd", and I want to predict the next character, with output like:

a: 0.4
b: 0.1
c: 0.2
d: 0.3

Each number is the probability that the corresponding character appears as the 5th character.

FragLegs · 2016-07-25T01:47:12Z

Hi @vinhqdang. My pull request is designed to do something slightly different. You can use it to do what you are trying to accomplish, but you might be better served by editing the code yourself.

My PR is intended to give the probability of a string of characters (both the seed and the characters generated by the rnn). So, let's say you want the (log) probability of "abcda". You can get that via th sample.lua cv/my_checkpointed_model.t7 -primetext "abcda" -length 0

Similarly, for "abcdb" you can call th sample.lua cv/my_checkpointed_model.t7 -primetext "abcdb" -length 0 and so on.

In order to determine the probability of each of those characters in the 5th position, you'll also need to know the probability of the 4 leading characters via th sample.lua cv/my_checkpointed_model.t7 -primetext "abcd" -length 0

For a language model such as this one, the probability of a sequence c_1, c_2, ..., c_n, c equals the probability of c given c_1, c_2, ..., c_n times the probability of c_1, c_2, ..., c_n. That is, P(c_1 ... c_n c) = P(c | c_1 ... c_n) * P(c_1 ... c_n). So, to get the probability of character c given the preceding characters, simply divide the probability of the full sequence by the probability of the prefix: P(c | c_1 ... c_n) = P(c_1 ... c_n c) / P(c_1 ... c_n). To make that more concrete, in your example above:

a: P(abcda) / P(abcd)
b: P(abcdb) / P(abcd)
c: P(abcdc) / P(abcd)
d: P(abcdd) / P(abcd)
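
The ratio rule can be checked on a toy model. The sketch below is not char-rnn: it uses a hypothetical hand-written table of next-character probabilities (the names VOCAB, START, NEXT, sequence_prob and next_char_probs are made up for illustration), computes whole-sequence probabilities by multiplying the per-character terms, and then recovers the next-character distribution by dividing P(prefix + c) by P(prefix).

```python
# Toy stand-in for a language model: the next character depends only on the
# previous one. All numbers are invented for illustration, not char-rnn output.
VOCAB = "abcd"
START = {"a": 0.4, "b": 0.1, "c": 0.2, "d": 0.3}
NEXT = {
    "a": {"a": 0.1, "b": 0.6, "c": 0.2, "d": 0.1},
    "b": {"a": 0.2, "b": 0.1, "c": 0.5, "d": 0.2},
    "c": {"a": 0.1, "b": 0.2, "c": 0.1, "d": 0.6},
    "d": {"a": 0.4, "b": 0.1, "c": 0.2, "d": 0.3},
}


def sequence_prob(text):
    """Probability of the whole string under the toy model (chain rule)."""
    p = START[text[0]]
    for prev, cur in zip(text, text[1:]):
        p *= NEXT[prev][cur]
    return p


def next_char_probs(prefix):
    """P(c | prefix) for every c, computed as P(prefix + c) / P(prefix)."""
    p_prefix = sequence_prob(prefix)
    return {c: sequence_prob(prefix + c) / p_prefix for c in VOCAB}


probs = next_char_probs("abcd")
for c in VOCAB:
    print(f"{c}: {probs[c]:.2f}")
```

Because the ratios are taken over the full (toy) vocabulary, they sum to 1. With a real model the same holds only if you evaluate every character in the model's vocabulary.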

Since my script outputs log probabilities, the division becomes a subtraction: subtract the value you get via th sample.lua cv/my_checkpointed_model.t7 -primetext "abcd" -length 0 from the value you get via th sample.lua cv/my_checkpointed_model.t7 -primetext "abcda" -length 0 to get the log probability of "a" given "abcd". Exponentiate the result if you want the probability itself.
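
As a rough sketch of that last step, the snippet below does the subtraction in Python. The two log-probability values are hypothetical placeholders standing in for numbers copied by hand from the two sample.lua runs, and the helper name conditional_log_prob is made up for illustration. It also assumes the script reports natural logarithms; if it uses another base, exponentiate with that base instead.

```python
import math


def conditional_log_prob(log_p_full, log_p_prefix):
    """log P(suffix | prefix) = log P(prefix + suffix) - log P(prefix)."""
    return log_p_full - log_p_prefix


# Hypothetical values, standing in for the script's output on each primetext.
log_p_abcd = -6.2146   # from the run with -primetext "abcd"
log_p_abcda = -7.1309  # from the run with -primetext "abcda"

log_p_a_given_abcd = conditional_log_prob(log_p_abcda, log_p_abcd)
p_a_given_abcd = math.exp(log_p_a_given_abcd)  # back to an ordinary probability

print(f"log P(a | abcd) = {log_p_a_given_abcd:.4f}")
print(f"P(a | abcd)     = {p_a_given_abcd:.4f}")
```

The same subtraction works for more than one character: the log probability of a whole generated word given the seed is the log probability of seed plus word minus the log probability of the seed alone, which is one way to get a confidence score for a generated word.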
