
1604VocabCase


Lowercase Vocabulary

Up to this point, we did not do any case normalization of the tokens, even though we used the lowercase-only GloVe embeddings. This seems like a major "oops" issue! But the rechecks (run as 3rnn, 3a51, etc.) do not suggest any bad effect...
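
For illustration, a minimal sketch of the change being rechecked here, i.e. lowercasing the vocabulary so that capitalized tokens share entries with (and get initialized from) the lowercase-only GloVe vectors. The function name, the ±0.25 random-init range and the dict-based GloVe lookup are assumptions for this example, not the actual repository code:

```python
import numpy as np

def build_vocab_and_embeddings(tokens, glove, dim, lowercase=True, rng=np.random):
    """tokens: iterable of corpus tokens; glove: dict of lowercase word -> vector."""
    # With lowercase=True, "The" and "the" share one vocabulary entry (and hence
    # one embedding); with lowercase=False they get separate entries.
    vocab = {}
    for tok in tokens:
        key = tok.lower() if lowercase else tok
        vocab.setdefault(key, len(vocab))
    emb = rng.uniform(-0.25, 0.25, (len(vocab), dim))  # random fallback init
    for word, idx in vocab.items():
        if word in glove:  # lowercase-only GloVe: capitalized entries miss this
            emb[idx] = glove[word]
    return vocab, emb
```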

wang

Baseline (words containing uppercase characters, incl. at sentence beginnings etc., are not initialized from GloVe and their embeddings are not shared with the lowercase forms):

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.791770    | 0.842155 | 0.648863 | 0.742747 | (defaults)
|          |±0.017036    |±0.009447 |±0.010918 |±0.009896 |
| attn1511 | 0.852364    | 0.851368 | 0.708163 | 0.789822 | (defaults)
|          |±0.017280    |±0.005533 |±0.008958 |±0.013308 |

16x R_aw_3rnn - 0.846884 (95% [0.835546, 0.858221]):

10884139.arien.ics.muni.cz.R_aw_3rnn etc.
[0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454, 0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158, 0.832692, 0.862063, 0.856643, 0.828333, ]
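
The per-run lists are summarized as a mean with a 95% confidence interval; a minimal sketch that reproduces the figures quoted above for R_aw_3rnn, assuming a Student's t interval over np.std (the actual evaluation script may differ in details):

```python
import numpy as np
import scipy.stats as ss

# the 16 per-run R_aw_3rnn values listed above
runs = [0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454,
        0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158,
        0.832692, 0.862063, 0.856643, 0.828333]

mean = np.mean(runs)
# half-width of the 95% interval of the mean (Student's t, n-1 degrees of freedom)
halfwidth = ss.t.isf(0.025, len(runs) - 1) * np.std(runs) / np.sqrt(len(runs))
print('%f (95%% [%f, %f])' % (mean, mean - halfwidth, mean + halfwidth))
# prints: 0.846884 (95% [0.835546, 0.858221])
```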

16x R_aw_3a51 - 0.858996 (95% [0.850141, 0.867850]):

10884140.arien.ics.muni.cz.R_aw_3a51 etc.
[0.842454, 0.870359, 0.864359, 0.873948, 0.870009, 0.808864, 0.867179, 0.856659, 0.865476, 0.881802, 0.874872, 0.852949, 0.849231, 0.853333, 0.861795, 0.850641, ]                                                 

TODO: a final evaluation on 3rnn to confirm that this doesn't reduce early-stopping overfitting or the like

yodaqa/curatedv2

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.459869    | 0.429780 | 0.228869 | 0.341706 | (defaults)
|          |±0.035981    |±0.015609 |±0.005554 |±0.010643 |

8x R_ay_3rnn - 0.419903 (95% [0.399927, 0.439880]):

10884108.arien.ics.muni.cz.R_ay_3rnn etc.
[0.416091, 0.411091, 0.401161, 0.422062, 0.461068, 0.452422, 0.383907, 0.411424, ]

yodaqa/large2470

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.460984    | 0.382949 | 0.262463 | 0.381298 | (defaults)
|          |±0.023715    |±0.006451 |±0.002641 |±0.007643 |
| attn1511 | 0.445635    | 0.408495 | 0.288100 | 0.430892 | (defaults)
|          |±0.056352    |±0.008744 |±0.005601 |±0.017858 |

4x R_al_3rnn - 0.395602 (95% [0.383595, 0.407609]):

10884135.arien.ics.muni.cz.R_al_3rnn etc.
[0.408591, 0.391750, 0.392145, 0.389923, ]

4x R_al_3a51 - 0.404151 (95% [0.382397, 0.425904]):

10884137.arien.ics.muni.cz.R_al_3a51 etc.
[0.381521, 0.412896, 0.405499, 0.416687, ]

Ubuntu

| Model | val MRR   | val 2-R@1 | val 10-R@2 | test MRR  | test 2-R@1 | test 10-R@2 | settings
|-------|-----------|-----------|------------|-----------|------------|-------------|---------
| avg   |  0.619594 |  0.790865 |  0.603921  |  0.623905 |  0.793301  |  0.608480   |
|       | ±0.001845 | ±0.002024 | ±0.002699  | ±0.001948 | ±0.002199  | ±0.002341   |

avg with the new (lowercased) vocabulary:

data/anssel/ubuntu/v2-valset.pickle MRR: 0.621408
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.791616
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467536  10-R@2: 0.607055  10-R@5: 0.835020

data/anssel/ubuntu/v2-valset.pickle MRR: 0.620516
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.790133
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467229  10-R@2: 0.604959  10-R@5: 0.833691

MSR

baseline 0.812771 ±0.006366

16x R_pm_3rnn - F-Score 0.800006 (95% [0.788322, 0.811690]):

10887075.arien.ics.muni.cz.R_pm_3rnn etc.
[0.803665, 0.823684, 0.798962, 0.817150, 0.821687, 0.821516, 0.811705, 0.805556, 0.822023, 0.801068, 0.751753, 0.795306, 0.783821, 0.812169, 0.768817, 0.761216, ]

STS SICK2014

baseline:

| rnn | 0.732334  | 0.684798  | 0.663615  | 0.663615 | (defaults)
|     |±0.035202  |±0.016028  |±0.022356  |          |

16x R_si_3rnn - 0.670247 (95% [0.648247, 0.692246]):

10887077.arien.ics.muni.cz.R_si_3rnn etc.
[0.682193, 0.676602, 0.676918, 0.533230, 0.690744, 0.678811, 0.684401, 0.655812, 0.682989, 0.701587, 0.666743, 0.675061, 0.699224, 0.694073, 0.614459, 0.711100, ]