
1604VocabCase


Lowercase Vocabulary

Up to this point, we did not do any case normalization of the tokens, even though we used the lowercase-only GloVe embeddings. This seems like a major "oops" issue! But the rechecks (run as 3rnn, 3a51, etc.) do not suggest any bad effect...
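
For illustration, a minimal sketch of the change being rechecked here, i.e. lowercasing the vocabulary so that capitalized tokens share entries with (and get initialized from) the lowercase-only GloVe vectors. The function name, the ±0.25 random-init range and the dict-based GloVe lookup are assumptions for this example, not the actual repository code:

```python
import numpy as np

def build_vocab_and_embeddings(tokens, glove, dim, lowercase=True, rng=np.random):
    """tokens: iterable of corpus tokens; glove: dict of lowercase word -> vector."""
    # With lowercase=True, "The" and "the" share one vocabulary entry (and hence
    # one embedding); with lowercase=False they get separate entries.
    vocab = {}
    for tok in tokens:
        key = tok.lower() if lowercase else tok
        vocab.setdefault(key, len(vocab))
    emb = rng.uniform(-0.25, 0.25, (len(vocab), dim))  # random fallback init
    for word, idx in vocab.items():
        if word in glove:  # lowercase-only GloVe: capitalized entries miss this
            emb[idx] = glove[word]
    return vocab, emb
```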

wang

Baseline (words containing uppercase characters, incl. at sentence beginnings etc., are not initialized from GloVe and their embeddings are not shared with the lowercase forms):

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.791770    | 0.842155 | 0.648863 | 0.742747 | (defaults)
|          |±0.017036    |±0.009447 |±0.010918 |±0.009896 |
| attn1511 | 0.852364    | 0.851368 | 0.708163 | 0.789822 | (defaults)
|          |±0.017280    |±0.005533 |±0.008958 |±0.013308 |

16x R_aw_3rnn - 0.846884 (95% [0.835546, 0.858221]):

10884139.arien.ics.muni.cz.R_aw_3rnn etc.
[0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454, 0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158, 0.832692, 0.862063, 0.856643, 0.828333, ]
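
The per-run lists are summarized as a mean with a 95% confidence interval; a minimal sketch that reproduces the figures quoted above for R_aw_3rnn, assuming a Student's t interval over np.std (the actual evaluation script may differ in details):

```python
import numpy as np
import scipy.stats as ss

# the 16 per-run R_aw_3rnn values listed above
runs = [0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454,
        0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158,
        0.832692, 0.862063, 0.856643, 0.828333]

mean = np.mean(runs)
# half-width of the 95% interval of the mean (Student's t, n-1 degrees of freedom)
halfwidth = ss.t.isf(0.025, len(runs) - 1) * np.std(runs) / np.sqrt(len(runs))
print('%f (95%% [%f, %f])' % (mean, mean - halfwidth, mean + halfwidth))
# prints: 0.846884 (95% [0.835546, 0.858221])
```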

16x R_aw_3a51 - 0.858996 (95% [0.850141, 0.867850]):

10884140.arien.ics.muni.cz.R_aw_3a51 etc.
[0.842454, 0.870359, 0.864359, 0.873948, 0.870009, 0.808864, 0.867179, 0.856659, 0.865476, 0.881802, 0.874872, 0.852949, 0.849231, 0.853333, 0.861795, 0.850641, ]                                                 

TODO: a final evaluation on 3rnn to confirm that this doesn't reduce early-stopping overfitting or the like

yodaqa/curatedv2

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.459869    | 0.429780 | 0.228869 | 0.341706 | (defaults)
|          |±0.035981    |±0.015609 |±0.005554 |±0.010643 |

8x R_ay_3rnn - 0.419903 (95% [0.399927, 0.439880]):

10884108.arien.ics.muni.cz.R_ay_3rnn etc.
[0.416091, 0.411091, 0.401161, 0.422062, 0.461068, 0.452422, 0.383907, 0.411424, ]

yodaqa/large2470

| Model    | trainAllMRR | devMRR   | testMAP  | testMRR  | settings
|----------|-------------|----------|----------|----------|---------
| rnn      | 0.460984    | 0.382949 | 0.262463 | 0.381298 | (defaults)
|          |±0.023715    |±0.006451 |±0.002641 |±0.007643 |
| attn1511 | 0.445635    | 0.408495 | 0.288100 | 0.430892 | (defaults)
|          |±0.056352    |±0.008744 |±0.005601 |±0.017858 |

4x R_al_3rnn - 0.395602 (95% [0.383595, 0.407609]):

10884135.arien.ics.muni.cz.R_al_3rnn etc.
[0.408591, 0.391750, 0.392145, 0.389923, ]

4x R_al_3a51 - 0.404151 (95% [0.382397, 0.425904]):

10884137.arien.ics.muni.cz.R_al_3a51 etc.
[0.381521, 0.412896, 0.405499, 0.416687, ]

Ubuntu

| Model | val MRR   | val 2-R@1 | val 10-R@2 | test MRR  | test 2-R@1 | test 10-R@2 | settings
|-------|-----------|-----------|------------|-----------|------------|-------------|---------
| avg   |  0.619594 |  0.790865 |  0.603921  |  0.623905 |  0.793301  |  0.608480   |
|       | ±0.001845 | ±0.002024 | ±0.002699  | ±0.001948 | ±0.002199  | ±0.002341   |

avg with the new (lowercased) vocabulary:

data/anssel/ubuntu/v2-valset.pickle MRR: 0.621408
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.791616
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467536  10-R@2: 0.607055  10-R@5: 0.835020

data/anssel/ubuntu/v2-valset.pickle MRR: 0.620516
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.790133
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467229  10-R@2: 0.604959  10-R@5: 0.833691

MSR

baseline 0.812771 ±0.006366

16x R_pm_3rnn - F-Score 0.800006 (95% [0.788322, 0.811690]):

10887075.arien.ics.muni.cz.R_pm_3rnn etc.
[0.803665, 0.823684, 0.798962, 0.817150, 0.821687, 0.821516, 0.811705, 0.805556, 0.822023, 0.801068, 0.751753, 0.795306, 0.783821, 0.812169, 0.768817, 0.761216, ]

STS SICK2014

baseline:

| rnn | 0.732334  | 0.684798  | 0.663615  | 0.663615 | (defaults)
|     |±0.035202  |±0.016028  |±0.022356  |          |

16x R_si_3rnn - 0.670247 (95% [0.648247, 0.692246]):

10887077.arien.ics.muni.cz.R_si_3rnn etc.
[0.682193, 0.676602, 0.676918, 0.533230, 0.690744, 0.678811, 0.684401, 0.655812, 0.682989, 0.701587, 0.666743, 0.675061, 0.699224, 0.694073, 0.614459, 0.711100, ]