1604VocabCase
Up to a point, we did not apply any case normalization to the tokens, even though the GloVe embeddings we used are lowercase-only. This looks like a major "oops" issue! But the rechecks (run as 3rnn, 3a51, etc.) do not suggest that it had any bad effect...
Baseline (words containing uppercase letters, including sentence-initial words etc., are not initialized from GloVe and do not share an embedding with their lowercase forms):
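To make the issue concrete, here is a minimal sketch of how a case-sensitive lookup misses a lowercase-only embedding vocabulary. The toy vocabulary, the token list, and the helper name `glove_coverage` are hypothetical illustrations, not code from this repository.

```python
def glove_coverage(tokens, glove_vocab, lowercase):
    """Return the fraction of tokens that find a pretrained vector."""
    hits = 0
    for tok in tokens:
        # With lowercase=False, "France" never matches the key "france".
        key = tok.lower() if lowercase else tok
        if key in glove_vocab:
            hits += 1
    return hits / len(tokens)


# Toy stand-in for a lowercase-only GloVe vocabulary (hypothetical data).
glove_vocab = {"the", "capital", "of", "france", "is", "paris"}
tokens = ["The", "capital", "of", "France", "is", "Paris"]

coverage_raw = glove_coverage(tokens, glove_vocab, lowercase=False)
coverage_lower = glove_coverage(tokens, glove_vocab, lowercase=True)
print(coverage_raw, coverage_lower)
```

In this toy case every capitalized token falls out of the pretrained vocabulary unless it is lowercased first, which is the situation the baseline below was in.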
| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| rnn | 0.791770 | 0.842155 | 0.648863 | 0.742747 | (defaults) |
| | ±0.017036 | ±0.009447 | ±0.010918 | ±0.009896 | |
| attn1511 | 0.852364 | 0.851368 | 0.708163 | 0.789822 | (defaults) |
| | ±0.017280 | ±0.005533 | ±0.008958 | ±0.013308 | |
16x R_aw_3rnn - 0.846884 (95% [0.835546, 0.858221]):
10884139.arien.ics.muni.cz.R_aw_3rnn etc.
[0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454, 0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158, 0.832692, 0.862063, 0.856643, 0.828333, ]
16x R_aw_3a51 - 0.858996 (95% [0.850141, 0.867850]):
10884140.arien.ics.muni.cz.R_aw_3a51 etc.
[0.842454, 0.870359, 0.864359, 0.873948, 0.870009, 0.808864, 0.867179, 0.856659, 0.865476, 0.881802, 0.874872, 0.852949, 0.849231, 0.853333, 0.861795, 0.850641, ]
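The summary lines above (mean plus a 95% interval over N runs) can be recomputed from the per-run lists. The sketch below assumes a two-sided Student-t interval built on the population standard deviation (divisor n, not n - 1); that choice is an assumption that happens to be consistent with the reported numbers, since the evaluation script itself is not shown here. The helper name `mean_ci95` and the small critical-value table are illustrative.

```python
import math

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
# Only the run counts used on this page (4, 8, 16) are covered.
T_975 = {3: 3.182446, 7: 2.364624, 15: 2.131450}


def mean_ci95(scores):
    """Return (mean, lower, upper) of a 95% t-interval over the runs."""
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation (divisor n); an assumption, see above.
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    half_width = T_975[n - 1] * std / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# The 16 R_aw_3rnn runs listed above.
scores = [0.833394, 0.868315, 0.835178, 0.801319, 0.849495, 0.812454,
          0.862418, 0.882860, 0.856581, 0.841828, 0.871410, 0.855158,
          0.832692, 0.862063, 0.856643, 0.828333]

mean, lower, upper = mean_ci95(scores)
print("%dx - %.6f (95%% [%.6f, %.6f])" % (len(scores), mean, lower, upper))
```

A baseline figure can then be compared against the interval: the baseline rnn devMRR of 0.842155 lies inside the R_aw_3rnn interval, which is the sense in which the rechecks show no clear effect.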
TODO: run the final evaluation on 3rnn to confirm that this doesn't reduce early-stopping overfitting or such.
| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| rnn | 0.459869 | 0.429780 | 0.228869 | 0.341706 | (defaults) |
| | ±0.035981 | ±0.015609 | ±0.005554 | ±0.010643 | |
8x R_ay_3rnn - 0.419903 (95% [0.399927, 0.439880]):
10884108.arien.ics.muni.cz.R_ay_3rnn etc.
[0.416091, 0.411091, 0.401161, 0.422062, 0.461068, 0.452422, 0.383907, 0.411424, ]
| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| rnn | 0.460984 | 0.382949 | 0.262463 | 0.381298 | (defaults) |
| | ±0.023715 | ±0.006451 | ±0.002641 | ±0.007643 | |
| attn1511 | 0.445635 | 0.408495 | 0.288100 | 0.430892 | (defaults) |
| | ±0.056352 | ±0.008744 | ±0.005601 | ±0.017858 | |
4x R_al_3rnn - 0.395602 (95% [0.383595, 0.407609]):
10884135.arien.ics.muni.cz.R_al_3rnn etc.
[0.408591, 0.391750, 0.392145, 0.389923, ]
4x R_al_3a51 - 0.404151 (95% [0.382397, 0.425904]):
10884137.arien.ics.muni.cz.R_al_3a51 etc.
[0.381521, 0.412896, 0.405499, 0.416687, ]
| avg | 0.619594 | 0.790865 | 0.603921 | 0.623905 | 0.793301 | 0.608480 |
| | ±0.001845 | ±0.002024 | ±0.002699 | ±0.001948 | ±0.002199 | ±0.002341 |
avg with new vocabulary:
data/anssel/ubuntu/v2-valset.pickle MRR: 0.621408
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.791616
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467536 10-R@2: 0.607055 10-R@5: 0.835020
data/anssel/ubuntu/v2-valset.pickle MRR: 0.620516
data/anssel/ubuntu/v2-valset.pickle 2-R@1: 0.790133
data/anssel/ubuntu/v2-valset.pickle 10-R@1: 0.467229 10-R@2: 0.604959 10-R@5: 0.833691
baseline 0.812771 ±0.006366
16x R_pm_3rnn - F-Score 0.800006 (95% [0.788322, 0.811690]):
10887075.arien.ics.muni.cz.R_pm_3rnn etc.
[0.803665, 0.823684, 0.798962, 0.817150, 0.821687, 0.821516, 0.811705, 0.805556, 0.822023, 0.801068, 0.751753, 0.795306, 0.783821, 0.812169, 0.768817, 0.761216, ]
baseline:
| rnn | 0.732334 | 0.684798 | 0.663615 | 0.663615 | (defaults) |
| | ±0.035202 | ±0.016028 | ±0.022356 |
16x R_si_3rnn - 0.670247 (95% [0.648247, 0.692246]):
10887077.arien.ics.muni.cz.R_si_3rnn etc.
[0.682193, 0.676602, 0.676918, 0.533230, 0.690744, 0.678811, 0.684401, 0.655812, 0.682989, 0.701587, 0.666743, 0.675061, 0.699224, 0.694073, 0.614459, 0.711100, ]