Stanislaw Jastrzebski edited this page Aug 30, 2017 · 19 revisions

Results of different publicly available embeddings calculated using this script.

  • Rows are sorted by the sum of each embedding's ranks across all benchmarks.

  • If a word is missing from an embedding, a random vector is used. A more principled approach would be to compute the intersection of the vocabularies beforehand.

  • The embeddings were trained on different corpora (though most used some version of a Wikipedia dump with various preprocessing), so this page does not claim to be a serious benchmark of word embeddings. See, for instance, this paper by O. Levy et al. for a thorough exploratory analysis.

  • There are no good skip-gram or CBOW embeddings available online, so I excluded them from this table for now.
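
The two ways of handling missing words mentioned above can be sketched as follows. This is a minimal illustration, not the code of the script that produced the results: embeddings are assumed to be plain dicts mapping words to vectors, and the helper names (`vector_or_random`, `shared_vocabulary`, `filter_pairs`), the Gaussian fallback, and the toy data are all hypothetical.

```python
import random


def vector_or_random(embedding, word, dim, rng):
    """Return the stored vector for `word`, or a random vector if it is missing."""
    if word in embedding:
        return embedding[word]
    # Fallback: a random vector stands in for the missing word.
    # The Gaussian distribution is an assumption made for this sketch.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def shared_vocabulary(embeddings):
    """Intersect the vocabularies of several embeddings (dicts of word -> vector)."""
    vocabularies = [set(e) for e in embeddings]
    return set.intersection(*vocabularies) if vocabularies else set()


def filter_pairs(pairs, vocabulary):
    """Keep only benchmark pairs whose two words are both in `vocabulary`."""
    return [(a, b, s) for a, b, s in pairs if a in vocabulary and b in vocabulary]


# Toy data, invented for illustration only.
emb_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
emb_b = {"cat": [0.5, 0.5], "dog": [0.4, 0.6]}
pairs = [("cat", "dog", 8.5), ("cat", "car", 2.0)]

rng = random.Random(0)
fallback = vector_or_random(emb_b, "car", dim=2, rng=rng)  # "car" is missing in emb_b
vocab = shared_vocabulary([emb_a, emb_b])
kept = filter_pairs(pairs, vocab)  # pairs that every embedding can score
```

With the intersection approach every embedding is scored on the same pairs, so no score depends on random vectors. The cost is that the benchmark shrinks to the words all embeddings share.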

Sources of embeddings:

| Embedding | MEN | MTurk | RG65 | RW | SimLex999 | WS353 | WS353R | WS353S | Google | MSR | SemEval2012_2 | AP | BLESS | Battig | ESSLI_1a | ESSLI_2b | ESSLI_2c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LexVec which="commoncrawl-W+C" | 0.809 | 0.712 | 0.765 | 0.478 | 0.419 | 0.647 | 0.571 | 0.756 | 0.710 | 0.601 | 0.187 | 0.612 | 0.795 | 0.438 | 0.818 | 0.750 | 0.667 |
| PDC dim=300 | 0.773 | 0.672 | 0.790 | 0.455 | 0.427 | 0.721 | 0.641 | 0.789 | 0.748 | 0.596 | 0.290 | 0.639 | 0.805 | 0.431 | 0.773 | 0.725 | 0.644 |
| HDC dim=300 | 0.760 | 0.655 | 0.806 | 0.438 | 0.407 | 0.677 | 0.581 | 0.787 | 0.731 | 0.564 | 0.293 | 0.632 | 0.815 | 0.432 | 0.773 | 0.750 | 0.644 |
| SG GoogleNews (word2vec) | 0.741 | 0.670 | 0.761 | 0.471 | 0.442 | 0.700 | 0.635 | 0.772 | 0.402 | 0.712 | 0.335 | 0.649 | 0.795 | 0.406 | 0.750 | 0.800 | 0.644 |
| PDC dim=100 | 0.755 | 0.710 | 0.774 | 0.421 | 0.361 | 0.690 | 0.606 | 0.779 | 0.704 | 0.543 | 0.280 | 0.632 | 0.760 | 0.431 | 0.727 | 0.750 | 0.622 |
| GloVe dim=300 corpus=common-crawl-42B | 0.736 | 0.645 | 0.817 | 0.376 | 0.374 | 0.553 | 0.473 | 0.669 | 0.750 | 0.702 | 0.306 | 0.622 | 0.785 | 0.451 | 0.795 | 0.750 | 0.578 |
| GloVe dim=300 corpus=wiki-6B | 0.737 | 0.633 | 0.770 | 0.359 | 0.371 | 0.522 | 0.446 | 0.653 | 0.718 | 0.616 | 0.280 | 0.637 | 0.820 | 0.410 | 0.773 | 0.825 | 0.644 |
| HDC dim=100 | 0.738 | 0.648 | 0.804 | 0.388 | 0.324 | 0.617 | 0.523 | 0.753 | 0.667 | 0.497 | 0.260 | 0.619 | 0.825 | 0.432 | 0.773 | 0.750 | 0.622 |
| GloVe dim=200 corpus=wiki-6B | 0.710 | 0.620 | 0.713 | 0.331 | 0.340 | 0.489 | 0.418 | 0.615 | 0.698 | 0.596 | 0.274 | 0.634 | 0.810 | 0.423 | 0.773 | 0.725 | 0.622 |
| PDC dim=50 | 0.720 | 0.700 | 0.763 | 0.390 | 0.309 | 0.637 | 0.543 | 0.741 | 0.579 | 0.369 | 0.241 | 0.617 | 0.760 | 0.426 | 0.682 | 0.750 | 0.556 |
| GloVe dim=100 corpus=wiki-6B | 0.681 | 0.619 | 0.676 | 0.310 | 0.298 | 0.451 | 0.380 | 0.587 | 0.632 | 0.551 | 0.279 | 0.644 | 0.780 | 0.435 | 0.705 | 0.750 | 0.644 |
| HDC dim=50 | 0.708 | 0.649 | 0.723 | 0.361 | 0.281 | 0.575 | 0.472 | 0.713 | 0.534 | 0.347 | 0.243 | 0.555 | 0.730 | 0.429 | 0.705 | 0.775 | 0.578 |
| GloVe dim=50 corpus=wiki-6B | 0.652 | 0.619 | 0.595 | 0.285 | 0.265 | 0.419 | 0.348 | 0.554 | 0.462 | 0.356 | 0.251 | 0.634 | 0.725 | 0.391 | 0.773 | 0.750 | 0.600 |
| GloVe dim=200 corpus=twitter-27B | 0.594 | 0.555 | 0.698 | 0.197 | 0.130 | 0.451 | 0.373 | 0.590 | 0.534 | 0.503 | 0.246 | 0.515 | 0.690 | 0.326 | 0.773 | 0.700 | 0.578 |
| NMT which=FR | 0.492 | 0.464 | 0.590 | 0.301 | 0.460 | 0.488 | 0.444 | 0.572 | 0.212 | 0.434 | 0.251 | 0.420 | 0.445 | 0.165 | 0.568 | 0.700 | 0.644 |
| GloVe dim=100 corpus=twitter-27B | 0.577 | 0.559 | 0.677 | 0.210 | 0.122 | 0.442 | 0.364 | 0.592 | 0.429 | 0.428 | 0.250 | 0.500 | 0.675 | 0.315 | 0.727 | 0.675 | 0.600 |
| NMT which=DE | 0.492 | 0.464 | 0.590 | 0.301 | 0.460 | 0.488 | 0.444 | 0.572 | 0.212 | 0.434 | 0.251 | 0.415 | 0.445 | 0.165 | 0.568 | 0.700 | 0.622 |
| GloVe dim=50 corpus=twitter-27B | 0.531 | 0.515 | 0.574 | 0.196 | 0.098 | 0.392 | 0.325 | 0.540 | 0.260 | 0.271 | 0.223 | 0.458 | 0.665 | 0.308 | 0.705 | 0.675 | 0.511 |
| GloVe dim=25 corpus=twitter-27B | 0.444 | 0.481 | 0.503 | 0.173 | 0.073 | 0.307 | 0.235 | 0.458 | 0.111 | 0.116 | 0.209 | 0.453 | 0.545 | 0.267 | 0.659 | 0.700 | 0.489 |
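
The row order is described above as sorted by summed ranking. One plausible reading of that rule is sketched below: rank the embeddings within each benchmark (1 is best), add up each embedding's ranks, and sort by the total in ascending order. The function name `summed_ranks` and the tie handling are assumptions, not taken from the script. The example uses only three rows and three columns of the table, so the totals are illustrative and are not the values behind the full ordering.

```python
def summed_ranks(scores):
    """Sum each embedding's per-benchmark rank (1 = best score on that benchmark).

    `scores` maps an embedding name to a list of scores, one per benchmark,
    with the benchmarks in the same order for every embedding.
    Tied scores share the better rank, which is an assumption of this sketch.
    """
    names = list(scores)
    n_benchmarks = len(scores[names[0]])
    totals = {name: 0 for name in names}
    for j in range(n_benchmarks):
        column = [scores[name][j] for name in names]
        for name in names:
            # Rank = 1 + number of embeddings with a strictly higher score.
            totals[name] += 1 + sum(other > scores[name][j] for other in column)
    return totals


# MEN, MTurk and RG65 scores for three rows of the table.
scores = {
    "LexVec": [0.809, 0.712, 0.765],
    "PDC dim=300": [0.773, 0.672, 0.790],
    "HDC dim=300": [0.760, 0.655, 0.806],
}

totals = summed_ranks(scores)
ordering = sorted(totals, key=totals.get)  # lowest summed rank first
```

Summing ranks rather than raw scores keeps a benchmark with a wide score range, such as the analogy tasks, from dominating the ordering.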