Output problems #7

Sergio-Dobrianskiy · 2018-04-01T15:12:19Z

I tried running example.py and found a few problems with the results:

In the output file, the vectors contain only the coordinates, not the words.
The input file “text8_word2vec_50_5_100.csv” has 625 vectors, but the output has only 499.
In the input file the values of the coordinates seem too high; here’s an example:

import numpy as np
x = np.load('parsimax_rotated.npy')
print(x)

The results are:
[[  6.67499123e-09   1.69933674e-06   4.00578442e-13 ...,   1.49247162e-21
    1.01390868e-10   1.09191049e-07]
 [  2.53477170e-11   1.07793809e-10   4.00463625e-13 ...,   6.41054397e-12
    6.33692187e-12   4.34426653e-07]
 [  1.03132579e-10   4.34370634e-07   1.01390750e-10 ...,   6.29827831e-12
   6.63831878e-09   1.54319520e-21]
..., 
 [  2.71470011e-08   6.67600997e-09   2.59566146e-08 ...,   4.24852601e-07
    9.10764858e-14   6.78674938e-09]
 [  1.66118798e-06   2.65169917e-11   2.51931271e-11 ...,   1.62228841e-09
    1.69948441e-06   4.00463706e-13]
 [  2.66999702e-08   2.69490090e-11   4.05617567e-10 ...,   9.10764519e-14
    2.69472968e-11   1.68979875e-06]]

I’ve also exported the output to a .csv file, and the results are the same.

I've made a heatmap of the output that looks like this.

Thank you in advance for your support.

SungjoonPark · 2018-05-04T13:21:35Z

The code just computes the rotated matrix of the given input, so you can treat the output vectors as rotated input vectors. In other words, the order of the words is preserved.
The number of words should be the same. @NoSyu, could you check the code for errors?
I could not find such high values in the file. Did you use the vectors as inputs? If you apply the rotation to an arbitrary embedding, normalizing over the whole matrix (re-scaling all values to the range 0 to 1) will help, since this method is usually applied to correlation matrices.
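
A minimal sketch of the two points above, using only NumPy. The helper names `min_max_normalize` and `attach_words` are hypothetical and are not part of this repository; the sketch assumes you already hold the word list in the same row order as the input matrix.

```python
import numpy as np


def min_max_normalize(matrix):
    """Rescale all values of the matrix jointly into the range [0, 1]."""
    matrix = np.asarray(matrix, dtype=float)
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:
        # A constant matrix has no spread to rescale; return zeros.
        return np.zeros_like(matrix)
    return (matrix - lo) / (hi - lo)


def attach_words(words, rotated):
    """Pair each word with its rotated row, relying on preserved row order."""
    rotated = np.asarray(rotated)
    if len(words) != rotated.shape[0]:
        raise ValueError(
            "row count mismatch: %d words vs %d rotated rows"
            % (len(words), rotated.shape[0])
        )
    return dict(zip(words, rotated))


# Toy data standing in for real embeddings (illustrative values only).
toy_words = ["king", "queen"]
toy_vectors = np.array([[2.0, 4.0], [6.0, 10.0]])
normalized = min_max_normalize(toy_vectors)
by_word = attach_words(toy_words, normalized)
```

The length check in `attach_words` also surfaces a mismatch such as the 625 versus 499 rows reported above, instead of silently pairing words with the wrong vectors.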

Sergio-Dobrianskiy · 2018-05-09T20:47:05Z

I used the text8_word2vec_50_5_100.csv file as input and ran the program. I've tried this several times (also with my teacher) and on different computers, and the results are the same.

NoSyu · 2018-06-04T09:34:46Z

There was a bug in loading the word2vec vectors with gensim.
https://radimrehurek.com/gensim/models/keyedvectors.html
I've fixed it and pushed the update.
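
For anyone who wants to check a vector file independently of gensim, here is a rough sketch of a plain reader that verifies the row count. It is not the fix from the commit and does not use the gensim API. It assumes each line holds a word followed by its values, optionally preceded by a `count dim` header line; the function name `read_word_vectors` and the `delimiter` parameter are hypothetical.

```python
import io

import numpy as np


def read_word_vectors(handle, delimiter=None):
    """Parse 'word v1 v2 ... vn' lines into (words, matrix).

    If the first line is a 'count dim' header, the declared count is
    checked against the number of rows actually read. Note that a
    two-token line of digits is always treated as a header.
    """
    lines = [ln.strip() for ln in handle if ln.strip()]
    if not lines:
        raise ValueError("no vectors found")
    expected = None
    first = lines[0].split(delimiter)
    if len(first) == 2 and all(tok.strip().isdigit() for tok in first):
        expected = int(first[0])
        lines = lines[1:]
    words, rows = [], []
    for ln in lines:
        parts = ln.split(delimiter)
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    matrix = np.array(rows, dtype=float)
    if expected is not None and expected != len(words):
        raise ValueError(
            "header declares %d vectors but %d were read"
            % (expected, len(words))
        )
    return words, matrix


# Toy input standing in for a real file (illustrative values only).
sample = io.StringIO("3 2\na 0.1 0.2\nb 0.3 0.4\nc 0.5 0.6\n")
words, matrix = read_word_vectors(sample)
```

For a comma-separated file, pass `delimiter=","`. Comparing `len(words)` with the number of rows in the rotated output is a quick way to confirm that no vectors were dropped.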

NoSyu added a commit that referenced this issue Jun 4, 2018

Update example.py

597fc66

Fix the bug from #7
