Output problems #7

Open
Sergio-Dobrianskiy opened this issue Apr 1, 2018 · 3 comments

@Sergio-Dobrianskiy

I’ve tried running example.py and found a few problems with the results:

  • In the output file, the vectors don’t include the word, only the coordinates.

  • The input file “text8_word2vec_50_5_100.csv” contains 625 vectors, but the output has only 499.

  • In the input file, the coordinate values seem too high. Here’s an example:

import numpy as np
x = np.load('parsimax_rotated.npy')
print(x)

The results are:
[[  6.67499123e-09   1.69933674e-06   4.00578442e-13 ...,   1.49247162e-21
    1.01390868e-10   1.09191049e-07]
 [  2.53477170e-11   1.07793809e-10   4.00463625e-13 ...,   6.41054397e-12
    6.33692187e-12   4.34426653e-07]
 [  1.03132579e-10   4.34370634e-07   1.01390750e-10 ...,   6.29827831e-12
   6.63831878e-09   1.54319520e-21]
..., 
 [  2.71470011e-08   6.67600997e-09   2.59566146e-08 ...,   4.24852601e-07
    9.10764858e-14   6.78674938e-09]
 [  1.66118798e-06   2.65169917e-11   2.51931271e-11 ...,   1.62228841e-09
    1.69948441e-06   4.00463706e-13]
 [  2.66999702e-08   2.69490090e-11   4.05617567e-10 ...,   9.10764519e-14
    2.69472968e-11   1.68979875e-06]]

I’ve also exported the output to a .csv file, and the results are the same.

I've made a heatmap of the output that looks like this.
[Image: hm]

Thank you for your support in advance

@SungjoonPark
Owner

  1. The code just computes the rotated matrix of the given input, so you can treat the output vectors as rotated input vectors. In other words, the order of the words is preserved.
  2. The number of words should be the same. @NoSyu Could you check the code for errors?
  3. I could not find such high values in the file. Did you use the vectors as inputs? If you apply the rotation to an embedding, normalizing over the whole matrix (re-scaling all values to the range 0 to 1) will help, since this method is usually applied to correlation matrices.
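
As a rough illustration of points 1 and 3, here is a minimal sketch of re-scaling a whole matrix to the range 0 to 1 and then pairing each row with its word by position. The helper names `min_max_scale` and `attach_words`, the toy words, and the toy matrix are all hypothetical and not part of this repository; the rotation step itself is left out, and the pairing assumes the row order is unchanged.

```python
import numpy as np


def min_max_scale(matrix):
    """Rescale every entry to [0, 1] using the global min and max of the matrix."""
    matrix = np.asarray(matrix, dtype=float)
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:
        # A constant matrix has no range to scale; return zeros by convention.
        return np.zeros_like(matrix)
    return (matrix - lo) / (hi - lo)


def attach_words(words, vectors):
    """Pair each word with the row at the same index (assumes row order is preserved)."""
    if len(words) != len(vectors):
        raise ValueError(
            f"{len(words)} words but {len(vectors)} vectors; rows were lost or added"
        )
    return dict(zip(words, vectors))


# Toy data standing in for a real embedding (hypothetical values).
words = ["king", "queen", "apple"]
embedding = np.array([[2.0, -1.0], [1.5, -0.5], [-3.0, 4.0]])

scaled = min_max_scale(embedding)
labelled = attach_words(words, scaled)
print(labelled["king"])
```

The length check in `attach_words` would also flag a mismatch such as 625 input words against 499 output rows before any labels are assigned to the wrong vectors.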

@Sergio-Dobrianskiy
Author

  1. I used the text8_word2vec_50_5_100.csv file as input and ran the program. I've tried this several times (also with my teacher) and on different computers, and the results are the same.

@NoSyu
Collaborator

NoSyu commented Jun 4, 2018

  1. There was a bug in loading the word2vec vectors with gensim.
    https://radimrehurek.com/gensim/models/keyedvectors.html
    I've fixed it and pushed the update.

NoSyu added a commit that referenced this issue Jun 4, 2018
Fix the bug from #7