Nonzero elements are always in the same dimensions #8
I applied SPINE on top of some embeddings and obtained sparse embeddings, but the nonzero elements always fall in exactly the same dimensions across almost all of the samples.
I attached an image that might help to understand the problem better.
Do you have any idea why this is happening?
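As a side note, the pattern described above can be made concrete with a short check like the following (an illustrative snippet, not part of this issue or the codebase; the random matrix is only a stand-in for the real SPINE output):

```python
import numpy as np

# Stand-in for the matrix of SPINE outputs, one row per document.
# Replace this with the actual sparse embeddings.
sparse_embeddings = np.random.rand(1000, 1000) * (np.random.rand(1000, 1000) > 0.85)

# Fraction of documents in which each dimension is nonzero.
nonzero_rate = (np.abs(sparse_embeddings) > 1e-8).mean(axis=0)

# If the nonzero elements really do fall in the same dimensions everywhere,
# this distribution is bimodal: a few dimensions near 1.0, the rest near 0.0.
print("dimensions active in >90% of documents:", int((nonzero_rate > 0.9).sum()))
print("dimensions active in <10% of documents:", int((nonzero_rate < 0.1).sum()))
```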
Hi, could you please provide some more information:
1) What are 'some embeddings'? Are they standard word2vec or GloVe embeddings, or are they from some other source?
2) What are the words in the image you have attached?
3) On what data did you train the SPINE model? E.g., did you use the SPINE model trained on word2vec?
Actually, they are not word embeddings. They are RepBERT representations of some MS MARCO documents. In other words, they are outputs of a fine-tuned BERT model for some passages.
Got it, thanks for clarifying. If I understood correctly, you train the model on RepBERT embeddings of MS MARCO documents. Did you play around with the hyperparameters? The hyperparameter values in this codebase are suggested settings for GloVe and word2vec embeddings, and would probably need some adjustment when you try other embeddings.
Yes, exactly.
Hi @Narabzad, thanks for your interest. If your starting representations are very "similar" to each other, it can happen that the resulting embeddings are also high/low in the same dimensions (in the extreme case, if all the starting representations are exactly the same for every document, you would end up with exactly this behaviour). One simple thing to try here is a very high coefficient on the reconstruction loss, so that even small reconstruction errors are penalized. If your resulting embeddings all look alike (as in the picture you shared), you would not be able to reconstruct the inputs well, so penalizing the reconstruction loss heavily might prevent this behaviour. Try this alongside a high coefficient on the PSL loss (so that the values are pushed towards 0 and 1).
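For concreteness, here is a rough PyTorch sketch of what that suggestion could look like, assuming a SPINE-style objective (squared-error reconstruction, an h·(1−h) partial sparsity term, and an average-sparsity penalty, as described in the SPINE paper). This is not the code in this repository; all class, function, and argument names below are invented for illustration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        # Capped ReLU keeps hidden activations in [0, 1], as in SPINE.
        h = torch.clamp(self.encoder(x), min=0.0, max=1.0)
        return self.decoder(h), h

def spine_style_loss(x, x_hat, h, rho_star=0.15,
                     lambda_recon=1.0, lambda_psl=1.0, lambda_asl=1.0):
    # Reconstruction loss: squared error between input and reconstruction.
    rl = ((x - x_hat) ** 2).sum(dim=1).mean()
    # Partial sparsity loss: h * (1 - h) is zero only when h is exactly 0 or 1,
    # so it pushes activations towards the two extremes.
    psl = (h * (1.0 - h)).sum(dim=1).mean()
    # Average sparsity loss: penalize hidden units whose mean activation over
    # the batch exceeds the target sparsity level rho_star.
    asl = (torch.clamp(h.mean(dim=0) - rho_star, min=0.0) ** 2).sum()
    return lambda_recon * rl + lambda_psl * psl + lambda_asl * asl

# Example: weight reconstruction (and PSL) much more heavily than the defaults.
model = SparseAutoencoder(input_dim=768, hidden_dim=1000)
x = torch.randn(32, 768)                # stand-in for a batch of RepBERT vectors
x_hat, h = model(x)
loss = spine_style_loss(x, x_hat, h, lambda_recon=50.0, lambda_psl=5.0)
```

In the actual training script the corresponding weights may be exposed differently (e.g. as command-line arguments); the point is only that raising the reconstruction and PSL coefficients makes near-identical activation patterns across documents expensive.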
Not all of the documents are very similar, but each group of documents might be highly similar, since they were retrieved for a specific query. I will try increasing the coefficient for the PSL loss and will keep you posted. I highly appreciate your time and effort.