RelaxedWordMoversDistance results are not symmetrical #343

Open
oguzozbay opened this issue May 15, 2023 · 0 comments
oguzozbay commented May 15, 2023

I need to calculate the similarity of article titles, and I intend to use Relaxed Word Mover's Distance (RWMD).
I will use RelaxedWordMoversDistance from the text2vec R package.
After some trials, I see that the RWMD values in my output matrix, which shows the similarities between titles,
are not symmetrical.

Because I was skeptical of the result I got with my own data, I also tested the example in the vignette.
I checked the RelaxedWordMoversDistance example listed at the address below.
https://search.r-project.org/CRAN/refmans/text2vec/html/00Index.html

I then modified the example code to create a larger rwms matrix as follows.
rwms = rwmd_model$sim2(dtm)

The diagonal elements of the matrix are 1.
But elements that mirror each other across the diagonal are not equal to each other.
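
A minimal sketch of how the asymmetry can be checked and measured in base R. The 3 x 3 matrix `m` is a made-up stand-in, not real output; its values and the names `is_sym`, `max_gap`, and `asym_pairs` are assumptions for illustration only.

```r
# Toy stand-in for the similarity matrix; the values are made up.
m <- matrix(c(1.00, 0.42, 0.31,
              0.47, 1.00, 0.28,
              0.35, 0.25, 1.00),
            nrow = 3, byrow = TRUE)

# TRUE only if m equals its transpose (within a small numeric tolerance).
is_sym <- isSymmetric(m)

# Largest absolute difference between m[i, j] and m[j, i].
max_gap <- max(abs(m - t(m)))

# Index pairs (i, j) with i < j where the two directions disagree.
asym_pairs <- which(abs(m - t(m)) > 1e-8 & upper.tri(m), arr.ind = TRUE)

print(is_sym)
print(max_gap)
print(asym_pairs)
```

The same three checks should apply to the real result by substituting `as.matrix(rwms)` for `m`, assuming `rwms` is square (all documents compared against all documents).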

Say that i and j are titles.
rwms[i, j] is not equal to rwms[j, i].
Is this difference expected, or am I doing something wrong?
I would be grateful for any help.
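
For context, a toy sketch of why a one-sided relaxation can give different values in the two directions. In the original RWMD formulation (Kusner et al.), each word of one document moves all of its weight to the nearest word of the other document, and the two one-sided results are usually combined by taking their maximum. Whether `sim2()` in text2vec returns a one-sided value is an assumption here, not something confirmed by the documentation quoted in this issue. The embeddings, documents, Euclidean distance, and the helper `one_sided()` are all hypothetical.

```r
# Hypothetical 2-d embeddings for three words (made-up numbers).
wv <- rbind(a = c(0, 0), b = c(1, 0), c = c(3, 0))

# Two toy documents as normalized bag-of-words weights over (a, b, c).
doc1 <- c(a = 0.5, b = 0.5, c = 0)    # contains words a and b
doc2 <- c(a = 0,   b = 0.5, c = 0.5)  # contains words b and c

# Pairwise Euclidean distances between all words.
d <- as.matrix(dist(wv))

# One-sided relaxation: every word in `from` sends all of its weight
# to the nearest word that occurs in `to`.
one_sided <- function(from, to) {
  src <- names(from)[from > 0]
  dst <- names(to)[to > 0]
  nearest <- apply(d[src, dst, drop = FALSE], 1, min)
  sum(from[src] * nearest)
}

d12 <- one_sided(doc1, doc2)  # doc1 -> doc2
d21 <- one_sided(doc2, doc1)  # doc2 -> doc1

print(c(d12 = d12, d21 = d21))

# A symmetric variant: take the larger of the two directions.
d_sym <- max(d12, d21)
```

In this toy case the two directions differ (0.5 versus 1), so a matrix filled with one-sided values would not be symmetric, while `d_sym` is the same in both directions by construction.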

Below is copied from the vignette: "Package ‘text2vec’ November 30, 2022"
Example

Not run:

library(text2vec)
library(rsparse)
data("movie_review")
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
glove_model = GloVe$new(rank = 50, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 5)
wv = wv + t(glove_model$components)

rwmd_model = RelaxedWordMoversDistance$new(dtm, wv)
rwms = rwmd_model$sim2(dtm[1:10, ])
head(sort(rwms[1, ], decreasing = T))

End(Not run)

oguzozbay changed the title from "RelaxedWordMoversDistance resuts ara not simetrical" to "RelaxedWordMoversDistance resuts are not symmetrical" on May 16, 2023