@deweihu96
Thank you for your interest in our work.
It seems the two variables you mentioned are conceptually the same (i.e., how much of the surrounding words/nodes to use); they differ only in package implementation.
In gensim, there is a multiplication by 2, probably because the implementation directly extracts words both before and after the target word.
Our work, on the other hand, follows the pytorch-geometric package's node2vec implementation: each random walk is split into subsequences of length `context_size`, and within each subsequence the initial node is paired with every remaining node to form positive examples.
For instance, if you set the random walk length to 8 and `context_size` to 3, then a random walk [n1, n2, ..., n8] is split into the subsequences [n1, n2, n3], [n2, n3, n4], [n3, n4, n5], [n4, n5, n6], [n5, n6, n7], [n6, n7, n8], and each subsequence yields positive pairs between its initial node and the remaining nodes, e.g. (n1, n2) and (n1, n3) from the first.
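To make that concrete, here is a minimal pure-Python sketch of the splitting and pairing; the names `walk`, `subsequences`, and `positive_pairs` are illustrative, not the package's API:

```python
# Minimal sketch of the splitting/pairing described above; `walk` and
# `positive_pairs` are illustrative names, not the package's API.
walk = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
context_size = 3

# Overlapping subsequences of length context_size, one per start offset j,
# mirroring the rw[:, j:j + self.context_size] slicing.
subsequences = [walk[j:j + context_size]
                for j in range(len(walk) - context_size + 1)]

# The initial node of each subsequence pairs with every remaining node.
positive_pairs = [(sub[0], other) for sub in subsequences for other in sub[1:]]
print(positive_pairs[:4])  # [('n1', 'n2'), ('n1', 'n3'), ('n2', 'n3'), ('n2', 'n4')]
```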
Hi, thanks for your great work. I've been trying to incorporate the augmentation code into my own workflow.
I'm a bit confused by the `context_size` variable. Is it the same as the `window_size` variable in gensim word2vec?
In gensim word2vec, the actual sequences used for training have length `2*window_size - 1`.
But the code in `sampler.argew` suggests that the length of the sequences used for training is `context_size`: `sequences = rw[:, j:j + self.context_size]`.
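For context, that slicing presumably sits in a loop like the following sketch, modeled on pytorch-geometric's node2vec positive sampling; the function name and loop bounds are assumptions, not copied from sampler.argew:

```python
import torch

def split_walks(rw: torch.Tensor, context_size: int) -> torch.Tensor:
    """Split random walks into training sequences of length context_size.

    rw: (num_walks, num_nodes_per_walk) tensor of node ids. The function
    name and loop bounds are assumptions modeled on pytorch-geometric's
    node2vec, not copied from sampler.argew.
    """
    num_nodes = rw.size(1)
    walks = []
    for j in range(num_nodes - context_size + 1):
        walks.append(rw[:, j:j + context_size])  # the slicing quoted above
    # A walk of 8 nodes with context_size=3 yields 6 such sequences, so the
    # training sequence length is context_size, not 2*window_size - 1.
    return torch.cat(walks, dim=0)
```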