This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

StarSpace selection of positive/negative example, and usage for multiple types of items #276

Open
nirlotan opened this issue Oct 8, 2019 · 1 comment

Comments


nirlotan commented Oct 8, 2019

Hi,

As part of research on collaborative filtering performed by a group of researchers at Haifa University, we've been trying to use your StarSpace framework to benchmark CF results for recommendations of different types of items, and to cite your paper on this topic.

We've been using StarSpace with training mode = 1, and have a couple of questions that we would greatly appreciate your help with.

  1. What method do you use to generate positive and negative examples from an input file? I've added some traces to the code, and I do see that you select the examples randomly, but I cannot detect a pattern.
     a. For example, given an input line with items A1, A2, A3, A4, A5, would you compare each item with the rest of the items in the line? For A1, would you compare it with each of the remaining items ({A1,A2}, {A1,A3}, {A1,A4}, {A1,A5})? From what I see in the code this is not necessarily the case, and you randomly select pairs for each epoch. Is that correct?
     b. How do you select the negative examples? Do you randomly select them from all items that are not in the input line? Again, based on my traces I saw that the selection is random, but the dictionary from which you select is not clear to me. Also, is it possible that a negative example is selected from the items in the same line (which shouldn't be the case)?

  2. Next, we would like to train the model on different types of items and infer on only one of those types. It wasn't clear to me from the documentation whether it is enough to use a different prefix for each type, or whether we should do anything else. For example, is it enough to provide the items in this format: A1, A2, A3 ..., B1, B2, B3.... C1, C2, C3... to designate three types of items (type A, type B, type C), and then infer on items of type A alone? I'm asking because when I did so, I got much lower accuracy rates, which didn't make sense to me. Should we continue to use training mode 1 for this case, or should we switch to a different training mode?

If you have read this far, I want to thank you for reading this long message and for your willingness to support college researchers. We are looking forward to using your framework and referring to it in our research. Once completed, I will also be happy to contribute the wrapper framework we have created to run multiple StarSpace experiments using Python.

Thanks again!
Nir.

@baiduzhaozhuo

The first question can be answered from the source code and the paper. For example, take three samples, [(A1, A2, A3, A4, A5), (B1, B2, B3, B4, B5), (C1, C2, C3, C4)], which can be thought of as user click sequences. Each sample is split into two parts, the LHS (left-hand side) and the RHS (right-hand side). Within each sample, the RHS acts as the label: it is one item selected at random from the sample, and the remaining items form the LHS. So the three samples might become: [(A1, A2, A4, A5): A3, (B1, B3, B4, B5): B2, (C2, C3, C4): C1]. Next, sum(LHS) is taken as 'a' and the RHS as 'b+', so (a, b+) is the positive pair. Each of the k negatives 'b-' is selected at random from the full set of RHS candidates, so each (a, b-) is a negative pair. Finally, the loss is computed as L(sim(a, b+), sim(a, b-), ...). Hope this helps.
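
To make the steps above concrete, here is a minimal Python sketch of one training example as described in this comment. It is not StarSpace's actual code: the function names, the toy embeddings, the dot-product similarity, the margin value, and the choice of negative pool (every item in the data, minus the positive) are all assumptions for illustration. In particular, the sketch does not exclude items from the same line when drawing negatives, and whether StarSpace does so is not confirmed here.

```python
import random


def split_example(items, rng):
    """Pick one item at random as the RHS (label); the rest form the LHS."""
    idx = rng.randrange(len(items))
    return items[:idx] + items[idx + 1:], items[idx]


def sample_negatives(pool, positive, k, rng):
    """Draw k random items from the pool, skipping only the positive item.

    Assumption: items from the same line's LHS are NOT filtered out here.
    """
    candidates = [item for item in pool if item != positive]
    return [rng.choice(candidates) for _ in range(k)]


def embed_lhs(lhs, emb):
    """Represent the LHS as the element-wise sum of its item embeddings."""
    dim = len(next(iter(emb.values())))
    return [sum(emb[item][d] for item in lhs) for d in range(dim)]


def dot(u, v):
    """Dot-product similarity (an assumed choice of sim)."""
    return sum(x * y for x, y in zip(u, v))


def margin_ranking_loss(a, b_pos, b_negs, margin=0.05):
    """Hinge loss summed over negatives: max(0, margin - sim(a,b+) + sim(a,b-))."""
    pos = dot(a, b_pos)
    return sum(max(0.0, margin - pos + dot(a, b_neg)) for b_neg in b_negs)


# Toy data mirroring the three click sequences in the comment above.
samples = [
    ["A1", "A2", "A3", "A4", "A5"],
    ["B1", "B2", "B3", "B4", "B5"],
    ["C1", "C2", "C3", "C4"],
]
rng = random.Random(0)
pool = sorted({item for sample in samples for item in sample})
emb = {item: [rng.uniform(-1, 1) for _ in range(4)] for item in pool}  # hypothetical embeddings

for sample in samples:
    lhs, rhs = split_example(sample, rng)
    negs = sample_negatives(pool, rhs, k=3, rng=rng)
    loss = margin_ranking_loss(embed_lhs(lhs, emb), emb[rhs], [emb[n] for n in negs])
    print(lhs, "->", rhs, "| negatives:", negs, "| loss:", round(loss, 4))
```

Because the split is re-drawn each time an example is visited, different epochs can pair the same line with different RHS items, which would be consistent with the random pairs seen in the traces.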
