Implement SANSA model (scalable variant of EASE) #661
Conversation
Not sure about this point.

Ah, we should've removed this one. Previously, we needed to add a model entry for the documentation manually. We have now automated the parsing from the README.md file.

Thanks @matospiso for this PR. Do we have a quick comparison between this version and the original version of EASE?
seed=None,
W1=None,  # weights[0] (sp.csr_matrix)
W2=None,  # weights[1] (sp.csr_matrix)
X=None,   # user-item interaction matrix (sp.csr_matrix)
We shouldn't have `X` here. Training data should be passed to the `fit` function via `train_set`.
Sure, but how does inference work for validation/test users under strong generalization in this framework? I copied the `score` function from EASE, but it works with `user_idx` instead of interaction vectors as I would expect. EASE (and other similar models) can infer scores from interaction vectors even for users not seen during training.
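For reference, EASE's closed-form solution (Steck, 2019) only needs an interaction vector at inference time, so it can score users absent from training. A minimal NumPy sketch on toy data (not Cornac's actual implementation):

```python
import numpy as np

def ease_fit(X, lamb=10.0):
    """Closed-form EASE training: item-item weight matrix B with zero diagonal."""
    G = X.T @ X + lamb * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                    # B_ij = -P_ij / P_jj
    np.fill_diagonal(B, 0.0)
    return B

# Toy training matrix: 3 users x 4 items.
X_train = np.array([[1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [1, 0, 1, 1]], dtype=float)
B = ease_fit(X_train)

# Score a hypothetical user never seen during training:
# only their interaction vector is needed, not a user index.
x_new = np.array([1.0, 0.0, 0.0, 1.0])
scores = x_new @ B
```

This is why a `score` keyed on `user_idx` feels restrictive: the model itself has no per-user parameters, only `B`.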
I got that the model needs interactions to make predictions. In Cornac, it will go through an Experiment with train/val/test sets available. In this case, you've already stored the interactions `X` during the call to `fit`, and it will be reused in the `score` function. For unknown/cold-start users, we do need a mechanism to output scores, e.g., we check for that as in here. For the case of EASE, I think we output the default item values in `B` for unknown users?
I thought the validation/test evaluation was done in a way that users are disjoint between splits, and that's why I added the `X`: so that we can replace it after training with the interactions of validation/test users (keeping the weights) and use these new rows as inference inputs. I assumed that's why EASE also passes the user-item interaction matrix as an optional argument to `__init__`:
def __init__(
    self,
    name="EASEᴿ",
    lamb=500,
    posB=True,
    trainable=True,
    verbose=True,
    seed=None,
    B=None,
    U=None,
):
    Recommender.__init__(self, name=name, trainable=trainable, verbose=verbose)
    self.lamb = lamb
    self.posB = posB
    self.verbose = verbose
    self.seed = seed
    self.B = B
    self.U = U  # this is the same as X in SANSA
(see recom_ease.py).
So I pretty much just copied the interface of EASE in my implementation, but we can change it if you think it's better to remove the `X`.
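The "replace `X` after training" idea could be sketched as follows. This is a toy class with hypothetical names, not Cornac's `Recommender` API; it only illustrates keeping the learned weights while swapping the stored interaction rows:

```python
import numpy as np

class TinyEASE:
    """Toy EASE-like model: weights B are learned once; X is just a row store."""

    def __init__(self, lamb=10.0, X=None):
        self.lamb, self.X, self.B = lamb, X, None

    def fit(self, X):
        self.X = X
        G = X.T @ X + self.lamb * np.eye(X.shape[1])
        P = np.linalg.inv(G)
        self.B = P / (-np.diag(P))
        np.fill_diagonal(self.B, 0.0)
        return self

    def score(self, user_idx):
        # Scores depend only on the stored row and the fixed weights B.
        return self.X[user_idx] @ self.B

model = TinyEASE().fit(np.array([[1, 1, 0], [0, 1, 1]], dtype=float))

# Strong generalization "hack": swap in a held-out user's interactions
# after training; B stays untouched.
model.X = np.array([[1.0, 0.0, 0.0]])
scores = model.score(0)
```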
I think the confusion is that we're using RatioSplit, which is referred to as weak generalization in the EASE paper. The author evaluated the models based on the strong generalization scheme, which is similar to a user-based StratifiedSplit. For the purpose of introducing the SANSA model into Cornac, this implementation is fine. However, if we really insist on replicating the author's implementation, we need to further work on the correct evaluation method. I'm happy to merge this PR for now.
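To illustrate the distinction on toy data (a NumPy sketch, not Cornac's split classes): weak generalization holds out some interactions of every user, while strong generalization holds out entire users.

```python
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((6, 4)) > 0.5).astype(float)  # 6 users x 4 items

# Weak generalization (RatioSplit-like):
# mask a fraction of observed entries; every user appears in training.
mask = rng.random(X.shape) < 0.2
X_train_weak = X * ~mask

# Strong generalization (user-based StratifiedSplit-like):
# user sets are disjoint between train and test.
train_users, test_users = np.arange(4), np.arange(4, 6)
X_train_strong, X_test_strong = X[train_users], X[test_users]
```

Under the strong scheme, test-time users are unseen during training, which is exactly when scoring from interaction vectors (rather than user indices) becomes necessary.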
Yes, that's exactly why I was confused - I thought the EASE implementation already supported StratifiedSplit via this "hack".
It should be pretty easy to add proper support for this split method to both EASE and SANSA in the future :)
Do you mean in some specific sense? Accuracy, or resource consumption?

I mean you've already created examples to run the model. Can we compare the recommendation performance of EASE and SANSA on the MovieLens dataset? Just put the numbers here so we know that they're not very far off :)
Comparison EASE vs SANSA on ML-1M:
Description
Implement a scalable (approximate) variant of EASE: SANSA
Paper: https://dl.acm.org/doi/10.1145/3604915.3608827
Related Issues
#654
Checklist:
- README.md (if you are adding a new model).
- examples/README.md (if you are adding a new example).
- datasets/README.md (if you are adding a new dataset).