Option to add user and/or item features #159
base: user_item_features
Conversation
dumping is now done with pickle 'highest protocol'
added asym_rmse and asym_mae
Thanks for the PR! Coming up with a way to handle content-based features is a lot of work. I gave it a quick try before and found it not to be worth the hassle. I'm not saying it's not doable, and it may even be quite easy if you're going for a very specific solution, but integrating the features into the whole data pipeline (cross-validation, etc.) in a generic fashion can be tricky. It also requires making choices regarding dataset loading, etc. So I think the best way is probably for you to work on it on your own fork for now, and submit a complete PR once you think it's done (if you still want to). But even then, I can't promise we will be able to merge it: it will depend on how useful this addition is and how well it integrates with the current codebase. Would that work for you? Thanks!
…aset Revert "Revert "Features dataset""
@NicolasHug What do you think of my implementation? It appears that some tests need to be modified.
Lasso prediction algorithm
[GSF] Syncing Fork
Thanks, I really appreciate all the effort on the clear coding and the good documentation :)! I have only quickly skimmed the code so far, but I'm wondering why you're passing the user / item features to the Could you brief me a bit on what those user / item features actually are? Meaning, what kind of datasets need such features? What are some examples of algorithms that use those features? Any reference for the Lasso algorithm that you implemented? Are there publicly available datasets that we could natively support? For the tests: you'll need to add scikit-learn as a dependency in all the Thanks!
I am passing user/item features to the These features can consist of a variety of things. For example, user features could consist of demographic information (e.g., age, gender) or other elicited information (e.g., preferences for certain actors or movie genres). The item features can consist of attributes associated with the item (e.g., movie genre or studio) or expert information (e.g., an expert rating). Most recommender systems do not have such information, but for some applications it is possible to ask for this information or to scrape it off the web (e.g., expert information). Algorithms can be designed to use only user features, only item features, or both. These can be hybrid algorithms [1] (i.e., a mix of algorithms) or a specific algorithm. The lasso algorithm I have implemented is only a naïve / simplistic implementation of [2]. I did this just to test my implementation of the features add-on. I am now working towards an implementation of factorization machines [3], which is (in my opinion) a much better approach. I plan to implement factorization machines by importing the tffm library. This library appears more complete than polylearn, another library for factorization machines. I am pretty sure that there must exist publicly available datasets that include such features, but I am unaware of which ones and I don't have time to look into it at the moment. Maybe one of the Yahoo datasets? As I said, I don't think the
[1] R. Burke, “Hybrid Recommender Systems: Survey and Experiments,” User Model. User-Adapt. Interact., vol. 12, no. 4, pp. 331–370, 2002.
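To make the kinds of features discussed above concrete, here is a minimal sketch of how demographic user features and content item features could be joined into one vector per (user, item) pair for a feature-based predictor. All names and values are illustrative, not from the fork's actual API.

```python
# Hypothetical side information, keyed by raw user/item ids.
user_features = {"u1": {"age": 25, "likes_scifi": 1}}          # demographic / elicited
item_features = {"i9": {"genre_scifi": 1, "expert_rating": 7.5}}  # content / expert

def joint_vector(uid, iid):
    """Concatenate user and item features in a fixed order, so every
    (user, item) pair maps to the same vector layout."""
    u = user_features[uid]
    it = item_features[iid]
    return [u["age"], u["likes_scifi"], it["genre_scifi"], it["expert_rating"]]

print(joint_vector("u1", "i9"))  # [25, 1, 1, 7.5]
```

A vector like this is what a lasso-style regressor or a factorization machine would consume as its input row.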
Thanks a lot for the update!
Yeah you're absolutely right I was looking at it the wrong way.
Probably best to remove it from there, then. I've been thinking about adding additional dependencies (scikit-learn, tffm, or whatever) and I think it's OK as long as we keep them optional. E.g., if you implement FM with tffm, only those who want to use the FM model would need to install tffm, but it's still not a core dependency (concretely, we don't add tffm to
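The optional-dependency pattern described here is commonly implemented with a lazy import helper; a minimal sketch (the helper name and message are illustrative, not from the Surprise codebase):

```python
import importlib

def optional_import(name, purpose):
    """Import an optional dependency on first use, failing with a
    helpful message instead of a bare ImportError at package import time."""
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise ImportError(
            f"{name} is required for {purpose}; install it with 'pip install {name}'"
        ) from e

# e.g. an FM algorithm would call optional_import("tffm", "the FM model")
# inside its constructor, so users who never touch FM never need tffm.
```

This keeps `setup.py` / core requirements untouched while still giving users a clear error if they opt into a feature without its dependency.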
Ok, so I removed the features from the Also, do we need to correct the test code?
Thanks, I can merge this PR into a new feature branch if you want? And you can send more PRs to the new branch. For the tests: I suspect you'll have other failing tests when you implement the algorithm, so it's up to you. If you prefer to solve test issues all at once, I'm OK with that. BTW, have you thought of a way to integrate the new changes with the cross-validation iterators?
The features option already works with the cross-validation iterators (to my knowledge).
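The reason the iterators can stay unchanged, as claimed here, is that only the ratings are split; features keyed by raw ids can be looked up on either side of any split. A minimal stand-in sketch (not the fork's actual classes):

```python
# Ratings travel through the CV split; user/item features are looked up
# by raw id, so the split logic itself needs no knowledge of them.
ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0), ("u2", "i2", 2.0)]
user_features = {"u1": [25, 1], "u2": [34, 0]}  # e.g. age, gender flag

def kfold(data, k):
    """Yield (train, test) lists; a toy stand-in for a KFold iterator."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        yield train, test

for train, test in kfold(ratings, 2):
    # Feature lookup works identically for train and test ratings.
    assert all(uid in user_features for uid, _, _ in test)
```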
@NicolasHug How can we enable tests on this base branch so that I can see which tests fail?
We'd need to modify the
What is the best way to run the tests locally without having to do
Just run If you haven't already, check out the contributing guidelines
I have corrected the tests so that they now work on my computer running
No worries, it can wait. EDIT: I mean merging into the master branch for a future release. I don't mind merging untested code into a feature branch.
Hey, any updates on this? I've been following the conversation and would like to know if you have any plans to merge this. I am working with context-aware recommender systems and I'm rewriting my code from Java to Python (in which I'm kind of a newbie). Is there a way to populate the Dataset with more info than just Keep up the nice work! 😊
I have been working on other projects in the meantime, but this branch should work without issues. However, I would recommend using my factorization-machines branch, as it should contain the latest updates. Note that this branch takes into account user and item features, but not context. Also, by looking at the code, I don't think that the timestamp option is working. To add context with many variables, it would be easy to extend my code to add features on the user-item pairs. Then, you would need to extend the algorithms to take these features into account.
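The extension suggested here, attaching context to each rating rather than to a user or item, can be sketched as one extra concatenation step. Names are hypothetical, not from the branch:

```python
# Per-user and per-item features, keyed by raw id.
user_features = {"u1": [1.0, 0.0]}
item_features = {"i1": [0.0, 1.0]}

def contextual_vector(uid, iid, context):
    """Build a feature row for one rating: context (e.g. time of day)
    is attached to the rating itself, not to the user or the item."""
    return user_features[uid] + item_features[iid] + context

x = contextual_vector("u1", "i1", [1.0])  # e.g. 1.0 = "evening"
print(x)  # [1.0, 0.0, 0.0, 1.0, 1.0]
```

An algorithm extended this way would simply see a longer feature vector per rating.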
@martincousi is 100% correct. No plan to merge this (or the other branch) unfortunately, because I don't have enough visibility on how well it would integrate with the current code base.
Hi Martin and Nicolas,
If you want to add user/item features to a factorization algorithm, you should take a look at factorization machines. I have a working implementation at factorization_machines.py. Note that this is the sample_weight branch of my fork. It is the most up-to-date and requires PyTorch in order to use the To use this class, you first add your features using
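For readers unfamiliar with factorization machines, the model referenced here scores a feature vector with a bias, a linear term, and pairwise factor interactions; the pairwise term can be computed in O(nk) instead of O(n²k). A self-contained sketch (names are illustrative; this is not the fork's PyTorch implementation):

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization-machine score.
    x: feature vector, w0: global bias, w: linear weights,
    V: n x k matrix of latent factors, one k-vector per feature."""
    n, k = len(V), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise term, rewritten as
    # 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    pair = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pair += 0.5 * (s * s - sq)
    return linear + pair
```

The same value is obtained by summing `dot(V[i], V[j]) * x[i] * x[j]` over all pairs i < j, which is a useful sanity check when implementing it.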
I started modifying the Dataset and Trainset classes to include the option of having user and/or item features since I later want to work on an algorithm that accepts these. I think I made good progress but I still need to figure out how to create a testset with these features.
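The missing piece described here, a testset carrying features, could extend Surprise's plain `(uid, iid, r_ui)` tuples with the looked-up feature lists. A hypothetical sketch of what such entries might look like (field layout is an assumption, not the PR's actual format):

```python
# Features keyed by raw id, as in the modified Dataset/Trainset.
user_features = {"u1": [0.3]}
item_features = {"i1": [1.0, 0.0]}

def build_testset(ratings):
    """Extend each (uid, iid, rating) tuple with the corresponding
    user and item feature lists (empty list if a lookup is missing)."""
    return [
        (uid, iid, r, user_features.get(uid, []), item_features.get(iid, []))
        for uid, iid, r in ratings
    ]

testset = build_testset([("u1", "i1", 4.0)])
print(testset[0])  # ('u1', 'i1', 4.0, [0.3], [1.0, 0.0])
```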
PS. It appears that this branch also includes other modifications that I made with respect to asymetric_measures and the suppression of printing during the computation of baselines and similarities. Edit: I reverted the changes to accuracy.py and AlgoBase.