[Feature] Show similar mods on mod page #424
Comments
Specific idea for how to calculate the author comparison for author lists A and B: divide the number of authors that appear in both lists by the total number of distinct authors across both lists, |A ∩ B| / |A ∪ B|. This would have the desired properties that author lists with no intersection would be assigned a value of 0, and author lists that match completely would be assigned 1. Two 2-author mods with 1 in common would be assigned ⅓ (one in both divided by three total). If both of those mods add the same new author, it would become 0.5 (two in both divided by four total).
This could also be adapted as a very simple algorithm to compare description strings, substituting words for authors and dropping a list of known meaning-free words like "the" and "a". This would not handle synonyms, but maybe mod authors use identical words often enough in practice for that to not matter. Might have to try it to find out.
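To make the worked examples above concrete, here is a minimal Python sketch of the "in both divided by total" ratio (commonly called Jaccard similarity), plus the word-based adaptation for descriptions. The function names and the tiny STOP_WORDS list are hypothetical, chosen only for illustration; they are not from the existing codebase.

```python
import re

def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables, or 0.0 if both are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

# Hypothetical stop word list; a real one would need to be much longer.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "for", "in"})

def description_words(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def description_similarity(desc_a, desc_b):
    """Compare two descriptions by the overlap of their meaningful words."""
    return jaccard(description_words(desc_a), description_words(desc_b))
```

With this sketch, jaccard(["alice", "bob"], ["bob", "carol"]) gives ⅓ and adding a shared third author to both lists gives 0.5, matching the examples in the comment.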
Playing with this a bit, long descriptions tend to use a lot of words that don't convey meaning, and a lot of the "meaning" isn't describing the mod (e.g., installation instructions, maintainer history, etc.). Comparing a shorter description with a longer description looks close to hopeless due to the disparate number of "extra" words used to make similar points. A variant would be to count matches as worth the word's length instead of 1, so longer words are worth more than shorter words, on the assumption that these are likely to be more meaningful. Unfortunately this seems to make the similarities of similar mods even lower (thanks to all those long non-matching words).
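One possible reading of the length-weighted variant, as a sketch: each word contributes its length instead of 1 to both the "in both" total and the overall total. The comment does not say how the denominator was weighted in the actual experiment, so that part is an assumption, and the function name is hypothetical.

```python
def length_weighted_similarity(words_a, words_b):
    """Like intersection-over-union, but each word counts for its length."""
    set_a, set_b = set(words_a), set(words_b)
    total = sum(len(w) for w in set_a | set_b)
    if total == 0:
        return 0.0
    shared = sum(len(w) for w in set_a & set_b)
    return shared / total

# Two word sets sharing "probe" and "satellite", each with one long extra word.
a = {"probe", "satellite", "installation"}
b = {"probe", "satellite", "maintainer"}
weighted = length_weighted_similarity(a, b)   # 14 / 36
unweighted = len(a & b) / len(a | b)          # 2 / 4
```

In this small example the weighted score is lower than the unweighted one, because the long non-matching words add more to the total than the matching words do, which is the effect described above.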
Dialing back the ambition significantly, maybe we should settle for:
My prototype is shaping up, this might work. A few more notes:
Motivation
Currently users can only find mods based on the featured list, creation/update time, overall popularity, and (a currently rather poor) text search. These features are only available via the mod listing pages specifically made for it. If a user happens to open a mod page from off-site, there is no easy path to finding more mods they might like.
Suggestion
We could add a Similar Mods list at the bottom of the mod page that would show a few (6? 12? unlimited paginated?) mods ranked by how similar they are to the main mod. Visually, it should be a pretty simple matter to re-use the existing mod box styling and functionality, kind of like:
Data model
I imagine implementing this with a new ModsSimilarity table to store similarities:
main_mod_id - Mod.id of one of the mods being compared
other_mod_id - Mod.id of the other mod being compared
similarity
An index of (main_mod_id, similarity DESC) would allow us to quickly get the mods most similar to a given mod from the other_mod_id of the rows returned. We would have to create two rows per pair of mods under this model, with the id values swapped in the two *_mod_id columns, but I think that may be the least bad approach anyway.
With 2913 mods currently in the db (counting deleted ones because I don't have an easy way to exclude them), there would be 8,485,569 rows in the table.
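As a rough illustration of the table, index, and lookup described above, here is a sketch using Python's built-in sqlite3 module as a stand-in for the real database. The snake_case table name, the index name, the composite primary key, and the helper functions are all assumptions made for the example, not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mods_similarity (
        main_mod_id  INTEGER NOT NULL,
        other_mod_id INTEGER NOT NULL,
        similarity   REAL NOT NULL,
        PRIMARY KEY (main_mod_id, other_mod_id)
    );
    CREATE INDEX ix_mods_similarity_main_sim
        ON mods_similarity (main_mod_id, similarity DESC);
""")

def store_similarity(conn, mod_a, mod_b, similarity):
    """Insert or update both rows for a pair, with the ids swapped."""
    conn.executemany(
        "INSERT OR REPLACE INTO mods_similarity VALUES (?, ?, ?)",
        [(mod_a, mod_b, similarity), (mod_b, mod_a, similarity)],
    )

def most_similar(conn, mod_id, limit=6):
    """Return the ids of the mods most similar to mod_id, best first."""
    rows = conn.execute(
        "SELECT other_mod_id FROM mods_similarity"
        " WHERE main_mod_id = ? ORDER BY similarity DESC LIMIT ?",
        (mod_id, limit),
    )
    return [row[0] for row in rows]

# Three made-up mods with made-up similarity values.
store_similarity(conn, 1, 2, 0.9)
store_similarity(conn, 1, 3, 0.4)
store_similarity(conn, 2, 3, 0.7)
```

Storing both directions keeps the lookup to a single indexed range scan on main_mod_id, at the cost of doubling the row count.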
Calculating similarity values
We would probably base the similarity on a weighted sum of comparisons of these columns:
Mod.game_id - 1 if same, 0 if different
[Mod.user_id, SharedAuthor.user_id] (the authors) - 1 if all authors are same, 0 if all authors are different, fractions for partial matches
Mod.name
Mod.short_description
Mod.description
Mod.default_version.changelog (most recent changelog, maybe)
Mod.background (image files, maybe)
Ideally we would delegate the comparison of the string columns to a machine learning library with an interface like:
There are many such open source libraries, including for Python, but so far I have not found one that would make it that easy. They generally would require us to:
So rather than having "an AI" do the hard work for us, we would have to tell it that "probe" and "satellite" are similar but "future" and "SPH" are not, etc., and then micromanage its memory for it and fiddle with it until its comparisons looked acceptable. At that point we might be better off writing our own simpler ad hoc heuristic logic.
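For comparison, the simpler ad hoc route could look something like the following sketch of a weighted sum over a few of the columns listed earlier. The ModInfo class, the WEIGHTS values, and the word-overlap comparison for strings are all hypothetical placeholders; the weights in particular would need tuning against real mods.

```python
from dataclasses import dataclass

@dataclass
class ModInfo:
    """Hypothetical stand-in for the columns of Mod used in the comparison."""
    game_id: int
    authors: frozenset
    name: str
    short_description: str

# Made-up weights that sum to 1, so the result stays between 0 and 1.
WEIGHTS = {"game": 0.4, "authors": 0.2, "name": 0.2, "short_description": 0.2}

def overlap(a, b):
    """Shared items divided by total distinct items, 0.0 if both are empty."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def words(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())

def similarity(m1, m2):
    """Weighted sum of the per-column comparisons."""
    scores = {
        "game": 1.0 if m1.game_id == m2.game_id else 0.0,
        "authors": overlap(m1.authors, m2.authors),
        "name": overlap(words(m1.name), words(m2.name)),
        "short_description": overlap(
            words(m1.short_description), words(m2.short_description)
        ),
    }
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)
```

Keeping the per-column scores in a dict makes it easy to log them individually while fiddling with the weights.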
It would be nice if we could detect when the user clicks a similar mod link and use that to update the comparison of the mods, since in that case a human is confirming the similarity. I'm not sure how we would do that.
Batching the calculations
To get started, we would need to compare every mod with every other mod (O(N²) in the number of mods). Then as mods were created and edited and updated, we would have to re-compare the changed mod with all the other mods (O(N)). This probably isn't something we could run in the foreground on any page. Ideally we would add mods that need re-comparison to a queue and then have a background task perform the comparisons and update the db.
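A sketch of how the two workloads above might be organized, assuming an in-process queue for simplicity. The RecompareQueue class and the compare and store callbacks are hypothetical; a real deployment would more likely use whatever background task system the site already has.

```python
from collections import deque
from itertools import combinations

def initial_pairs(mod_ids):
    """All unordered pairs for the first full pass: N * (N - 1) / 2 of them."""
    return list(combinations(sorted(mod_ids), 2))

def pairs_for_changed_mod(changed_id, mod_ids):
    """Pairs to redo after one mod changes: N - 1 of them."""
    return [(changed_id, other) for other in mod_ids if other != changed_id]

class RecompareQueue:
    """Holds ids of mods awaiting re-comparison, without duplicates."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def add(self, mod_id):
        """Queue a mod unless it is already waiting."""
        if mod_id not in self._pending:
            self._pending.add(mod_id)
            self._queue.append(mod_id)

    def process(self, mod_ids, compare, store):
        """Drain the queue; return how many mods were re-compared."""
        processed = 0
        while self._queue:
            mod_id = self._queue.popleft()
            self._pending.discard(mod_id)
            for a, b in pairs_for_changed_mod(mod_id, mod_ids):
                store(a, b, compare(a, b))
            processed += 1
        return processed
```

Deduplicating on add means a mod that is edited several times in quick succession is only re-compared once per drain of the queue.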