Motivation
See #424. A user who lands on a mod page from off-site has minimal connectivity to other SpaceDock pages. We might have several more mods they'd enjoy, but to find them the user would have to click the header to view an overall list or perform a text search that's currently hard to use. Many web sites famously have "related" links that show the user items similar to the current item to help them explore.
Changes
Now the bottom of the mod page hosts a modestly named "Similar-ish Mods" list containing up to 6 mods that are similar to the current mod:
(I decided against using the term "related" because to me, "related" mods would be from the same family of mods like Near Future, or would be designed specifically to extend or work with one another. Rather than such a close relationship, we are just talking about mods that might be about some of the same things, like planet packs or parts.)
To get this list, the mod page template inspects Mod.similar_mods, which is an association proxy based on Mod.similarities, which is a one-to-many relationship to a new ModSimilarity (mod_similarity) table:

main_mod_id: Mod.id of one of the mods being compared
other_mod_id: Mod.id of the other mod being compared
similarity: the similarity score of the two mods (calculation described below)
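To illustrate the table layout and the two indexes described here, the following is a minimal sketch using Python's built-in sqlite3 module rather than the project's actual ORM models. The index names, the stripped-down mod table, the sample rows, and the similar_mods() helper are all hypothetical; only the mod_similarity column names come from this description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stripped-down stand-in for the real mod table (assumption).
    CREATE TABLE mod (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE mod_similarity (
        main_mod_id INTEGER NOT NULL REFERENCES mod(id),
        other_mod_id INTEGER NOT NULL REFERENCES mod(id),
        similarity REAL NOT NULL
    );
    -- Quick lookup of a known pair of mods.
    CREATE INDEX ix_pair ON mod_similarity (main_mod_id, other_mod_id);
    -- Quick retrieval of one mod's similar mods, best first.
    CREATE INDEX ix_best ON mod_similarity (main_mod_id, similarity DESC);
""")

# Made-up sample data, for illustration only.
conn.executemany("INSERT INTO mod (id, name) VALUES (?, ?)",
                 [(1, "Mod A"), (2, "Mod B"), (3, "Mod C")])
conn.executemany("INSERT INTO mod_similarity VALUES (?, ?, ?)",
                 [(1, 2, 0.5), (1, 3, 1.25)])


def similar_mods(conn, mod_id, limit=6):
    """Return names of the mods most similar to mod_id, best first."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM mod_similarity AS s
        JOIN mod AS m ON m.id = s.other_mod_id
        WHERE s.main_mod_id = ?
        ORDER BY s.similarity DESC
        LIMIT ?
        """,
        (mod_id, limit),
    )
    return [name for (name,) in rows]


print(similar_mods(conn, 1))
```

The query in similar_mods() filters on main_mod_id and sorts by descending similarity, which is the access pattern the second index is meant to serve.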
An index of (main_mod_id, other_mod_id) allows quick access to known pairs of mods, and an index of (main_mod_id, similarity.desc()) ensures that Mod.similarities can be generated quickly.

ModSimilarity.similarity (a single-precision float, because the values will all be between 0 and 4) is calculated in ModSimilarity's constructor by summing the similarities of the two mods' authors, names, short descriptions, and descriptions. The author similarity is divided by 10 to prevent wildly different mods by the same author from dominating mods that share meaningful keywords (call it "the @linuxgurugamer factor"). Each of these similarities is in turn calculated as:

...

where A and B are the words from each string. For authors, each author name is treated as a word, but for the other strings:
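The weighted sum described above can be sketched as follows. The per-string formula and the word-filtering rules are not reproduced in this description, so this sketch substitutes two stand-ins that are assumptions, not the PR's code: a Jaccard index over word sets (jaccard_similarity) and a naive lowercase tokenizer (simple_words). The dict keys used for the mods are also hypothetical.

```python
import re

# From the description: author similarity is divided by 10.
AUTHOR_WEIGHT = 0.1


def simple_words(text):
    """Assumed tokenizer: the set of lowercase alphanumeric runs in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a, b):
    """Stand-in measure: |A & B| / |A | B|, a value between 0 and 1."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def mod_similarity(mod_a, mod_b):
    """Sum the per-field similarities, down-weighting shared authors."""
    # Each author name counts as a single "word".
    author_sim = jaccard_similarity(set(mod_a["authors"]),
                                    set(mod_b["authors"]))
    text_sim = sum(
        jaccard_similarity(simple_words(mod_a[field]),
                           simple_words(mod_b[field]))
        for field in ("name", "short_description", "description")
    )
    return AUTHOR_WEIGHT * author_sim + text_sim


mod_a = {"authors": ["alice"], "name": "Kerbin Planet Pack",
         "short_description": "New planets",
         "description": "Adds new planets"}
mod_b = {"authors": ["alice"], "name": "Duna Planet Pack",
         "short_description": "More planets",
         "description": "Adds more moons"}
print(mod_similarity(mod_a, mod_b))
```

With any per-field measure bounded by 1, the total stays within 0.1 + 3 = 3.1, which is consistent with the stated 0 to 4 range.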
All routes that modify these inputs are updated to trigger a recalculation of the affected mod's similar mods in the background via a new Celery task. For this to work, we now also db.commit() before those calls. The same Celery task also populates the initial values in the mod_similarity table for all existing mods in the migration.

The Celery task works by iterating over all other mods published for the same game and comparing them to the given mod, keeping only the 6 pairs with the highest similarity. Pairs with 0 similarity are never included. Once the most similar pairs are known, the list in Mod.similarities is updated to match. According to https://spacedock.info/api/browse there are 2252 mods on SpaceDock right now, so if each has 6 rows in mod_similarity, that would be 13512 new rows.

To reduce redundant work during repeated similarity calculations for the same mod, Mod._author_names and Mod._words now cache the words in the input properties on demand via Mod.get_author_names() and Mod.get_words().

To make it easier to experiment with production's large, authentic data set, I have included a utility in tests/test_mod_similarity.py that displays the results of the same algorithm applied to data from the production server's API. It's not a test, but I found it very useful for development, and putting it with the tests seemed better than putting it with production code or on a wiki.

I have tried to isolate the similarity logic to the similarity.py and str_similarity.py modules in case we find a machine learning package that we want to use later; switching over should be possible as long as we can write replacements for ModSimilarity.__init__, words_similarity and meaningful_words.

Fixes #424.
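For reviewers who want a picture of the selection step in the Celery task described above, here is a minimal sketch in plain Python. It is not the task's code: the dict-based mods, the most_similar() helper, and the shared_tags() scoring function are hypothetical, and heapq.nlargest is just one way to keep the best 6 pairs.

```python
import heapq

MAX_SIMILAR = 6  # from the description: keep only the 6 best pairs


def most_similar(mod, candidates, similarity, limit=MAX_SIMILAR):
    """Return up to `limit` (score, other_mod) pairs, best first."""
    scored = (
        (similarity(mod, other), other)
        for other in candidates
        # Skip the mod itself and mods published for other games.
        if other["id"] != mod["id"] and other["game_id"] == mod["game_id"]
    )
    # Pairs with 0 similarity are never included.
    nonzero = ((score, other) for score, other in scored if score > 0)
    # Compare on the score only, so ties never compare the dicts.
    return heapq.nlargest(limit, nonzero, key=lambda pair: pair[0])


def shared_tags(a, b):
    """Toy scoring function for the example: number of shared tags."""
    return len(a["tags"] & b["tags"])


mods = [
    {"id": 1, "game_id": 1, "tags": {"planets", "parts"}},
    {"id": 2, "game_id": 1, "tags": {"planets"}},
    {"id": 3, "game_id": 1, "tags": {"music"}},
    {"id": 4, "game_id": 2, "tags": {"planets", "parts"}},
]
for score, other in most_similar(mods[0], mods, shared_tags):
    print(other["id"], score)
```

In this toy data only mod 2 qualifies for mod 1: mod 3 shares no tags and mod 4 belongs to a different game.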