[v5.2.2] Significantly optimize performance for fetching leaderboards #642

Merged
cmyui merged 3 commits into master from fetch-leaderboard-generic-index
Feb 29, 2024

Member

@cmyui cmyui commented Feb 28, 2024

Describe your changes

Related Issues / Projects

Checklist

  • I've manually tested my code


# v5.2.1
create index scores_fetch_leaderboard_generic_index
on scores (map_md5, status, mode);
Member Author

@cmyui cmyui Feb 28, 2024

From some testing on a map with 1,000,000 scores, this is a measurement of the main query used to fetch in-game global leaderboards via the GET /web/osu-osz2-getscores.php endpoint:

query = [
    f"SELECT s.id, s.{scoring_metric} AS _score, "
    "s.max_combo, s.n50, s.n100, s.n300, "
    "s.nmiss, s.nkatu, s.ngeki, s.perfect, s.mods, "
    "UNIX_TIMESTAMP(s.play_time) time, u.id userid, "
    "COALESCE(CONCAT('[', c.tag, '] ', u.name), u.name) AS name "
    "FROM scores s "
    "INNER JOIN users u ON u.id = s.userid "
    "LEFT JOIN clans c ON c.id = u.clan_id "
    "WHERE s.map_md5 = :map_md5 AND s.status = 2 "  # 2: =best score
    "AND (u.priv & 1 OR u.id = :user_id) AND mode = :mode",
]
params: dict[str, Any] = {
    "map_md5": map_md5,
    "user_id": player.id,
    "mode": mode,
}

if leaderboard_type == LeaderboardType.Mods:
    query.append("AND s.mods = :mods")
    params["mods"] = mods
elif leaderboard_type == LeaderboardType.Friends:
    query.append("AND s.userid IN :friends")
    params["friends"] = player.friends | {player.id}
elif leaderboard_type == LeaderboardType.Country:
    query.append("AND u.country = :country")
    params["country"] = player.geoloc["country"]["acronym"]

# TODO: customizability of the number of scores
query.append("ORDER BY _score DESC LIMIT 50")

score_rows = await app.state.services.database.fetch_all(
    " ".join(query),
    params,
)

Previously, this query took about 18 seconds; with the index it now takes about 0.01 seconds.

mysql> explain analyze SELECT s.id, s.score AS _score, s.max_combo, s.n50, s.n100, s.n300, s.nmiss, s.nkatu, s.ngeki, s.perfect, s.mods, UNIX_TIMESTAMP(s.play_time) time, u.id userid, COALESCE(CONCAT('[', c.tag, '] ', u.name), u.name) AS name FROM scores s INNER JOIN users u ON u.id = s.userid LEFT JOIN clans c ON c.id = u.clan_id WHERE s.map_md5 = '1cf5b2c2edfafd055536d2cefcb89c0e' AND s.status = 2 AND (u.priv & 1 OR u.id = 3) AND mode = 0 ORDER BY _score DESC LIMIT 50;
+----------+
| EXPLAIN  |
+----------+
| -> Limit: 50 row(s)  (cost=90179 rows=12.5) (actual time=0.435..0.668 rows=50 loops=1)
    -> Nested loop left join  (cost=90179 rows=12.5) (actual time=0.434..0.663 rows=50 loops=1)
        -> Nested loop inner join  (cost=59249 rows=12.5) (actual time=0.427..0.638 rows=50 loops=1)
            -> Filter: ((s.`mode` = 0) and (s.`status` = 2) and (s.map_md5 = '1cf5b2c2edfafd055536d2cefcb89c0e'))  (cost=2.2 rows=12.5) (actual time=0.411..0.469 rows=50 loops=1)
                -> Index scan on s using scores_score_index (reverse)  (cost=2.2 rows=100) (actual time=0.405..0.43 rows=50 loops=1)
            -> Filter: ((0 <> (u.priv & 1)) or (s.userid = 3))  (cost=0.479 rows=1) (actual time=0.00303..0.00314 rows=1 loops=50)
                -> Single-row index lookup on u using PRIMARY (id=s.userid)  (cost=0.479 rows=1) (actual time=0.00266..0.00269 rows=1 loops=50)
        -> Single-row index lookup on c using PRIMARY (id=u.clan_id)  (cost=0.25 rows=1) (actual time=314e-6..314e-6 rows=0 loops=50)
 |
+----------+
1 row in set (0.01 sec)

The index intentionally does not include the score or pp columns -- it's meant as a generic index to help with all leaderboard fetching. From testing, it seems to work well for global, country, friends, and mod-specific leaderboards.
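
To illustrate why one index can serve several leaderboard types, here is a minimal sketch. It is not the project's code and it does not use MySQL: it assumes an in-memory SQLite database (Python's stdlib `sqlite3`) as a stand-in, with a cut-down, hypothetical `scores` schema. Every leaderboard variant shares the same equality prefix on `map_md5`, `status` and `mode`, so the planner can use the composite index regardless of the extra filter. MySQL's optimizer may well choose differently on real data, so treat this only as a demonstration of the idea.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the schema is a
# simplified, hypothetical subset of the real scores table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scores (
        id INTEGER PRIMARY KEY,
        map_md5 TEXT NOT NULL,
        status INTEGER NOT NULL,
        mode INTEGER NOT NULL,
        mods INTEGER NOT NULL,
        userid INTEGER NOT NULL,
        score INTEGER NOT NULL
    );
    CREATE INDEX scores_fetch_leaderboard_generic_index
        ON scores (map_md5, status, mode);
    """
)

# Shared equality prefix used by every leaderboard type.
BASE = "SELECT id, score FROM scores WHERE map_md5 = ? AND status = 2 AND mode = ?"

# Extra filters per leaderboard type (values are made up for illustration).
VARIANTS = {
    "global": "",
    "mods": " AND mods = 64",
    "friends": " AND userid IN (3, 4, 5)",
}


def plan_for(extra_filter: str) -> str:
    """Return SQLite's query plan for one leaderboard variant as one string."""
    sql = f"EXPLAIN QUERY PLAN {BASE}{extra_filter} ORDER BY score DESC LIMIT 50"
    rows = conn.execute(sql, ("abc", 0)).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plans = {name: plan_for(extra) for name, extra in VARIANTS.items()}
for name, plan in plans.items():
    print(f"{name}: {plan}")
```

The country leaderboard is left out of the sketch because it filters on the joined `users` table; the same prefix on `scores` still applies to it.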

Member Author

@cmyui cmyui Feb 28, 2024

After the addition of this index, the performance bottleneck for both fetching leaderboards and submitting scores is now determining the rank of the score. The query is as follows:

SELECT COUNT(*) AS c
FROM scores s
INNER JOIN users u ON u.id = s.userid
WHERE s.map_md5 = :map_md5
AND s.mode = :mode
AND s.status = 2
AND u.priv & 1
AND s.score > :score

On the same 1,000,000-score beatmap, fetching the rank takes about 5.7 seconds, even with indexes applied. I suspect the best plan of attack here will be to move the u.priv data into the scores table -- i.e. a new column (or a modification of the status column) to accommodate whether the score should be ranked, in the sense of the player being publicly visible.
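
A rough sketch of that idea, under stated assumptions: it uses an in-memory SQLite database rather than MySQL, a hypothetical `user_visible` column on `scores` that mirrors `users.priv & 1`, made-up sample rows, and the assumption that the rank is the count of better scores plus one. It only checks that the join-free query returns the same rank as the current join-based one; it says nothing about the actual speedup on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, priv INTEGER NOT NULL);
    CREATE TABLE scores (
        id INTEGER PRIMARY KEY,
        map_md5 TEXT NOT NULL,
        mode INTEGER NOT NULL,
        status INTEGER NOT NULL,
        userid INTEGER NOT NULL,
        score INTEGER NOT NULL,
        -- Hypothetical denormalized copy of (users.priv & 1).
        user_visible INTEGER NOT NULL DEFAULT 0
    );
    """
)

# Sample data: user 2 is not publicly visible (priv & 1 == 0).
conn.executemany(
    "INSERT INTO users (id, priv) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 3), (4, 1)],
)
conn.executemany(
    "INSERT INTO scores (map_md5, mode, status, userid, score) "
    "VALUES ('abc', 0, 2, ?, ?)",
    [(1, 900), (2, 950), (3, 800), (4, 700)],
)

# Backfill the hypothetical column from the users table.
conn.execute(
    "UPDATE scores SET user_visible = "
    "(SELECT priv & 1 FROM users WHERE users.id = scores.userid)"
)

# Current approach: join users to filter on privileges.
JOIN_QUERY = """
    SELECT COUNT(*) FROM scores s
    INNER JOIN users u ON u.id = s.userid
    WHERE s.map_md5 = :map_md5 AND s.mode = :mode AND s.status = 2
      AND u.priv & 1 AND s.score > :score
"""

# Sketched alternative: filter on the denormalized column, no join.
FLAT_QUERY = """
    SELECT COUNT(*) FROM scores s
    WHERE s.map_md5 = :map_md5 AND s.mode = :mode AND s.status = 2
      AND s.user_visible = 1 AND s.score > :score
"""


def rank(query: str, score: int) -> int:
    """Rank = number of visible better scores + 1 (assumed convention)."""
    params = {"map_md5": "abc", "mode": 0, "score": score}
    better = conn.execute(query, params).fetchone()[0]
    return better + 1


print(rank(JOIN_QUERY, 750), rank(FLAT_QUERY, 750))
```

The trade-off is that the copied flag has to be updated on every score row whenever a user's privileges change, otherwise the two queries drift apart.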

cc @tsunyoku as I believe you noticed a similar concept on the official osu! servers 👀 -- curious, do you have an API proposal on the matter?

Contributor

@tsunyoku tsunyoku Feb 28, 2024

those 2 concepts are not quite the same - scores being marked as ranked on osu! doesn't care about the user's privileges. for leaderboards they're using elasticsearch, not mysql.

Member Author

Kk, I'll come up with a proposal for how we might do this -- I think we recently had a similar realization on akatsuki with regard to splitting user data out from score data (to avoid the cross-table joins and provide complete db isolation)

Member Author

@cmyui cmyui Feb 28, 2024

So my findings/design might be reusable for Akatsuki

@cmyui cmyui marked this pull request as ready for review February 28, 2024 01:01
migrations/migrations.sql Outdated
@cmyui cmyui changed the title Significantly optimize performance for fetching leaderboards [v5.2.2] Significantly optimize performance for fetching leaderboards Feb 28, 2024
@cmyui cmyui self-assigned this Feb 28, 2024
@cmyui cmyui added the performance Improvements to resource usage without changing functionality label Feb 28, 2024
@cmyui cmyui merged commit 23b1124 into master Feb 29, 2024
5 checks passed
@cmyui cmyui deleted the fetch-leaderboard-generic-index branch February 29, 2024 01:25
cmyui added a commit that referenced this pull request May 21, 2024