osu!standard Performance Calc: LengthBonus Removal and Star Rating Summation Change #25126

Closed
wants to merge 15 commits into from


Xexxar
Contributor

@Xexxar Xexxar commented Oct 14, 2023

Introduction

A while back I opened a PR suggesting that the osu!pp score summation switch from a geometric sum to a harmonic/logarithmic sum (found here: ppy/osu-queue-score-statistics#88). Some recently set scores inspired me to try a similar approach in the osu! difficulty calculator, replacing the geometric sum there with a harmonic/logarithmic sum, and I found the results to be quite promising.

Summation Method

A Desmos graph of the before and after curves is linked below. The X-axis is the number of equally rated 400 ms chunks and the Y-axis is total difficulty. The flat red curve is the current system, which is bounded because it is a geometric sum. Because of that bound, we need another way to account for length, which is why ppcalc uses the lengthBonus system based on object count. The blue curve is the new proposal, which uses a modified harmonic sum weighted by the index of the summation rather than by a multiplier derived from a geometric formula.

https://www.desmos.com/calculator/rnnxsqsybg (updated with new balancing constants)

With this new method, we're able to more accurately assess total map difficulty across the whole map without having to rely on an object-count-based difficulty multiplier. This is quite a nice way to eliminate a troublesome issue that has plagued diffcalc since the beginning.
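
The difference between the two curves can be sketched roughly as follows. This is an illustration only, not the code in this PR: the 0.9 decay is assumed from how osu!'s strain skills weight sorted strains, and the plain `1 / (i + 1)` weight stands in for the "modified harmonic sum", whose actual balancing constants are in the Desmos link and the diff. Both function names are hypothetical.

```python
def geometric_sum(strains, decay=0.9):
    """Current-style sum: sort descending, weight the i-th strain by decay**i.

    The 0.9 default is an assumption, not a value taken from this PR.
    """
    total, weight = 0.0, 1.0
    for strain in sorted(strains, reverse=True):
        total += strain * weight
        weight *= decay
    return total


def harmonic_sum(strains):
    """Illustrative index-based sum: weight the i-th strain by 1 / (i + 1).

    This is a plain harmonic weighting, a stand-in for the PR's modified version.
    """
    ordered = sorted(strains, reverse=True)
    return sum(strain / (index + 1) for index, strain in enumerate(ordered))


# Equally rated chunks, as on the graph's X-axis: the geometric total
# flattens out near 1 / (1 - decay), while the harmonic total keeps growing.
for chunk_count in (10, 100, 1000, 10000):
    chunks = [1.0] * chunk_count
    print(chunk_count, round(geometric_sum(chunks), 3), round(harmonic_sum(chunks), 3))
```

With equal chunks the geometric total can never exceed `1 / (1 - decay)` times the chunk value, which is the bound described above, whereas the harmonic total grows roughly with the logarithm of the chunk count, so length is reflected without a separate multiplier.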

Effects in Practical Terms

In short, this change would:

  • make maps that have high object count but inconsistent difficulty worth less
  • make maps that are consistently difficult worth more
  • make maps that are typically aim heavy worth more (as aim maps have fewer objects than stream maps)
  • make maps that are ringtone size worth less (strains are too few to sum to the same amount as the old lengthBonus allowed)
  • make maps that are lower star rating worth more as they are typically more consistent difficulty
  • make maps that are higher star rating worth less as they are typically less consistent difficulty
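
For context on the object-count and ringtone bullets above, here is a rough sketch of an object-count-based length multiplier of the kind being removed. The constants are recalled from the osu! performance calculator of that period and are not stated anywhere in this PR, so treat them as an assumption and check them against the repository; `length_bonus` is a hypothetical name.

```python
import math


def length_bonus(total_hits):
    """Object-count-based multiplier (constants are an assumption, see above)."""
    # Linear ramp up to 2000 objects.
    bonus = 0.95 + 0.4 * min(1.0, total_hits / 2000.0)
    # Logarithmic growth past 2000 objects.
    if total_hits > 2000:
        bonus += math.log10(total_hits / 2000.0) * 0.5
    return bonus


# The multiplier only sees the object count, never how the difficulty is
# distributed, so a spiky map and a consistent map with the same object
# count get the same bonus.
for hits in (150, 500, 2000, 4000):
    print(hits, round(length_bonus(hits), 3))
```

Because the input is only the object count, stream-heavy maps pick up a larger multiplier than aim maps of similar duration, and a very short map still starts from a fixed floor rather than from the sum of its few strains.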

Negative Subsequent Effects

  • Maps that are designed around patterns that are sensitive to the algorithm might see unintentional buffs
  • Already mentioned but extremely short maps will see a nerf due to reasons mentioned above
  • Consistency is rewarded but length does still play a significant role. Mid length maps with mid level consistency and a spike may still be buffed in a way that may be seen as negative

Conclusions

This is a small PR that has quite a large effect on systems critical to difficulty assessment. The parameters have been balanced to be as minimally invasive as possible to the overall competitive scene for osu!. Fellow developers are welcome to re-balance these parameters if they feel that I have not done a good job. I won't be linking any score websites or profile samples, as those who are interested can simply clone this PR and inspect the results for themselves.

EDIT: Gamers love their pp: https://pp.huismetbenen.nl/rankings/topscores/length-bonus-removal

Thanks for reading,

Xexxar

Code Comments

I probably should have checked this, but the explicit index definition and increment might not be ideal if C# has an enumerate-style way to index a list, like Python does. My C# isn't very good, but that was one enhancement that made sense to me if such a thing is possible inside that foreach loop.
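
To illustrate the question, here are the two styles side by side in Python. The function names and the `1 / (index + 1)` weight are hypothetical and only stand in for whatever the loop in this PR computes. On the C# side, LINQ's `Select` has an overload that passes the element index to the lambda (`strains.Select((strain, index) => ...)`), which plays a similar role to `enumerate`; whether it is a better fit than the explicit counter in this particular loop is a judgment call.

```python
def weighted_sum_manual(strains):
    """Explicit index definition and increment, as described above."""
    total = 0.0
    index = 0
    for strain in strains:
        total += strain / (index + 1)
        index += 1
    return total


def weighted_sum_enumerate(strains):
    """Same computation, with enumerate supplying the index."""
    total = 0.0
    for index, strain in enumerate(strains):
        total += strain / (index + 1)
    return total


strains = [3.0, 2.0, 1.0]
print(weighted_sum_manual(strains) == weighted_sum_enumerate(strains))
```

Both versions visit the elements in the same order and produce the same result, so the explicit counter is a style choice rather than a correctness problem.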

Rebalance Actions:

2023-10-16 Update:

  • Readjusted summation parameters based on feedback
  • Removed speedBonus on speedEvaluator (based on spaced streams at high bpm being overweighted)
  • Buffed speed's skill multiplier (based on lengthBonus removal causing speed to be relatively underweighted)
  • Focused on balancing values to match a net 0 change on top 500 profiles (current +10)
  • Buffed FL a small percentage to accommodate change to length to better keep values reasonable (possibly +20 overbuffed but it's FL)

2023-10-24 Update:

  • Removed speed's skill multiplier buff and instead buffed both skills by adjusting constants in the summation.

@minisbett
Contributor

minisbett commented Oct 15, 2023

I've checked some pp values resulting from this, disregarding how the difficulty of maps changed (after all it has a direct influence and when invoking changes to the difficulty calculation the resulting performance algorithm should be evaluated as well), and left my opinion, looking for other opinions:
(I'm not all too much into the difficulty and pp side of osu anymore, and my look at things is from a surface pov, mostly looking at the outcome and what most of the osu community is rather interested in)
Looking at recalculated profiles, it really looks like the practical effects are being represented well. The question is rather whether this is an overall good change.

And honestly, I think this is a pretty good change at the surface. At first I had some doubts because it looked like pp would just drop overall and get more dense, but digging a bit further into existing scores made things make sense.
For instance, the currently 3 highest pp plays are reduced to 1564, 1375 and 1452, a change I really welcome because the top of the pp ladder started to really inflate, probably because existing algorithms had never considered such scores before; they were simply out of scope. You can definitely see the goal of "make maps that have high object count but inconsistent difficulty worth less" (sidetracked day and valley of the damned) achieved here. As for azul, we probably see a nerf due to some inconsistent difficulty across the map. (The last square stream is insanely overweighted)

I further looked at the aim side of scores, for instance the profile of WhiteCat, which showed a slight increase for those dt aim maps that have been meta 3 years ago. To me personally this is a change I welcome, I feel like those maps have been a bit underrepresented and couldn't hold up well enough with the current meta.
However, I don't quite understand why those maps are seeing an increase, as they are partly very inconsistent in their difficulty with focus often being on a single spike. Is this intentional? Does the fact aim sees a buff since aim maps have a smaller object count than stream maps weigh more than the nerf invoked through the inconsistency?

On the other side, we can see scores like mrekk's save me being nerfed to 1175pp. I feel like this nerf is too heavy. I see the reasoning behind it with streamy, high object count, inconsistent maps and the map length removal. But I still think this score should be worth more and its length is, obviously, not valued enough in this version, which makes me feel like the length bonus removal should be reconsidered. I think removing it is not a good way to go, since I feel like it definitely does have its purpose. I'd rather look at what could be done for it to integrate better into the algorithm.

On the speed meta side of things, for example Flaro and Sytho, you can see around the same values in nerfs as you can see in aim scores for buffs. I welcome this, not directly due to the applied changes, but the overall outcome is something I like to see. You can also see the effect of difficulty spike oriented, high object count maps getting a nerf here, but I was kind of expecting more than just those slight changes based on what I've seen so far.

Looking at my own profile, which I mention because so far I've only talked about top players, I can see mostly buffs. I'm a ~45k dt aim player, and seeing those changes kind of surprised me. The same questions I had about WhiteCat's scores come up here: most of these short dt maps have a single difficulty spike, so I don't get why they're seeing a buff. I never felt like they needed one.

Overall I think that the goal is kind of achieved, but there's lots of fine tuning and further consideration necessary since I can see scores where this change does not add up with what I think it should be like, it's kind of a hit or miss.
Let me know what y'all think.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 16, 2023
@peppy
Member

peppy commented Oct 16, 2023

Judging this purely on the "Effects in practical terms", it sounds like a good direction.

@ppy ppy deleted a comment from github-actions bot Oct 19, 2023
@bdach
Collaborator

bdach commented Oct 19, 2023

I've killed the diffcalc run triggered above as it caused two queued runs to deadlock each other (see https://discord.com/channels/188630481301012481/188630652340404224/1164495326850322493 for paper trail). We'll rerun after that one finishes / after they're fixed.

@pull-request-size pull-request-size bot added size/S and removed size/M labels Oct 19, 2023
@ppy ppy deleted a comment from github-actions bot Oct 20, 2023
@ppy ppy deleted a comment from github-actions bot Oct 20, 2023
@smoogipoo
Contributor

!diffcalc

@github-actions

github-actions bot commented Oct 20, 2023

@Xexxar
Contributor Author

Xexxar commented Oct 24, 2023

Based on what ppy said about the probability for pp updates being implemented this year being non zero, I wanted to state that I am prepared to help with any follow up pieces of the rework process such as preparing graphs or summaries of the changes, assuming this change is deemed acceptable with the current values.

Two questions:

  • Are there any further opinions on balancing or issues with this implementation that we'd like to talk through before progressing further in this process?
  • Is there someone I could work with regarding that communication if we are prepared to implement this?

My current stance is that the only thing that definitely needs to be further examined is the SR adjustments, which were made because the change causes length to be included as a factor in SR. I sort of arbitrarily picked a constant that adjusts by a factor that would keep mid-length maps relatively similar in SR, but this might be something we should examine more carefully.

Possibly one other thing worth looking at is finding a way to better handle short aim maps. I'm thinking maybe there's benefit in going for a per object aim sum instead of per strain chunk? Might help with some of the issues on ringtone size maps, since that's a common criticism of this current build. Using the same style curve but with a per object base might do the trick.

Anyway, thanks for the feedback so far.

@Xexxar
Contributor Author

Xexxar commented Feb 24, 2024

Closing this on account of the fact that I intend to PR a different change in this area at a later date.

@Xexxar Xexxar closed this Feb 24, 2024
5 participants