-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Value Class Option [Arrow 2] #2950
Conversation
This seems quite promising. However, I'm a bit concerned about the requirement of making almost every function |
I have to try this branch against arrow-integrations-jackson.. I have a bad feeling that Jackson might not play well with generic value classes |
In addition to the concerns of @serras and @myuwono I am wondering if this is worth it. The only good use-case IMO for I took a look at the benchmarks, and the ones that seem to give the most performance benefits are the least used pattern. There is also the ticket to support inline sealed class, which could already bring a lot of these enhancements without requiring this encoding. @kyay10 do you know if there is any mention of a timeline on those? And how would that potentially impact these changes? |
@nomisRev with Here's an interesting idea to address both your and @serras' concerns. The sister PR #2951 doesn't require the usage of The only downside of this idea would be the inability to use One final note, if/when KT-51417 Restricted Types get introduced into the language, Option could then be implemented using them. It might be best to wait until the |
Although I applaud the great work you've put into this (honestly, having a PR with a performance review is just awesome!), I think the disadvantages are more than the potential advantages here. For me, the main tipping point is that |
Agreed. Should I close this PR then? And what about #2951 then? Should I close that one as well, or could it have some (minimal) merit? |
Hey @kyay10, Thank you for referencing the resources to
Like you already said yourself, I think it's best to wait. So I am sad to say I think we should also close #2951.
The use-case for this is so rare IMO, that I am personally happy to redefine it on a per-use-case basis. Actually so much I've thought about having it defined a couple times in Arrow, so that I can make it completely private... I feel bad closing all this work, after you've done such an amazing job with all the benchmarks and everything @kyay10 😞 I hope you learned a lot, and had a good time working on this. |
See #2780 for prior discussion on the implementation details and drawbacks of this proposal. This time around, I've done some benchmarking to see what the performance improvement really is like.
The benchmarking code is available at branches option-benchmark for the current arrow impl and value-option-benchmark. The option-benchmark repo has the detailed benchmark results for each iteration in XML format which can be imported into Intellij.
The benchmarks focused on what I considered as regular usage of
Option
.nestedOptions
creates anOption<Option<Int>>
which can arise when mixing multiple Option-returning functions.nestedOptionsBoxed
does the same thing but it stores the Options in a list because that forces thevalue class
impl to box those Options, which could impact performance. The rest of the benchmarks would take the index of the current iteration, convert it to a string, then try to parse it back to an integer, as a representation of some workload. There'snoneComprehension
andsomeComprehension
which uses anoption
comprehension combined with suspending functions. TheXComprehensionBlocking
variations are as before except that it uses regular functions instead of suspending ones. ThenoneTryCatchComprehensionBlocking
andsomeSingleComprehensionBlocking
both use a single comprehension instead of using comprehensions in a loop. Finally,XWithoutComprehension
does the same operations but without relying on comprehensions at all.Summary of the results:
Interesting observations here include that nestedOptions saw anywhere from a 10-20% increase in throughput, with the boxed version seeing more modest increases of about 5-10% (which is marginal). The basic comprehension saw a (marginal) decrease in throughput of about 5%, which I suspect is because in coroutines, when a value object is created before a suspending call and is used after a suspending all, it obviously needs to survive inside the
Continuation
, but for some reason the compiler decides to box any such surviving values. This, combined with the overhead of boxing and unboxing, likely caused that decrease in throughput. This is supported by the Blocking variations, where throughput was comprabable fornoneComprehensionBlocking
and saw an increase of 3-4% forsomeComprehensionBlocking
. The throughput difference is lower likely because the aforementioned comprehension functions all create a comprehension in each iteration of the loop, so any benefit from using avalue class Option
is marginal compared to the cost of creating aRaise
. Also, the benefit ofvalue class Option
is further marginalised by the cost of throwing exceptions innoneComprehensionBlocking
, hence why throughput was comparable.Because of the cost of creating comprehensions, I then tried using a single comprehension (and using try-catch in the none case to recover from errors). I also ran a benchmark with no comprehensions whatsoever. The results were in
value class Option
's favour, but always by a very small percentage of extra throughput.In conclusion, benchmarking
Option
s alone seemed to show a significant performance benefit from using avalue class
impl, but it seems that those benefits become marginal when theOption
code is part of some workload. In fact, there is extra overhead when combined with suspending functions due to a missing optimization for value classes surviving through suspending calls. Don't forget either that turning Option into avalue class
and optimising its nesting within itself requires inlining and using reified types practically everywhere, which could be an ergonomic disadvantage.I can change the code to make it so that it is less optimised for nested Options but doesn't use inline and reified everywhere, and that could decrease the invasiveness of this PR(Edit: see #2951 for a less-intrusive implementation ofvalue class Option
, but with less performance benefits). Ultimately, the decision is in your hands, and I'm more than happy to discuss any of the details of this and make changes until the PR makes sense for the library.