SmallRyeThreadContext's close can use custom cleanup strategies #424
Another (additional) option could be to let users provide their own ThreadLocal-like implementation (with some public API), and Quarkus could even decide to mix strategies:
I already have support for custom thread locals, but nobody's ever dared to use them. So, why don't we always call `set(null)`?
Hi @FroMage, this spring-cloud/spring-cloud-sleuth#27 (comment) explains in deep detail what happens if the thread local isn't removed; in short:
In Quarkus we don't care about it (please correct me if I'm wrong), because our class loader will be around for the whole application's life, meaning that we have no business in disconnecting its life duration from the one of the thread locals. The same reasoning can still hold for non-I/O threads too, to be honest (so my previous statement seems inaccurate). The sole difference of I/O threads vs blocking ones is that the former will last for the whole application duration as well (as will the classloader, and likely the context propagation usage too), hence the only impact is the presence of such stale entries. What's dangerous, as mentioned in the issue I've referenced, is when a classloader's lifetime isn't the same as that of the application it belongs to, which can happen on WildFly, for example.
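To make the failure mode concrete, here is a minimal sketch of that scenario (the class names and the loading setup are hypothetical, for illustration only):

```java
// Hypothetical sketch: how a ThreadLocal entry can pin a deployment ClassLoader.
// "com.example.DeploymentValue" stands in for any class loaded by the deployment.
public class LeakSketch {
    static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    static void handleRequest(ClassLoader deploymentLoader) throws Exception {
        // the value's class comes from the deployment's own loader
        Object value = deploymentLoader
                .loadClass("com.example.DeploymentValue")
                .getDeclaredConstructor().newInstance();
        CONTEXT.set(value);
        try {
            // ... request processing on a pooled worker thread ...
        } finally {
            // Without this cleanup, the entry's value keeps
            // value.getClass().getClassLoader() strongly reachable for as long
            // as the pooled thread lives - the leak described in the
            // spring-cloud-sleuth issue above.
            CONTEXT.remove();
        }
    }
}
```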
OK, so it could be an option. But the problem is that most … Frankly, I think this all should go via the new "Storage" API, but I don't have the time to drive this :(
Why not use a sys property to decide which of the two behaviours to apply?
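For illustration, a minimal sketch of such a toggle (the property name and helper are invented here, not an existing SmallRye flag):

```java
// Hypothetical system-property toggle between the two cleanup behaviours.
class CleanupStrategy {
    private static final boolean REMOVE_ON_CLOSE = Boolean.parseBoolean(
            System.getProperty("smallrye.context.cleanup.remove", "true"));

    static void cleanup(ThreadLocal<?> tl) {
        if (REMOVE_ON_CLOSE) {
            tl.remove();   // safe default: drops the map entry, but slower
        } else {
            tl.set(null);  // cheaper: clears the value, keeps the entry
        }
    }
}
```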
Actually, this is not the case in dev-mode - in dev-mode user code is loaded via a ClassLoader that is dropped when the application is reloaded. |
That means that in dev/test "mode" we have to disable such a "feature", using …
Beware: it just means that the … Please let me know if you see other options in order to achieve it.
Yes
One note @FroMage: it seems that Netty's … (@geoand too) -> meaning we can just use the "custom thread local" feature of this repo in Quarkus and use fast thread locals from there, in dev/test/prod scenarios alike (they should just work)
This confuses me, don't we already use FastThreadLocal as a result of using Vert.x? |
@geoand not for this specific one, as far as I can tell. I've run a micro-benchmark to understand the other implications of NOT using fast thread locals for remove, using the benchmark at https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/concurrent/ThreadLocalScalability.java
What the numbers show is that … NOTE: going to check what happens with fast thread locals in a few.
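For reference, a minimal JMH sketch of the comparison being discussed (a simplification, not the linked benchmark verbatim):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Minimal JMH sketch: cost of remove() vs set(null) on a hot path.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ThreadLocalCleanupBenchmark {

    static final ThreadLocal<Object> TL = new ThreadLocal<>();
    Object value = new Object();

    @Benchmark
    public void setThenRemove() {
        TL.set(value);
        TL.remove();      // clears the weak-ref entry -> Reference::clear
    }

    @Benchmark
    public void setThenSetNull() {
        TL.set(value);
        TL.set(null);     // keeps the entry, only nulls the value
    }
}
```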
That's a pretty big performance impact! In Quarkus prod-mode, AFAIR, we use …
Ok @Sanne and @geoand, so the proper fixes are: …
The two things could happen separately, but ideally we want both.
That sounds reasonable to me.
Awesome!
Yeah, I had the same thought, but I can't find where I wrote it :) |
@franz1981 The original idea behind … Then, there's another thing designed to make thread locals super fast, which is that instead of handing out … So the idea is that, for example, you have a lib like …:
```java
class TransactionThreadLocal extends ThreadLocal<Transaction> {
    @Override
    public void set(Transaction tx) {
        Thread t = Thread.currentThread();
        if (t instanceof QuarkusVertxThread) {
            // on our own threads, store straight into a field: no map lookup
            ((QuarkusVertxThread) t).transaction = tx;
        } else {
            // foreign threads fall back to the regular ThreadLocal storage
            super.set(tx);
        }
    }
    // … etc
}
```
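For completeness, the snippet above presumes a thread class roughly like this (hypothetical, as in the sketch itself; `Transaction` is the same placeholder type):

```java
// Hypothetical Quarkus/Vert.x thread class: the "thread local" becomes a
// plain field read on our own threads, with no ThreadLocalMap lookup at all.
class QuarkusVertxThread extends Thread {
    Transaction transaction; // direct storage consulted by TransactionThreadLocal
}
```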
All this was already implemented years ago in branches, but nobody had time to test and benchmark it. I still don't have time, but I'd be happy to help you along :) See: …
Let me know if you're interested.
Thanks @FroMage, I will go through your comment to check if it can solve this (avoiding remove and using the faster set(null)) for us: I wasn't fully convinced by the noise in the micro-benchmark, and I have involved the JDK team to review what I believe I have noticed related to remove.
I will likely experiment with a patch as you have suggested in the next weeks; it is indeed the best solution, perf-wise, although it requires a few code changes. At the same time, Netty's FastThreadLocal is very similar but less invasive: differently from Java ThreadLocals, it performs a simple array lookup in the FastThreadLocalThread instance. I will try to schedule a call next week so we can decide together, given the pros/cons of each solution.
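For illustration, a minimal sketch of that Netty alternative (FastThreadLocal and FastThreadLocalThread are Netty's real io.netty.util.concurrent types; the usage shown is an assumption, not this repo's code):

```java
import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.concurrent.FastThreadLocalThread;

// On a FastThreadLocalThread, get/set/remove are constant-index array
// accesses rather than hash-map probes with weak-reference keys.
public class FastThreadLocalSketch {
    static final FastThreadLocal<Object> CONTEXT = new FastThreadLocal<>();

    public static void main(String[] args) throws Exception {
        Thread t = new FastThreadLocalThread(() -> {
            CONTEXT.set("ctx"); // array store at a fixed index
            CONTEXT.remove();   // cheap: nulls the slot, no Reference::clear
        });
        t.start();
        t.join();
    }
}
```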
I've spotted this in a few reactive benchmarks using Quarkus which don't disable context propagation (fairly common for users that don't change the defaults, I believe): from what I can see there (and the code confirms it), the behaviour of `SmallRyeThreadContext` on close is to call the ThreadLocal's removal, in order to prevent leaks. Sadly, `ThreadLocal::remove` has a huge impact perf-wise due to https://github.com/openjdk/jdk19/blob/967a28c3d85fdde6d5eb48aa0edd8f7597772469/src/java.base/share/classes/java/lang/ThreadLocal.java#L570, which ends up calling `Reference::clear` via https://github.com/openjdk/jdk19/blob/967a28c3d85fdde6d5eb48aa0edd8f7597772469/src/java.base/share/classes/java/lang/ThreadLocal.java#L368. That can be seen in this flamegraph (in violet), and, at a deeper level, in the one which shows the interaction with the runtime to coordinate an async cleanup (it's a weak ref :"/ sob). Given that such a cost happens on the I/O thread, it can be pretty severe and would be better saved.

In order to improve this, it would be great to be able to configure `SmallRyeThreadContext` via a lambda (once or more times) which can allow an external framework to perform some check to guide the close operation towards a much cheaper `set(null)` (which doesn't remove the entry) or the full-fat `remove` as it is now: in the Quarkus use case, I think that if such a lambda can query whether the execution context is within an I/O event-loop thread (not just on a Vert.x ctx!), there won't be any need to remove the thread local entirely, because I/O threads will be alive for the whole application lifetime.
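A minimal sketch of what such a configurable close could look like (the API shape here is illustrative, not SmallRye's actual API):

```java
import java.util.function.Predicate;

// Illustrative: let the embedding framework decide, per thread, whether
// close() can use the cheap set(null) or must pay for a full remove().
class ConfigurableCleanup {
    // e.g. Quarkus could pass a check for Vert.x event-loop threads,
    // which live as long as the application (and its classloader) does
    private final Predicate<Thread> canSkipRemove;

    ConfigurableCleanup(Predicate<Thread> canSkipRemove) {
        this.canSkipRemove = canSkipRemove;
    }

    void onClose(ThreadLocal<?> tl) {
        if (canSkipRemove.test(Thread.currentThread())) {
            tl.set(null);  // I/O threads outlive the app: no leak risk
        } else {
            tl.remove();   // short-lived/pooled threads: drop the entry
        }
    }
}
```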