Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jakarta JSON-P Json.*** methods are crazily inefficient #42748

Closed
gsmet opened this issue Aug 25, 2024 · 28 comments
Closed

Jakarta JSON-P Json.*** methods are crazily inefficient #42748

gsmet opened this issue Aug 25, 2024 · 28 comments

Comments

@gsmet
Copy link
Member

gsmet commented Aug 25, 2024

The low-level static method hosted in the Jakarta JSON-P Json object are very inefficient: they are resolving a JsonProvider.provider() for each call which in turn calls the ServiceLoader (among other things...) and allocate a new JsonProvider.

See:

Meaning that when you are calling Json.createValue("string") to allocate a simple String value, you are actually making a call to the service loader and instantiating a new JsonProvider.

I stumbled upon that independently this week-end and after experimenting with it and writing a quick patch, I was like "it's crazy nobody complained about it" so I went to the JSON-P website and stumbled upon:

Meaning that I don't expect it to be fixed any time soon.

To give an idea of how things go, I did a simple JMH benchmark.

Json.createValue("test") i.e. what people (including us) use:

Iteration   1: 153.907 ops/ms
Iteration   2: 155.308 ops/ms
Iteration   3: 154.542 ops/ms
Iteration   4: 155.963 ops/ms
Iteration   5: 156.462 ops/ms

Using a static provider with JSON_PROVIDER.createValue("test"):

Iteration   1: 1112305.967 ops/ms
Iteration   2: 1114183.052 ops/ms
Iteration   3: 1113783.793 ops/ms
Iteration   4: 1111395.048 ops/ms
Iteration   5: 1112449.325 ops/ms

That's 7200+ times faster.
The createValue() methods are the ones for which it's the most inefficient as they should be extremely lightweight as called quite often.

While we are mostly using Jackson in Quarkus itself and recommending Jackson, we have several components using JSON-P and JSON-B in the SmallRye world, for instance SmallRye GraphQL and SmallRye Health.
And we have people using JSON-P/JSON-B out there because they want to stay in the MicroProfile or Jakarta world.

I know SmallRye GraphQL is affected by the inefficiencies as I have seen some direct calls to the Json methods in some runtime code (for instance here: https://github.com/smallrye/smallrye-graphql/blob/main/client/implementation/src/main/java/io/smallrye/graphql/client/impl/RequestImpl.java#L75).
From what I can see SmallRye Health is not affected as it creates a unique JsonProvider.

Note that I haven't done any benchmarks in SmallRye GraphQL, I have just verified that we were instantiating too many JsonProviders.

As for the SmallRye components, I think we should discuss this subject in the context of Quarkus as the solution we want to put in place for Quarkus might be different from the ones we want to use in the general case (for instance in WildFly).

  • we might want to have an SPI to push Jackson there. I discussed it a few times with @jmartisk already but it wasn't considered top priority
  • even if we do, we probably need to fix the JSON-P/JSON-B case - while fixing our JSON-P calls could be easy (at least in the Quarkus case, I'm not sure if in the case of WildFly, we would want a single provider), I don't know if JSON-B uses the Json shortcut under the hood, in which case it might make things a bit harder to handle.
@gsmet gsmet added area/smallrye area/jackson Issues related to Jackson (JSON library) and removed area/jackson Issues related to Jackson (JSON library) triage/needs-triage labels Aug 25, 2024
Copy link

quarkus-bot bot commented Aug 25, 2024

/cc @EricWittmann (openapi), @Ladicek (fault-tolerance,smallrye), @MikeEdgar (openapi), @brunobat (tracing), @cescoffier (mutiny,reactive-messaging), @ebullient (metrics), @jmartisk (graphql,health,metrics,smallrye), @jponge (mutiny), @ozangunalp (reactive-messaging), @phillip-kruger (graphql,openapi,smallrye), @radcortez (smallrye,tracing), @sberyozkin (jwt), @xstefank (health)

@gsmet
Copy link
Member Author

gsmet commented Aug 25, 2024

Hmmm, the bot shouldn't have mentioned so many people, sorry about that. We probably need to tweak this area/smallrye label.

@gsmet
Copy link
Member Author

gsmet commented Aug 25, 2024

A quick workaround would be to initialize one static JsonProvider somewhere (or at least to have only one for the client and one for the server if we want to support multiple deployments and tweaking the JsonProvider but that means having the means to propagate it wherever it's needed) in the code and ban the use of Json..

Now, no idea if JSON-B uses Json so that might just be a band aid.

@jmartisk
Copy link
Contributor

Regarding the WildFly/EAP case and potential multiple providers: I generally gave up on supporting multiple deployments of SmallRye GraphQL in a single WF instance. It generally works, but there are lots of potential problems where the apps are not fully isolated. In the past, I made some efforts to make it work, but it was unwieldy, so I kinda gave up. This is in line with the official support policies of the EAP XP product (see https://access.redhat.com/articles/2026253) where multiple deployments are explicitly unsupported. So I guess we could start with the quick fix of caching the JSON provider and then look into bringing in support for Jackson...

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

@jmartisk if we are following this path, I wonder if we should make it static in Health too: https://github.com/smallrye/smallrye-health/blob/main/implementation/src/main/java/io/smallrye/health/SmallRyeHealthReporter.java#L125 .

AFAICS SmallRyeHealthReporter is instantiated twice in Quarkus due to our CDI infra so we are initializing two JsonProvider.

Probably worth discussing it with @xstefank and @radcortez .

@jmartisk
Copy link
Contributor

Possibly, but given that it's an app-scoped CDI bean, it doesn't seem necessary (the problem would be obtaining the provider on each request), and having static fields in CDI beans is... weird

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

So I kinda agree but it's still one ServiceLoader call we could avoid. I will discuss it with Martin Kouba hopefully soon.

@xstefank
Copy link
Member

I was thinking about providing Jackson API in SR Health, but we need to be aligned with JSON-P for spec. But I personally don't see what's so wrong with a static field for this since it's really just utility for marshaling. @mkouba mentioning you so you chime in.

@mkouba
Copy link
Contributor

mkouba commented Aug 26, 2024

AFAICS SmallRyeHealthReporter is instantiated twice in Quarkus due to our CDI infra so we are initializing two JsonProvider.

It's instantiated twice because it's an @ApplicationScoped bean so one instance is the client proxy which is a subclass of SmallRyeHealthReporter. And the same applies to all those collections and maps, such as SmallRyeHealthReporter#livenessUnis. We could initialize the state in the @PostConstruct callback. For JsonProvider I think it should be ok to use a static final field instead. However, we need to make sure the constant is initialized at the right time, i.e. when the correct provider is available.

@radcortez
Copy link
Member

Yes, the problem is that in the SR world, we have to support JSON-B / JSON-P because the MP platform requires it.

We could provide our own version of the JSON-B jar, with a different implementation. JBoss did that before, and Apache still does. It shouldn't require changes to the method signatures, so it should pass those. I'm not sure if there is a TCK test that checks if you get a different version of the provider on each call, but I find that unlikely.

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

So I'm not sure JSON-B itself is affected by this issue. I would expect them to have avoided it given it's well known on their side.
We need to be extremely cautious about using Json in our code.

Also, while we know you need to support JSON-P/JSON-B, what we would like to see is an SPI to be able to push Jackson there and avoid having two JSON serialization implementation around in most cases.

Now, what I don't know is if it's feasible or if you have JSON-P/-B publicly exposed. We could probably be OK with JSON-P but it would really be nice if we could avoid having JSON-B in the classpath when using Jackson.

Now is it a priority? Is it too complex. I don't know :).

@radcortez
Copy link
Member

Also, while we know you need to support JSON-P/JSON-B, what we would like to see is an SPI to be able to push Jackson there and avoid having two JSON serialization implementation around in most cases.

Agree. Unfortunately, that also means that we would need to create yet another JSON Provider entry point so you could delegate to JSON-P or Jackson. It's not difficult, but we have to remember to use that one instead, and most likely, users would still call the original APIs.

The best option, at least for Quarkus, is to provide our own API jar with performance improvements since users need to reference the extension that they want to use directly, and we can easily control which API dependency to add. In traditional Jakarta Apps servers, it's trickier because users add the Jakarta API dependency from the official project.

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

Unfortunately, that also means that we would need to create yet another JSON Provider entry point so you could delegate to JSON-P or Jackson.

But are the users of SmallRye GraphQL/Health concerned by the JSON provider? That's why I was asking if it was leaking in the APIs?

@radcortez
Copy link
Member

I don't believe we have leaks in SR itself, but the Jakarta JSON APIs may be included transitively by the MP APIs. We need to check each one individually.

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

I wouldn't be too worried about having the jars around.
There's a difference between having a jar around and initializing the library. We could improve on the latter and see how we can handle the former a bit later.

Anyway, for now, I would focus on making sure we don't instantiate too many providers as it's an easy win.
The rest should be prioritized at an higher level probably.

@radcortez
Copy link
Member

I wouldn't be too worried about having the jars around.

I was thinking about the user calling that API with terrible performance.

@gsmet
Copy link
Member Author

gsmet commented Aug 26, 2024

Ah yes, that's their responsibility, really. I posted a message in the original issue there explaining how bad it really was with numbers. Because there's bad and there's really bad :).

@geoand
Copy link
Contributor

geoand commented Aug 27, 2024

we might want to have an SPI to push Jackson there. I discussed it a few times with @jmartisk already but it wasn't considered top priority

I am personally in favor of this being done sooner rather than later and willing to help make it happen.

@jmartisk
Copy link
Contributor

I am personally in favor of this being done sooner rather than later and willing to help make it happen.

That's the spirit! You're certainly welcome to do it :)

@t1
Copy link

t1 commented Aug 30, 2024

Maybe this is one of the reasons why YASSON is so slow. It would be wrong to blame JSON-B for the bad performance of YASSON. JSON-B itself doesn't rely on the static methods in jakarta.json.Json; you can pass an instance of a jakarta.json.spi.JsonProvider into your jakarta.json.bind.JsonbBuilder; otherwise it looks it up... once!

I'd like to hint at the QSON project by @patriot1burke and this discussion for this quarkus.io blog post by @mariofusco. They make Jackson even faster by moving the reflection part into the build process... which is really the Quarkus mindset. A big applause for that! QSON seems to not be very active ATM, but it could just require a little push (I love the name!)

I've started to experiment with a similar idea, but as an annotation processor and based on JSON-B APIs. (I haven't published it, yet, because it depends on a SNAPSHOT version of an annotation processing lib I develop in parallel, so it doesn't build OOTB.)

So, we actually always have three parts:

  1. API (i.e. Jackson vs. JSON-B; both seem to be quite feature complete, don't they?)
  2. code scanning (i.e. reflection, annotation processor, or Quarkus build-time meta programming)
  3. (de)serializing

If I'm not mistaken, I think that way back when JSON-B was still young, there even was an initiative to make Jackson support the JSON-B APIs. But I haven't heard anything about that ever since. I have no idea about how much effort that would be; the Jackson code I've yet seen looks.... hmmm... highly optimized.

I would really love it, if application developers could stop worrying about such things. They should just pick an API and always get the best performance.

@geoand
Copy link
Contributor

geoand commented Aug 30, 2024

@t1 thanks for the input!

I would like to clarify a couple more reasons why myself (and I assume @gsmet) as much in favor of using Jackson everywhere possible:

  • It's by the far the most used and understood library of its class in the JVM ecosystem
  • The library is battle-harden perhaps more than any other and the project's maintainers are very serious about the project - something that super important when it comes to CVE handling
  • By using Jackson everywhere we can, we can cut down on the number of classes that are on the runtime classpath with all the associated benefits.

@t1
Copy link

t1 commented Aug 30, 2024

My previous comment may have sounded too negative; actually I have nothing against Jackson! If I understand the article correctly, Jackson already allows to nicely separate the scanning from the actual (de)serializing. It would be nice, if it would also separate its API from the implementation. Then we could relatively simply add support for the JSON-B APIs for everything MP... or as an alternative, if a Quarkus developer chooses so. Under the hood, it could always be Jackson.

@gsmet
Copy link
Member Author

gsmet commented Aug 30, 2024

I think we can close this one now.

@gsmet gsmet closed this as completed Aug 30, 2024
@geoand
Copy link
Contributor

geoand commented Nov 6, 2024

@radcortez since this issue mainly affected Smallrye libs, I wonder if it makes sense to have a forbiddenapis configuration banning the use of jakarta.json.Json

geoand added a commit to geoand/quarkus that referenced this issue Nov 6, 2024
@geoand
Copy link
Contributor

geoand commented Nov 6, 2024

I did #44339 for OIDC here

@radcortez
Copy link
Member

@radcortez since this issue mainly affected Smallrye libs, I wonder if it makes sense to have a forbiddenapis configuration banning the use of jakarta.json.Json

I had a look, and the SR Components relying on jakarta.json.Json and it seems only smallrye-jwt is using it here:
https://github.com/smallrye/smallrye-jwt/blob/531655a24ef3fee878a1f1d6ce2d05fd800408b9/implementation/jwt-auth/src/main/java/io/smallrye/jwt/JsonUtils.java

I'm fine to ban it, but we cannot rely force us to move to jackson, because we must primarily support json-b / p :(

@geoand
Copy link
Contributor

geoand commented Nov 6, 2024

Yeah, I've given up on moving to Jackson, at this point I just want to ensure we don't have to deal with this crap again

gsmet added a commit to gsmet/smallrye-jwt that referenced this issue Nov 6, 2024
@gsmet
Copy link
Member Author

gsmet commented Nov 6, 2024

I created smallrye/smallrye-jwt#836 for SmallRye JWT.

sberyozkin pushed a commit to smallrye/smallrye-jwt that referenced this issue Nov 9, 2024
gsmet pushed a commit to gsmet/quarkus that referenced this issue Nov 12, 2024
Relates to: quarkusio#42748

(cherry picked from commit b7e58d6)
bschuhmann pushed a commit to bschuhmann/quarkus that referenced this issue Nov 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants