
high CPU usage on CharPredicate.Digit #518

Open
yanns opened this issue Dec 19, 2023 · 16 comments

@yanns (Contributor) commented Dec 19, 2023:

After updating from JVM 19 to JVM 21, we noticed very high CPU usage.

Profiling points to a method using a simple rule:

```scala
def Digits = rule(oneOrMore(Digit))
```
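(For context, a minimal parser around this rule might look like the following sketch; `DigitsParser` is an invented name, not our actual code.)

```scala
import org.parboiled2._

// Minimal hypothetical context for the rule above.
// CharPredicate.Digit is parboiled2's mask-based predicate for '0'-'9'.
class DigitsParser(val input: ParserInput) extends Parser {
  import CharPredicate.Digit
  def Digits = rule(oneOrMore(Digit))
}
```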

[Screenshot 2023-12-19 at 17:30:38: profiling output]

I don't know whether anyone has seen this behavior before, or has a clue where to look.

@sirthias (Owner) commented:
Is it the same bytecode that runs fine on 19 but "hangs" in 21?

@yanns (Contributor, Author) commented Dec 19, 2023:

> Is it the same bytecode that runs fine on 19 but "hangs" in 21?

Not exactly: we changed `ThisBuild / scalacOptions ++= Seq("-release", "11")` to `ThisBuild / scalacOptions ++= Seq("-release", "21")`.
I can try reverting this to see whether it has an impact.

@yanns (Contributor, Author) commented Dec 20, 2023:

I could reproduce this issue with the same bytecode (release 11) deployed on a JVM 21.

[Screenshot 2023-12-20 at 15:31:59: profiling output]

Edit: it's very sporadic and only happens on a few instances.

@sirthias (Owner) commented:

Hmm... if it's not deterministically exhibiting this behavior on one particular machine (setup), then I wonder how two different runs might differ from each other.
Are you somehow sharing the parser between threads, or are you 100% sure that only ever a single thread runs a particular parser instance?
Could it be that the parser is running against partial input and the error is triggered by hitting packet boundaries at odd locations?

@yanns (Contributor, Author) commented Dec 20, 2023:

> Are you somehow sharing the parser between threads, or are you 100% sure that only ever a single thread runs a particular parser instance?

There's one new parser instance per input, and also per thread.

> Could it be that the parser is running against partial input and the error is triggered by hitting packet boundaries at odd locations?

The input comes from users, so it may well be partial.
I've looked at the queries that were successfully parsed before the instance got stuck, and they all seem to be valid. But maybe an invalid one is never logged.
It's a great point, though.

@sirthias (Owner) commented:

Can you show what your WhiteSpace and Digits rules look like?
Maybe they are written in a way that also consumes the virtual EOI char after the input?
That virtual character cannot be consumed and is sometimes the cause of an infinite loop, e.g. when you try to consume all the chars that are not a digit, or something like that...
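To illustrate the pitfall, a hypothetical sketch (invented names, not code from this issue): a negated predicate built as a plain `Char => Boolean` also matches EOI, so a `zeroOrMore` over it can spin forever at the end of input, whereas `ANY` never matches EOI and is therefore safe:

```scala
import org.parboiled2._

class EoiTrapParser(val input: ParserInput) extends Parser {
  // Matches every char that is not a digit -- including EOI ('\uFFFF'),
  // which is "not a digit" too!
  val NotDigit = CharPredicate.from(c => !CharPredicate.Digit(c))

  // Can loop forever: at end of input cursorChar stays EOI, NotDigit keeps
  // matching, and the parser never makes progress past the input.
  def Risky = rule(zeroOrMore(NotDigit))

  // Safe variant: ANY matches any char except EOI, so the loop must stop.
  def Safe = rule(zeroOrMore(!CharPredicate.Digit ~ ANY))
}
```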

@yanns (Contributor, Author) commented Dec 21, 2023:

```scala
def Digits = rule(oneOrMore(Digit)) // from CharPredicate.Digit
def WhiteSpace = rule(quiet(zeroOrMore(WhiteSpaceChar)))
val WhiteSpaceChar = CharPredicate(" \n\r\t\f")
```

@sirthias (Owner) commented:

Hmm... this really is curious.
I can't see how this could ever cause an infinite loop.
What input are you reading from? Byte arrays or Strings?

In order to debug further, it'd be great to get hold of a pathological input and reproduce the problem under the microscope...

@yanns (Contributor, Author) commented Dec 21, 2023:

We're parsing Strings.

I'm trying to get more data.
I can only reproduce the issue in production, and only on some instances.
I've found some input strings where the parsing takes a long time.

But I cannot reproduce the issue locally, even with those inputs.
I'll keep you informed.
Thanks for the help!
If you have any other ideas on where to look, please don't hesitate to share them.

@yanns (Contributor, Author) commented Apr 17, 2024:

An update: this issue also occurs with JVM 19 (maybe less frequently, but I cannot be sure).
-> It's not related to the JVM version.

@martinw-ct commented Dec 16, 2024:

Hello!

I carried on the analysis a bit after taking over the JVM 21 upgrade from @yanns.

This rule is what causes the 100% CPU hang that @yanns saw. However, it does not happen often, and only when the JVM gets fairly hot. In a unit-test scenario it works perfectly every time and never hangs.

```scala
def WhiteSpace: Rule[HNil, HNil] = rule(quiet(zeroOrMore(WhiteSpaceChar)))
val WhiteSpaceChar: CharPredicate = CharPredicate(" \n\r\t\f")
```

I was suspicious of the macros, so I replaced the rule with the code the macro generates:

```scala
...
      if (__inErrorAnalysis)
        wrapped
      else {
        ();
        @tailrec() def rec(mark: org.parboiled2.Parser.Mark): org.parboiled2.Parser.Mark = {
          val matched = TestCommonRules.this.WhiteSpaceChar(cursorChar).$amp$amp(__advance());
          if (matched) {
            ();
            rec(__saveState)
          } else
            mark
        };
        __restoreState(rec(__saveState));
        true
      };
    if (matched)
      org.parboiled2.Rule
    else
      null
  }
```

Then I saw it hang at the recursive call `rec(__saveState)`, forever calling itself.

However, this should be impossible: every call does `__advance`, so after a certain point `cursorChar` becomes EOI (`'\uFFFF'`), and `MaskBased.apply` with the given 5 chars won't return true for it.
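For reference, the mask check boils down to something like the following sketch (paraphrased from memory of parboiled2's `CharPredicate.MaskBased`; the real source may differ in detail):

```scala
// Sketch of the mask-based predicate check (paraphrase, not the exact source).
// For ASCII chars the predicate is two 64-bit masks; the (c - 128) >> 31 term
// is -1 (all ones) for c < 128 and 0 otherwise, so any c >= 128 -- including
// EOI ('\uFFFF') -- can never match, and the loop above should terminate.
final case class MaskSketch(lowMask: Long, highMask: Long) extends (Char => Boolean) {
  def apply(c: Char): Boolean = {
    val mask = if (c < 64) lowMask else highMask
    ((1L << c) & ((c - 128) >> 31) & mask) != 0L
  }
}
```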

As unlikely as it seems, I can't help but conclude that this is genuinely a JVM bug in the JIT's profile-guided optimization: not in the standard first-pass bytecode compilation, but in the highly optimized x86 code. It must be making some flawed assumption about the bitwise operators and emitting an infinite loop. 🤷

This is also a very delicate error case. I tried some basic tweaks, such as adding printlns or a loop counter with a break, and anything like that immediately "fixed" it.

Trying to move this out of our fairly large Scala application into a minimal reproducer also failed.

For now, `CharPredicate.from(c => " \n\r\t\f".contains(c))` is a workaround, even if it performs slightly worse. Hiding the condition in a lambda and using `String.contains` instead of the bitwise mask the default CharPredicate uses is enough to avoid this pothole in the optimiser.
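Applied to the rules quoted above, the workaround looks roughly like this (a sketch; `PatchedParser` is an invented name):

```scala
import org.parboiled2._

class PatchedParser(val input: ParserInput) extends Parser {
  // Lambda-based predicate: the JIT now compiles String.contains
  // instead of the bit-mask check that triggered the hang.
  val WhiteSpaceChar: CharPredicate = CharPredicate.from(c => " \n\r\t\f".contains(c))
  def WhiteSpace: Rule0 = rule(quiet(zeroOrMore(WhiteSpaceChar)))
}
```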

@sirthias (Owner) commented:

That is really curious.
Even though it's possible that it's a JVM bug, I really don't believe it (yet).
You could also try manually inlining the `TestCommonRules.this.WhiteSpaceChar`, `__advance` and `__saveState` calls and see whether the issue is still present. Maybe, looking at the resulting code, we can see more clearly what the reason might be.
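Fully inlined, the hot loop should collapse to roughly this standalone sketch (hypothetical; it glosses over the Mark bookkeeping, and all names are invented), which might also serve as a starting point for a minimal reproducer:

```scala
// Hypothetical hand-inlined version of the generated whitespace loop above.
object InlinedLoop {
  final val EOI = '\uFFFF'

  // Build the two 64-bit masks for " \n\r\t\f" (all five chars are < 64).
  val (loMask, hiMask) = " \n\r\t\f".foldLeft((0L, 0L)) { case ((lo, hi), c) =>
    if (c < 64) (lo | (1L << c), hi) else (lo, hi | (1L << c))
  }

  def matches(c: Char): Boolean = {
    val mask = if (c < 64) loMask else hiMask
    ((1L << c) & ((c - 128) >> 31) & mask) != 0L
  }

  // Skip leading whitespace; must terminate because matches(EOI) == false.
  def skipWhitespace(input: String): Int = {
    var cursor = 0
    def cursorChar: Char = if (cursor < input.length) input.charAt(cursor) else EOI
    while (matches(cursorChar)) cursor += 1
    cursor
  }
}
```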

But I agree that it's definitely strange that the same code running on the same input behaves differently depending on whether the JVM is hot or cold. Aside from a bug in the JVM (as you suspect), the reason could be that the different runtime performance triggers synchronization, caching, or other non-deterministic issues.
There is definitely only ever a single thread running in your experiments, right?

@martinw-ct commented:

> But I agree that it's definitely strange that the same code running on the same input behaves differently depending on whether the JVM is hot or cold. Aside from a bug in the JVM (as you suspect), the reason could be that the different runtime performance triggers synchronization, caching, or other non-deterministic issues. There is definitely only ever a single thread running in your experiments, right?

Unfortunately no, there are many threads. The only way I've found to reproduce this is by hammering our entire application with requests from a full integration-test suite.

However, at least according to the logs, there was only a single thing being parsed at the time of the hang. And in the code we very definitely create a new Parser instance for everything we parse. I agree that two threads concurrently accessing the same parser's internal state would be a more likely explanation, but I don't think that is the case here. 🤔

I did assemble some more supporting evidence with JFR, though. After 4 minutes of normal operation, the level 4 (C2) compilation kicks in, and immediately afterwards it freezes at 100% CPU inside the rule.

[JFR screenshot]

@sirthias (Owner) commented:

Hmm... I see.
Really strange. Having looked through the source of `CharPredicate.MaskBased.apply` and `__advance()` again, I don't see anything that could explain non-deterministic behavior.

Thank you for reporting this!
And I'm glad the workaround works fine; I assume it shouldn't really cause any performance issues.
The parser is still fast enough for your use case, isn't it?

One way to test for a genuine JVM bug could be to simply run different JVM implementations, e.g. Graal, Temurin, etc. against each other. Many probably share the same code somewhere deep down and thus might exhibit identical behavior, but I could imagine that Graal, for example, is different enough to provide a good test subject.

@martinw-ct commented Dec 17, 2024:

> One way to test for a genuine JVM bug could be to simply run different JVM implementations, e.g. Graal, Temurin, etc. against each other. Many probably share the same code somewhere deep down and thus might exhibit identical behavior, but I could imagine that Graal, for example, is different enough to provide a good test subject.

For what it's worth, I did try OpenJDK 23 as well, which also fixed it.

> The parser is still fast enough for your use case, isn't it?

Yes, definitely still fast enough. :D

@sirthias (Owner) commented:

Ah, ok.
So the problem is just OpenJDK 21?
Have you tried anything else, like Corretto, Graal, Temurin, or Zulu?
