Concurrency failure in jsoup? #3321

Open
owengray-google opened this issue Nov 8, 2023 · 1 comment

owengray-google commented Nov 8, 2023

Describe the bug
We have recently been seeing some timeouts when running dackka. We managed to capture the JVM threads when one run was killed for timing out.

I think the dackka process was probably this one:

4556 org.jetbrains.dokka.MainKt /buildbot/dist_dirs/aosp-androidx-main-linux-androidx/11066813/dackkaArgs-docs-public.json -loggingLevel WARN -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant
The full stack trace for that process is pretty long, but I notice that most threads have a stack trace that looks like this:

"DefaultDispatcher-worker-1" #28 daemon prio=5 os_prio=0 cpu=243.46ms elapsed=3478.27s tid=0x00007f99a09cc450 nid=0x122d in Object.wait()  [0x00007f986edeb000]
   java.lang.Thread.State: RUNNABLE
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt:25)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt$parseHtmlEncodedWithNormalisedSpaces$1.invoke(parseWithNormalisedSpaces.kt)
	at org.intellij.markdown.lexer.Compat.forEachCodePoint(Compat.kt:14)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseHtmlEncodedWithNormalisedSpaces(parseWithNormalisedSpaces.kt:19)
	at org.jetbrains.dokka.base.translators.ParseWithNormalisedSpacesKt.parseWithNormalisedSpaces(parseWithNormalisedSpaces.kt:49)
	at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance(DocTagsFromIElementFactory.kt:46)
	at org.jetbrains.dokka.base.parsers.factories.DocTagsFromIElementFactory.getInstance$default(DocTagsFromIElementFactory.kt:16)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.textHandler(MarkdownParser.kt:243)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:391)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.defaultHandler(MarkdownParser.kt:355)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:398)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren(MarkdownParser.kt:408)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.evaluateChildren$default(MarkdownParser.kt:407)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.markdownFileHandler(MarkdownParser.kt:201)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode(MarkdownParser.kt:392)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.visitNode$default(MarkdownParser.kt:358)
	at org.jetbrains.dokka.base.parsers.MarkdownParser.parseStringToDocNode(MarkdownParser.kt:40)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion$parseFromKDocTag$1.invoke(MarkdownParser.kt:514)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag(MarkdownParser.kt:526)
	at org.jetbrains.dokka.base.parsers.MarkdownParser$Companion.parseFromKDocTag$default(MarkdownParser.kt:508)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.getDocumentation(DefaultDescriptorToDocumentableTranslator.kt:1029)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:904)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor.access$resolveDescriptorData(DefaultDescriptorToDocumentableTranslator.kt:135)
	at org.jetbrains.dokka.base.translators.descriptors.DokkaDescriptorVisitor$resolveClassDescriptionData$2.invokeSuspend(DefaultDescriptorToDocumentableTranslator.kt:939)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
	at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
	at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:33)
	at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

I also notice one thread whose stack looks like this:

"DefaultDispatcher-worker-15" #42 daemon prio=5 os_prio=0 cpu=697.27ms elapsed=3478.23s tid=0x00007f9824023400 nid=0x123b in Object.wait()  [0x00007f986d6f0000]
   java.lang.Thread.State: RUNNABLE
	at org.jsoup.nodes.Document$OutputSettings.<init>(Document.java:416)
	- waiting on the Class initialization monitor for org.jsoup.nodes.Entities$EscapeMode
	at org.jsoup.nodes.Document.<init>(Document.java:26)
	at org.jsoup.nodes.Document.createShell(Document.java:52)
	at org.jsoup.parser.Parser.parseBodyFragment(Parser.java:218)
	at org.jsoup.Jsoup.parseBodyFragment(Jsoup.java:241)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser$Parse.invoke(JavadocParser.kt:465)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements(JavadocParser.kt:474)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.convertJavadocElements$default(JavadocParser.kt:471)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.getDescription(JavadocParser.kt:207)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocComment$base(JavadocParser.kt:59)
	at org.jetbrains.dokka.base.translators.psi.parsers.JavadocParser.parseDocumentation(JavadocParser.kt:52)
	at org.jetbrains.dokka.base.translators.psi.DefaultPsiToDocumentableTranslator$DokkaPsiParser$parseClasslike$2.invokeSuspend(DefaultPsiToDocumentableTranslator.kt:243)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:749)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

I believe the underlying issue we're running into is in org.jsoup.nodes.Entities, where:
private static final HashMap<String, String> multipoints = new HashMap<>(); // name -> multiple character references
is static but is not thread-safe. I think this issue would be fixed by using ConcurrentHashMap there.
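
To illustrate the kind of change suggested here, this is a minimal sketch and not jsoup's actual code. `EntityRegistry`, its methods, and the sample keys and values are all hypothetical; only the field name `multipoints` is borrowed from `Entities`. It shows a static map declared as a `ConcurrentHashMap` being filled from several threads at once:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a class with a shared static lookup table.
// This is NOT jsoup's Entities class; only the field name mirrors it.
final class EntityRegistry {
    // ConcurrentHashMap supports concurrent put/get without external locking;
    // a plain HashMap makes no such guarantee.
    private static final Map<String, String> multipoints = new ConcurrentHashMap<>();

    private EntityRegistry() {}

    static void register(String name, String value) {
        multipoints.put(name, value);
    }

    static String lookup(String name) {
        return multipoints.get(name);
    }

    // Clears the map, registers perThread distinct names from each of
    // `threads` worker threads, and returns the resulting map size.
    static int fillConcurrently(int threads, int perThread) {
        multipoints.clear();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all writers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < perThread; i++) {
                    register("entity-" + id + "-" + i, "value-" + i);
                }
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return multipoints.size();
    }
}
```

Because every key is distinct, `EntityRegistry.fillConcurrently(8, 1000)` returns 8000. This only covers unsynchronized access to the map itself; whether that alone explains the hang is not confirmed here.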

I have filed this on jsoup as jhy/jsoup#2042. If that is indeed the cause, then fixing this bug should just be a matter of updating the jsoup version once it is fixed upstream.

@vmishenev

If it is indeed a concurrency issue, #3151 can fix it.
