
Unhelpful 'vertx thread blocked' errors in log if openai key is not set #927

Open
holly-cummins opened this issue Sep 25, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@holly-cummins
Contributor

If I run a simple application such as https://github.com/kdubois/quarkus-langchain4j-podman-ai-instructlab, but with the default openai base url and no export QUARKUS_LANGCHAIN4J_OPENAI_API_KEY=..., I get the following errors in the logs:

2024-09-25 18:54:00,888 WARN  [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-worker-thread-4,5,main] has been blocked for 99893 ms, time limit is 60000 ms: io.vertx.core.VertxException: Thread blocked
        at java.base/jdk.internal.misc.Unsafe.park(Native Method)
        at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
        at java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
        at java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
        at io.quarkus.deployment.dev.RuntimeUpdatesProcessor.doScan(RuntimeUpdatesProcessor.java:462)
        at io.quarkus.deployment.dev.RuntimeUpdatesProcessor.doScan(RuntimeUpdatesProcessor.java:455)
        at io.quarkus.vertx.http.runtime.devmode.VertxHttpHotReplacementSetup$6.call(VertxHttpHotReplacementSetup.java:161)
        at io.quarkus.vertx.http.runtime.devmode.VertxHttpHotReplacementSetup$6.call(VertxHttpHotReplacementSetup.java:148)
        at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$0(ContextImpl.java:178)
        at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:279)
        at io.vertx.core.impl.ContextImpl.lambda$internalExecuteBlocking$2(ContextImpl.java:210)
        at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
        at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
        at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
        at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1495)
        at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
        at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:1583)

It's not obvious that the root cause is missing configuration. I see this with 0.17.2 and 0.18.
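
For context, the key would normally be supplied via config before starting the app; a minimal sketch (the key value here is a placeholder):

```shell
# Supply the key as an environment variable; Quarkus maps it to
# the quarkus.langchain4j.openai.api-key config property.
export QUARKUS_LANGCHAIN4J_OPENAI_API_KEY=sk-placeholder

# Equivalently, in src/main/resources/application.properties:
#   quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
```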

@geoand
Collaborator

geoand commented Sep 26, 2024

@holly-cummins how are you reproducing this?

Asking because I am seeing:

Configuration validation failed:
        SRCFG00014: The config property quarkus.langchain4j.openai.api-key is required but it could not be found in any config source

@holly-cummins
Contributor Author

holly-cummins commented Sep 27, 2024

> @holly-cummins how are you reproducing this?
>
> Asking because I am seeing:
>
> Configuration validation failed:
>         SRCFG00014: The config property quarkus.langchain4j.openai.api-key is required but it could not be found in any config source

Right, that's what I expected to happen, too. :)

If I try a naive reproducer, I get the right message in the logs, and a relevant stack trace in the browser. Which I guess is reassuring.

I've just looked more carefully at my reproducer, and it seems to be the addition of micrometer that exposes the issue. If I remove micrometer, all is well. Even with micrometer, I do see the expected config message in the log, but it's overwhelmed by the vertx stack trace lower down. I do have to wait a few minutes for the vertx trace, but I get this immediately on startup:

2024-09-27 15:33:18,490 WARN  [io.net.uti.con.DefaultPromise] (vert.x-acceptor-thread-0) An exception was thrown by io.vertx.core.net.impl.TCPServerBase$$Lambda/0x00000070017a0448.operationComplete(): io.vertx.core.impl.NoStackTraceException: HttpBinderConfiguration was not found

Just quarkus create app and then quarkus ext add langchain4j-openai micrometer isn't enough to reproduce, so there must be something else in my reproducer that's creating the conditions for this to happen.

I've put my reproducer here: https://github.com/holly-cummins/quarkus-langchain4j-podman-ai-instructlab/tree/vertx-timeout-reproducer
(Ignore the repo name; confusingly, I removed the config so that it's now using openai.)

@geoand
Collaborator

geoand commented Sep 27, 2024

Thanks, I'll have a look on Monday

@geoand
Collaborator

geoand commented Sep 30, 2024

Unfortunately I still can't reproduce this :(

@geoand added the "bug" label Oct 2, 2024
@holly-cummins
Contributor Author

holly-cummins commented Oct 22, 2024

I've just double checked my reproducer. I think this may be something we'd want to fix in Quarkus core (if we want to fix it). But see what you think.

This is what I did:

git clone git@github.com:holly-cummins/quarkus-langchain4j-podman-ai-instructlab.git
git checkout vertx-timeout-reproducer
quarkus dev

At this point, if I'm paying attention, I do see the expected configuration validation error, along with these two bits of vertx grumbling:

Configuration validation failed:
	SRCFG00014: The config property quarkus.langchain4j.openai.api-key is required but it could not be found in any config source
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2024-10-21 18:07:43,950 INFO  [io.qua.dep.dev.IsolatedDevModeMain] (main) Attempting to start live reload endpoint to recover from previous Quarkus startup failure

2024-10-21 18:07:44,040 WARN  [io.net.uti.con.DefaultPromise] (vert.x-acceptor-thread-0) An exception was thrown by io.vertx.core.net.impl.TCPServerBase$$Lambda/0x000000f8017ab198.operationComplete(): io.vertx.core.impl.NoStackTraceException: HttpBinderConfiguration was not found

But if I don't notice that, and instead go to http://localhost:8080/, nothing happens for a while, and then I get 3958 lines of vertx Thread blocked exceptions in my log. And at that point, my chances of seeing the original config validation message at the top of the log are low. :/

@geoand
Collaborator

geoand commented Oct 22, 2024

Ah, now I see what you mean.

I am not really sure what we could do here TBH

@holly-cummins
Contributor Author

> Ah, now I see what you mean.
>
> I am not really sure what we could do here TBH

Yeah, the two fixes that come to mind are "tactically suppress the error before it hits the logs," or "rewrite the whole vertx error handling model in Quarkus core so that it doesn't spam the logs in this scenario." The first seems quick but risky, because we could suppress good stuff, and the second seems slow but risky, because it's a fundamental architectural change. It might have broader benefits, though, because I don't think anyone wants ~4k lines of stack trace in their log from one simple error. Some similarity matching on the exceptions, followed by a 'we saw this same exception 200 more times', might be friendly.
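
A rough sketch of what that similarity matching could look like, purely illustrative (the fingerprinting scheme and the 200-occurrence threshold are invented here, not anything Quarkus does today):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: collapse repeated identical stack traces into a count.
public class DedupingLogFilter {
    private final Map<String, Integer> seen = new ConcurrentHashMap<>();

    // Fingerprint an exception by its class and top stack frame (a crude similarity match).
    static String fingerprint(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
        return t.getClass().getName() + "@" + top;
    }

    // Returns a message worth logging, or null if this repeat should be suppressed.
    public String onException(Throwable t) {
        int count = seen.merge(fingerprint(t), 1, Integer::sum);
        if (count == 1) {
            return "First occurrence: " + t;
        } else if (count % 200 == 0) {
            return "We saw this same exception " + count + " times";
        }
        return null; // suppress the repeat
    }

    public static void main(String[] args) {
        DedupingLogFilter filter = new DedupingLogFilter();
        Throwable blocked = new IllegalStateException("Thread blocked");
        int logged = 0;
        for (int i = 0; i < 400; i++) {
            if (filter.onException(blocked) != null) {
                logged++;
            }
        }
        // 400 identical exceptions produce only 3 log lines (1st, 200th, 400th)
        System.out.println("logged=" + logged);
    }
}
```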

@cescoffier, do you have any thoughts?

@cescoffier
Collaborator

You cannot implement the second - that won't work.

Now, what is blocking when the API key is not provided? The fact that we block on a lock means that something somewhere is holding the lock and "forgot" to release it after the exception was thrown. That's where I would look first.
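
As an illustration of that failure mode, here is the classic shape of a leaked lock, with hypothetical names (this is not the actual Quarkus code):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the suspected bug pattern, not actual Quarkus code.
public class LeakedLockDemo {
    final ReentrantLock scanLock = new ReentrantLock();

    // Buggy: if doScan() throws, unlock() is never reached and the lock stays held.
    void buggyScan() {
        scanLock.lock();
        doScan();          // throws when the API key is missing
        scanLock.unlock(); // never executed on the exception path
    }

    // Correct: unlock in finally so the lock is released even on failure.
    void safeScan() {
        scanLock.lock();
        try {
            doScan();
        } finally {
            scanLock.unlock();
        }
    }

    private void doScan() {
        throw new IllegalStateException(
            "config property quarkus.langchain4j.openai.api-key is required");
    }

    public static void main(String[] args) throws InterruptedException {
        LeakedLockDemo demo = new LeakedLockDemo();
        try { demo.buggyScan(); } catch (IllegalStateException ignored) { }
        // A later caller now blocks forever waiting for the leaked lock,
        // which is exactly what the Vert.x blocked-thread checker reports.
        Thread later = new Thread(() -> demo.scanLock.lock());
        later.setDaemon(true);
        later.start();
        later.join(200);
        System.out.println("later thread blocked: " + later.isAlive());
    }
}
```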
