Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When an other KER is unavailable, the current KER should not log a lot of CompletionExceptions caused by contacting it anyway. #463

Closed
bnouwt opened this issue Jan 19, 2024 · 3 comments · Fixed by #481
Assignees

Comments

@bnouwt
Copy link
Collaborator

bnouwt commented Jan 19, 2024

When a remote KER becomes unavailable, the SCs it hosts should not be involved in any data exchange. The exception below, shows that the OtherKnowledgeBaseStore tries to retrieve details about a SC that is hosted by a currently unavailable KER.

2024-01-19 05:25:48:072 +0000 [qtp431936140-6924] ERROR OtherKnowledgeBaseStoreImpl-flex-monitoring - An exception has occured while adding an other Knowledge Base 
java.util.concurrent.CompletionException: java.io.IOException: KER https://xxx.xx is currently unavailable. Trying again later.
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
	at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:747)
	at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:735)
	at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2182)
	at eu.knowledge.engine.smartconnector.impl.OtherKnowledgeBaseStoreImpl.populate(OtherKnowledgeBaseStoreImpl.java:64)
	at eu.knowledge.engine.smartconnector.impl.OtherKnowledgeBaseStoreImpl.knowledgeBaseIdSetChanged(OtherKnowledgeBaseStoreImpl.java:140)
	at eu.knowledge.engine.smartconnector.runtime.messaging.MessageDispatcher.notifySmartConnectorsChanged(MessageDispatcher.java:334)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteKerConnection.updateRemoteKerDataFromPeer(RemoteKerConnection.java:148)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteKerConnection.getRemoteKerDetails(RemoteKerConnection.java:164)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteKerConnection.getRemoteSmartConnectorIds(RemoteKerConnection.java:185)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteKerConnectionManager.getRemoteSmartConnectorIds(RemoteKerConnectionManager.java:218)
	at eu.knowledge.engine.smartconnector.runtime.messaging.MessageDispatcher.getKnowledgeBaseIds(MessageDispatcher.java:290)
	at eu.knowledge.engine.smartconnector.runtime.messaging.MessageDispatcher.deliverToLocalSmartConnector(MessageDispatcher.java:231)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteMessageReceiver.handleMessage(RemoteMessageReceiver.java:40)
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteMessageReceiver.messagingAskmessagePost(RemoteMessageReceiver.java:57)
	at eu.knowledge.engine.smartconnector.runtime.messaging.inter_ker.api.MessagingApi.messagingAskmessagePost(MessagingApi.java:88)
	at jdk.internal.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: KER https://xxx.xx is currently unavailable. Trying again later.
	at eu.knowledge.engine.smartconnector.runtime.messaging.RemoteKerConnection.sendToRemoteSmartConnector(RemoteKerConnection.java:289)
	at eu.knowledge.engine.smartconnector.runtime.messaging.MessageDispatcher.sendToLocalOrRemoteSmartConnector(MessageDispatcher.java:211)
	at eu.knowledge.engine.smartconnector.runtime.messaging.LocalSmartConnectorConnection.send(LocalSmartConnectorConnection.java:78)
	at eu.knowledge.engine.smartconnector.impl.MessageRouterImpl.sendAskMessage(MessageRouterImpl.java:88)
	at eu.knowledge.engine.smartconnector.impl.MetaKnowledgeBaseImpl.getOtherKnowledgeBase(MetaKnowledgeBaseImpl.java:324)
	at eu.knowledge.engine.smartconnector.impl.OtherKnowledgeBaseStoreImpl.populate(OtherKnowledgeBaseStoreImpl.java:63)
	... 62 more

This is probably caused by a bug where the RemoteKerConnection.remoteKerDetails is not set to null in the sendToRemoteSmartConnector method when an error occurs and this causes the SCs to be returned when populating resulting a the failed contacted and Exception above.

@bnouwt
Copy link
Collaborator Author

bnouwt commented Jan 23, 2024

This error might prevent the creation of a new Smart Connector in which case it is more impatctful than I thought. This error pops up in the Generic Adapter logs with the following message:

2024-01-22 16:13:27,353 | INFO | Creating the Smart Connector with the knowledge base ... | service_specific_adapter_base.py#738
2024-01-22 16:15:27,409 | ERROR | Creation error: 500 ({"timestamp":"2024-01-22T16:15:27.407+00:00","status":500,"error":"Internal Server Error","message":"","path":"/smartconnector/create"}) | service_specific_adapter_base.py#442
2024-01-22 16:15:27,410 | CRITICAL | Smart Adapter unable to continue | flexibility_service_ssa.py#1041

@bnouwt bnouwt self-assigned this Jan 23, 2024
@bnouwt
Copy link
Collaborator Author

bnouwt commented Feb 2, 2024

After further research, we are able to reproduce errors similar to the ones above using the following procedure:

  • start the unreachable-runtimes example.
  • follow the README.md instructions to make runtime-3 unreachable.
  • wait for runtime-1 to make runtime-3 unavailable (i.e. this takes about 30 seconds due to timeout).
  • once it is unavailable, create a new Smart Connector in runtime-1 via the POST /rest/sc REST operation.
  • inspect the runtime-1 logs to see an exception similar to the one above.

Although it might take a while, the addition of a new Smart Connector succeeds and the KER remains in a correct state despite the exception in the log. This also becomes clear from the code, where in case of an exception the exception is logged and null is returned. This does not interrupt the normal functioning.

@bnouwt
Copy link
Collaborator Author

bnouwt commented Feb 3, 2024

Another development in this issue. The creation of a Smart Connector seems to indeed fail when there are unavailable other KERs (currently investigating why it worked in my tests anyway), because the KnowledgeBase.smartConnectorReady(...) method is never called in some cases. This is caused by the .thenRun(...) calls in the SmartConnectorImpl, because these only get triggered when a future completes normally while when a KER is unavailable it completes exceptionally. So, the following patch might fix this issue:

diff --git a/smart-connector/src/main/java/eu/knowledge/engine/smartconnector/impl/SmartConnectorImpl.java b/smart-connector/src/main/java/eu/knowledge/engine/smartconnector/impl/SmartConnectorImpl.java
index b8b5f63..e7a30e3 100644
--- a/smart-connector/src/main/java/eu/knowledge/engine/smartconnector/impl/SmartConnectorImpl.java
+++ b/smart-connector/src/main/java/eu/knowledge/engine/smartconnector/impl/SmartConnectorImpl.java
@@ -480,14 +480,14 @@
 		Instant beforePopulate = Instant.now();
 		LOG.info("Getting comms ready took {} ms", Duration.between(this.started, beforePopulate).toMillis());
 		// Populate the initial knowledge base store.
-		this.otherKnowledgeBaseStore.populate().thenRun(() -> {
+		this.otherKnowledgeBaseStore.populate().handle((r, e) -> {
 			LOG.info("Populating took {} ms", Duration.between(beforePopulate, Instant.now()).toMillis());
 			Instant beforeAnnounce = Instant.now();
 			// Then tell the other knowledge bases about our existence.
-			this.metaKnowledgeBase.postNewKnowledgeBase().thenRun(() -> {
+			this.metaKnowledgeBase.postNewKnowledgeBase().handle((r2, e2) -> {
 				LOG.info("Announcing took {} ms", Duration.between(beforeAnnounce, Instant.now()).toMillis());
 				Instant beforeConstructorFinished = Instant.now();
-				this.constructorFinished.thenRun(() -> {
+				this.constructorFinished.handle((r3, e3) -> {
 					LOG.info("Constructor finished took {} ms",
 							Duration.between(beforeConstructorFinished, Instant.now()).toMillis());
 					// When that is done, and all peers have acknowledged our existence, we
@@ -500,8 +500,11 @@
 							LOG.error("KnowledgeBase threw exception", t);
 						}
 					});
+					return (Void) null;
 				});
+				return (Void) null;
 			});
+			return (Void) null;
 		}).exceptionally((Throwable t) -> {
 			LOG.error("Populating the Smart Connector should not result in errors.", t);
 			return null;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant