Refactor job status query logic as per Tower (seqera platform) integration. #91

Merged
abhi18av merged 3 commits into master from tower-nf on Dec 8, 2024

abhi18av
Member

This work follows up on #77

@jagedn
Collaborator

jagedn commented Oct 29, 2024

Were you able to check whether it works? Any issues?

@abhi18av
Member Author

Hi @jagedn, yes, I ran a quick check and in my case a couple of tests related to the job query logic failed, one with a timeout error and one with a comparison failure.


NomadDSLSpec > should submit a job FAILED
    org.spockframework.runtime.SpockTimeoutError at NomadDSLSpec.groovy:145

NomadServiceSpec > should check the state FAILED
    org.spockframework.runtime.SpockComparisonFailure at NomadServiceSpec.groovy:78


The log file showed the following:


11:51:00.030 [Task submitter] DEBUG nextflow.nomad.executor.NomadTaskHandler -- [NOMAD] Submitted task sayHello (1) with taskId=nf-d5244c1e551da4ec49f906d1d42c9c4a-sayHello_1
11:51:00.031 [Task submitter] INFO nextflow.Session -- [d5/244c1e] Submitted process > sayHello (1)
11:51:04.466 [Task monitor] DEBUG nextflow.nomad.executor.NomadService -- [NOMAD] Failed to get jobState nf-d5244c1e551da4ec49f906d1d42c9c4a-sayHello_1 -- Cause: java.lang.IllegalStateException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at line 1 column 2 path $
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at line 1 column 2 path $
	at com.google.gson.Gson.fromJson(Gson.java:1238)
	at com.google.gson.Gson.fromJson(Gson.java:1137)
	at com.google.gson.Gson.fromJson(Gson.java:1047)
	at com.google.gson.Gson.fromJson(Gson.java:1014)
	at io.nomadproject.client.JSON.deserialize(JSON.java:146)
	at io.nomadproject.client.ApiClient.deserialize(ApiClient.java:795)
	at io.nomadproject.client.ApiClient.handleResponse(ApiClient.java:1001)
	at io.nomadproject.client.ApiClient.execute(ApiClient.java:925)
	at io.nomadproject.client.api.JobsApi.getJobAllocationsWithHttpInfo(JobsApi.java:629)
	at io.nomadproject.client.api.JobsApi.getJobAllocations(JobsApi.java:596)
	at nextflow.nomad.executor.NomadService$_getTaskState_closure2.doCall(NomadService.groovy:120)
	at nextflow.nomad.executor.NomadService$_getTaskState_closure2.doCall(NomadService.groovy)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at groovy.lang.Closure.call(Closure.java:433)
	at org.codehaus.groovy.runtime.ConvertedClosure.invokeCustom(ConvertedClosure.java:52)
	at org.codehaus.groovy.runtime.ConversionHandler.invoke(ConversionHandler.java:113)
	at jdk.proxy3/jdk.proxy3.$Proxy43.get(Unknown Source)
	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:236)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:75)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:176)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:437)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:115)
	at nextflow.nomad.executor.FailsafeExecutor.apply(FailsafeExecutor.groovy:71)
	at nextflow.nomad.executor.NomadService.getTaskState(NomadService.groovy:119)
	at nextflow.nomad.executor.NomadTaskHandler.taskState0(NomadTaskHandler.groovy:195)
	at nextflow.nomad.executor.NomadTaskHandler.checkIfRunning(NomadTaskHandler.groovy:79)
	at nextflow.processor.TaskPollingMonitor.checkTaskStatus(TaskPollingMonitor.groovy:642)
	at nextflow.processor.TaskPollingMonitor.checkAllTasks(TaskPollingMonitor.groovy:571)
	at nextflow.processor.TaskPollingMonitor.pollLoop(TaskPollingMonitor.groovy:441)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
	at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
	at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:645)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:628)
	at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:82)
	at nextflow.processor.TaskPollingMonitor$_start_closure2.doCall(TaskPollingMonitor.groovy:316)
	at nextflow.processor.TaskPollingMonitor$_start_closure2.call(TaskPollingMonitor.groovy)
	at groovy.lang.Closure.run(Closure.java:505)
	at java.base/java.lang.Thread.run(Thread.java:1589)

I think the terminology and status mapping are still a bit off, but I plan to fix this after the presentation. Still a bit nervous about it 😅
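
For context on the trace above: Gson reports `Expected BEGIN_ARRAY but was BEGIN_OBJECT` when it is asked to deserialize into a list or array type but the response body starts with a JSON object. Below is a minimal, dependency-free sketch of a guard that inspects the top-level shape of a response body before it is deserialized. The class `JsonShapeCheck`, the method `topLevelShape`, and the sample payloads are hypothetical and are not part of nf-nomad or the Nomad client.

```java
public class JsonShapeCheck {

    // Top-level JSON shapes this sketch distinguishes.
    enum Shape { ARRAY, OBJECT, OTHER }

    // Returns the shape implied by the first non-whitespace character.
    // This does not validate the JSON; it only tells a caller whether
    // deserializing into a list type could possibly succeed.
    static Shape topLevelShape(String body) {
        if (body == null) {
            return Shape.OTHER;
        }
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '[') {
                return Shape.ARRAY;
            }
            if (c == '{') {
                return Shape.OBJECT;
            }
            return Shape.OTHER;
        }
        return Shape.OTHER;
    }

    public static void main(String[] args) {
        // Hypothetical payloads: a list of allocations vs. a single object.
        System.out.println(topLevelShape("[{\"ID\":\"a1\"}]"));   // prints ARRAY
        System.out.println(topLevelShape("{\"ID\":\"a1\"}"));     // prints OBJECT
    }
}
```

A caller expecting a list could log the raw body and skip deserialization when the shape is not `ARRAY`, which makes this kind of mismatch easier to diagnose than a `JsonSyntaxException` raised deep inside the client.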

@jagedn
Collaborator

jagedn commented Oct 30, 2024

image

@abhi18av
Member Author

Hi @jagedn,

A quick note that I haven't forgotten about this PR. It now seems to work on my machine, but I want to test it on the SUN-nomadlab cluster as well before moving ahead.

As soon as I have a stable internet connection again, by the end of this week, I will run that check and report back.

@abhi18av abhi18av marked this pull request as ready for review December 8, 2024 19:01
@abhi18av abhi18av linked an issue Dec 8, 2024 that may be closed by this pull request
@abhi18av
Member Author

abhi18av commented Dec 8, 2024

Hi @jagedn, I have just tested this PR on our sun-nomadlab cluster and everything seems to be working perfectly.

  1. ✅ Live updates appear on Tower
image
  2. ✅ The individual task stats are shared with Tower
image
  3. ✅ The pipeline runtime report correctly shows the job durations in the execution timeline
image

@abhi18av abhi18av merged commit 86ca369 into master Dec 8, 2024
2 of 3 checks passed
@abhi18av abhi18av deleted the tower-nf branch December 8, 2024 19:07
@jagedn
Collaborator

jagedn commented Dec 9, 2024

lets-go-ready.mp4

@abhi18av abhi18av restored the tower-nf branch December 10, 2024 13:50
Development

Successfully merging this pull request may close these issues.

Improve job-status query logic and nf-tower support