Failed analyses do not update as "failed" in the UI #32

Open
ablack3 opened this issue Jan 18, 2024 · 7 comments · Fixed by #58
Labels: bug (Something isn't working), priority

Comments

@ablack3
Collaborator

ablack3 commented Jan 18, 2024

As soon as it is clear that the R code has failed to run, the datanode UI should update to reflect this (i.e. execution failed). Instead, what I currently see is that analyses fail, which is clear from the Docker logs of the datanode container, but the UI still says "executing" for quite a long time.

@ablack3 ablack3 changed the title from 'Failed analyses do not update in the UI' to 'Failed analyses do not update as "failed" in the UI' on Jan 18, 2024
@konstjar
Contributor

konstjar commented Jan 19, 2024

3 cases here:

  • DataNode was not able to reach the Execution Engine and submit the analysis. In this case the status of the analysis should be marked as "Failed".

  • There is no response from the Execution Engine about the status of the analysis for a long time, for example because the callback configuration was not set correctly and the Execution Engine was not able to send the callback. In this case we need to invalidate the job after some time. E.g., if there is no response from the Execution Engine within 1 hour, the job is considered failed. The 1-hour timeout should be configurable.

  • We need to invalidate and mark as "FAILED" all jobs that are in the Executing state when the Data Node restarts.
    Executing/Aborting => Failed
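
The second and third cases above could be sketched roughly as follows. This is Python for illustration only, not the DataNode implementation; the names `Job`, `invalidate_stale`, `invalidate_on_restart`, and the `timeout` parameter are hypothetical, and the one-hour default simply follows the suggestion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical status names, mirroring the states mentioned in this thread.
EXECUTING, ABORTING, FAILED = "EXECUTING", "ABORTING", "FAILED"


@dataclass
class Job:
    id: int
    status: str
    last_update: datetime  # last time the Execution Engine reported on this job


def invalidate_stale(jobs, now, timeout=timedelta(hours=1)):
    """Case 2: fail executing jobs with no engine response for longer than `timeout`."""
    changed = []
    for job in jobs:
        if job.status == EXECUTING and now - job.last_update > timeout:
            job.status = FAILED
            changed.append(job.id)
    return changed


def invalidate_on_restart(jobs):
    """Case 3: on Data Node restart, Executing/Aborting => Failed."""
    changed = []
    for job in jobs:
        if job.status in (EXECUTING, ABORTING):
            job.status = FAILED
            changed.append(job.id)
    return changed
```

Both functions return the ids they changed, so a caller could log or report which analyses were invalidated.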

@konstjar konstjar added the bug Something isn't working label Jan 25, 2024
@konstjar konstjar added this to the v2.0.1 milestone Jan 25, 2024
@konstjar konstjar modified the milestones: v2.0.1, v2.0.2 Feb 27, 2024
@dmitrys-odysseus
Contributor

dmitrys-odysseus commented Jul 24, 2024

The following adjustments are to be made:

  1. On the execution engine side, provide new endpoint /api/v1/status?id=1&id=2... to query status of the running analyses by id.
  2. Regularly call this endpoint to update status for incomplete analysis.
  3. On the datanode, the endpoint /api/v1/admin/submissions is to feature a new engine field to report engine status as follows:
    {
        status: "OK | ERROR | UNKNOWN",
        since: <timestamp when the status was first seen>,
        seenLast: <timestamp when the status was last seen>,
        error: <error message, if any>
    }
  4. Frontend to be updated to show a clear visual indication when the execution engine status is "ERROR", together with the error message. Whether other statuses are to be displayed somehow is TBD.
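
A rough sketch of items 1 to 3, in Python for illustration only. The helper `status_url` and the class `EngineStatus` are hypothetical names; only the endpoint path and the four field names come from the proposal above, and the actual HTTP call and scheduling are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode


def status_url(base_url, ids):
    """Build the proposed query, repeating the `id` parameter once per analysis."""
    return f"{base_url}/api/v1/status?{urlencode([('id', i) for i in ids])}"


@dataclass
class EngineStatus:
    status: str = "UNKNOWN"               # "OK" | "ERROR" | "UNKNOWN"
    since: Optional[datetime] = None      # when the current status was first seen
    seen_last: Optional[datetime] = None  # when the current status was last seen
    error: Optional[str] = None           # error message, only kept for "ERROR"

    def observe(self, status, now, error=None):
        """Record one poll result; `since` only moves when the status changes."""
        if status != self.status or self.since is None:
            self.since = now
        self.status = status
        self.seen_last = now
        self.error = error if status == "ERROR" else None
```

A periodic task could call `observe("OK", now)` after each successful poll and `observe("ERROR", now, error=...)` when the engine cannot be reached, then serialize the record into the `engine` field.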

dmitrys-odysseus pushed a commit that referenced this issue Jul 24, 2024
dmitrys-odysseus pushed a commit that referenced this issue Jul 25, 2024
dmitrys-odysseus pushed a commit that referenced this issue Jul 25, 2024
dmitrys-odysseus pushed a commit that referenced this issue Jul 25, 2024
dmitrys-odysseus added a commit that referenced this issue Jul 25, 2024
#32 Provide endpoint for engine status and update analysis status when disconnected engine reconnects
@ablack3
Collaborator Author

ablack3 commented Jul 31, 2024

I'm testing the latest release. I can see in the Docker logs for the Arachne execution engine that my analysis has failed and is no longer running. However, from the Arachne datanode UI it appears as if the analysis is still running, so the user has no idea the code has failed. The timer for the study continues to run as well. Also, the logs in the UI are blank, so there is no way to diagnose the problem without digging into the Docker logs, which our users might not be able to do.

I suggest we print some messages to the log in the Arachne UI to let the user know what is happening.

@konstjar
Contributor

I assume this is fixed by commit a5e19eb.
Let's wait for ARACHNE Datanode release and verify it.

@ablack3
Collaborator Author

ablack3 commented Jul 31, 2024

Can we also print a message in the log that indicates that the Docker image is being pulled, or was found locally, and that the analysis is starting up? I think there should be a couple of log messages prior to the R code running, just to let us know that the environment is working.
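
For illustration, the kind of startup messages meant here could look like the sketch below. It is Python with a hypothetical function name and made-up message wording, not what the execution engine currently logs.

```python
def startup_messages(image, found_locally):
    """Return the log lines to show before the analysis code starts."""
    if found_locally:
        image_line = f"Docker image {image} found locally"
    else:
        image_line = f"Pulling Docker image {image}"
    return [image_line, "Starting analysis environment", "Running analysis code"]
```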

@konstjar
Contributor

I like the suggestion.

@konstjar
Contributor

4 participants