Test Reconnecting Nodes #19

Open
ayushr2 opened this issue Jan 29, 2020 · 2 comments
Comments

@ayushr2
Collaborator

ayushr2 commented Jan 29, 2020

We should test that when a node reconnects, its information is preserved and it can continue from where it left off.
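A minimal sketch of such a test follows. It assumes a hypothetical in-memory `GraderRegistry` that keys grader state by a stable grader ID rather than by the connection object; none of these names come from the project. Because state is tied to the ID, a grader that reconnects under the same ID gets its previous job back.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GraderState:
    """Per-grader bookkeeping (hypothetical model, not the project's)."""
    grader_id: str
    connected: bool = True
    running_job: Optional[str] = None


class GraderRegistry:
    """Hypothetical registry keyed by a stable grader ID."""

    def __init__(self) -> None:
        self._graders: Dict[str, GraderState] = {}

    def connect(self, grader_id: str) -> GraderState:
        state = self._graders.get(grader_id)
        if state is None:
            # First time we see this grader: create fresh state.
            state = GraderState(grader_id)
            self._graders[grader_id] = state
        else:
            # Known grader reconnecting: keep its existing job info.
            state.connected = True
        return state

    def disconnect(self, grader_id: str) -> None:
        # Mark the grader offline but do NOT discard its state.
        self._graders[grader_id].connected = False

    def assign(self, grader_id: str, job_id: str) -> None:
        self._graders[grader_id].running_job = job_id

    def get(self, grader_id: str) -> GraderState:
        return self._graders[grader_id]


def test_reconnect_preserves_state() -> None:
    registry = GraderRegistry()
    registry.connect("grader-1")
    registry.assign("grader-1", "job-42")

    registry.disconnect("grader-1")
    assert not registry.get("grader-1").connected

    # Reconnecting under the same ID restores the in-progress job.
    state = registry.connect("grader-1")
    assert state.connected
    assert state.running_job == "job-42"
```

The test function follows pytest naming, so pytest would pick it up, but it can also be called directly.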

@zhengyao-lin
Member

zhengyao-lin commented Feb 7, 2020

There is one scenario in the websocket version of the protocol that's currently problematic:

When the API restarts, graders are interrupted with a disconnection exception. If a grader has a job in progress during the restart, the API assumes the grader is still working on that job (because under the HTTP protocol, graders would continue the job and submit the result in this case). As a result, the grader's "running" flag is never cleared in the API after the restart completes.
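One way to keep the flag from going stale can be sketched under assumptions that are not in the issue. In this sketch, the API persists a per-grader record across its own restarts, and each grader reports the job it is currently running (or `None`) when it registers. On registration, the API reconciles the two and requeues any job the grader has lost. `GraderRecord`, `Api`, and `on_register` are all hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class GraderRecord:
    """What the API keeps about a grader; assumed to survive an API restart."""
    grader_id: str
    running: bool = False
    job_id: Optional[str] = None


class Api:
    """Hypothetical API-side bookkeeping; all names are illustrative."""

    def __init__(self, records: Dict[str, GraderRecord], queue: Deque[str]) -> None:
        self.records = records  # e.g. loaded from persistent storage on startup
        self.queue = queue      # pending grading jobs

    def on_register(self, grader_id: str, reported_job: Optional[str]) -> None:
        """Reconcile stored state with the job a (re)connecting grader reports."""
        record = self.records.setdefault(grader_id, GraderRecord(grader_id))
        if record.running and record.job_id != reported_job:
            # The grader no longer has the job it was given before the
            # restart: clear the stale "running" flag and requeue the job
            # at the front so it is not silently dropped.
            if record.job_id is not None:
                self.queue.appendleft(record.job_id)
            record.running = False
            record.job_id = None
```

If the grader managed to keep its job through the reconnect, it reports the same job ID, and the record is left untouched.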

I think we should probably:

  1. Add a reconnection mechanism to the websocket version of the grader instead of relying entirely on Docker (and try to preserve the status of an ongoing job for as long as possible).
  2. Add a draining mechanism to the API that temporarily blocks incoming grading requests when a restart is expected (e.g., when deploying a new version).
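On the grader side, the reconnection idea in item 1 could look roughly like the following. This is a minimal sketch assuming a hypothetical `connect` callable that opens the websocket and raises one of the `retry_on` exception types on failure (`ConnectionError` by default; a real websocket client's exceptions may need to be listed or wrapped). It retries with capped exponential backoff, and `sleep` is injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 10,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure.

    Re-raises the last error once `max_attempts` attempts have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    attempt = 1
    while True:
        try:
            return connect()
        except retry_on:
            if attempt >= max_attempts:
                raise
            sleep(delay)
            # Exponential backoff, capped so waits never exceed max_delay.
            delay = min(delay * 2, max_delay)
            attempt += 1
```

To preserve an in-progress job, the grader would keep its current job in an object that outlives the connection. It would then only swap in the new connection returned by `connect_with_backoff`, rather than letting Docker restart the whole process.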

@ayushr2
Collaborator Author

ayushr2 commented Feb 7, 2020

Nice idea. We would need to drain the queue before we can shut down the API.
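Here is a minimal sketch of draining before shutdown, assuming a hypothetical in-memory `JobQueue` that is not the project's actual data structure. Once draining starts, new grading requests are refused. The API only reports that it is safe to shut down after both the pending queue and the set of in-flight jobs are empty.

```python
from collections import deque
from typing import Deque, Set


class DrainingRejected(Exception):
    """Raised when a grading request arrives while the API is draining."""


class JobQueue:
    """Hypothetical API-side queue with a draining mode; names are illustrative."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.running: Set[str] = set()
        self.draining = False

    def submit(self, job_id: str) -> None:
        if self.draining:
            # Block new grading requests while a restart is expected.
            raise DrainingRejected(job_id)
        self.pending.append(job_id)

    def start_next(self) -> str:
        # Hand the oldest pending job to a grader and track it as in flight.
        job_id = self.pending.popleft()
        self.running.add(job_id)
        return job_id

    def finish(self, job_id: str) -> None:
        self.running.discard(job_id)

    def start_draining(self) -> None:
        self.draining = True

    def safe_to_shut_down(self) -> bool:
        # Shut down only once every queued and in-flight job is done.
        return self.draining and not self.pending and not self.running
```

In this sketch, blocked requests are rejected with an exception so the caller can retry after the restart. Holding them in a separate buffer until the new API instance is up would be an alternative design.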

3 participants