Test Reconnecting Nodes #19

ayushr2 · 2020-01-29T04:35:26Z

We should test that when a node reconnects, its info is preserved and it can continue from where it left off.

zhengyao-lin · 2020-02-07T00:39:29Z

There is one scenario in the websocket version of the protocol that's currently problematic:

When the API restarts, graders are interrupted with a disconnection exception. If any grader has an ongoing job during the restart, the API will assume that grader is still working on the original job (since in the HTTP protocol graders would continue the job and submit it in this case). The "running" flag of such a grader is not cleared in the API once the restart is done.

I think we should probably:

- Add a reconnection mechanism to the websocket version of the grader instead of relying entirely on Docker (and try to preserve the status of an ongoing job for as long as possible).
- Add some kind of draining mechanism to the API to temporarily block incoming grading requests when we expect a restart (e.g. when deploying a new version).
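
To make the first point concrete, here is a rough sketch of what a grader-side reconnect loop could look like. It is only an illustration, not the project's actual code: `GraderSession`, `run_with_reconnect`, and the `connect` callable are hypothetical names, and the real websocket client is replaced by any function that raises `ConnectionError` when the API goes away. The idea is that the job state lives outside the connection, so it survives each retry.

```python
import time
from typing import Callable, Optional


class GraderSession:
    """Hypothetical grader-side state that outlives a single connection."""

    def __init__(self, grader_id: str) -> None:
        self.grader_id = grader_id
        self.current_job: Optional[str] = None  # job in progress, if any


def run_with_reconnect(
    session: GraderSession,
    connect: Callable[[GraderSession], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect(session) until it returns normally.

    Retries on ConnectionError with exponential backoff and returns the
    number of attempts made. The last ConnectionError is re-raised if
    every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # The same session object is passed every time, so
            # current_job is still there after a dropped connection.
            connect(session)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

On reconnect the grader could report `session.current_job` to the API so the stale "running" flag can be reconciled, though how that handshake should look is an open design question here.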

ayushr2 · 2020-02-07T21:59:00Z

Nice idea. We would need to drain the queue before we can shut down the API.
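
One possible shape for the draining step, as a minimal sketch under assumptions: `DrainGate` and its methods are hypothetical names, not anything that exists in the API today. It refuses new grading requests once draining starts and lets the shutdown path wait until the jobs already handed out have finished.

```python
import threading
from typing import Optional


class DrainGate:
    """Hypothetical gate that blocks new jobs while the API drains."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._draining = False
        self._in_flight = 0

    def try_start_job(self) -> bool:
        """Register a new job; return False if the API is draining."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def finish_job(self) -> None:
        """Mark one job as done and wake up anyone waiting in drain()."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: Optional[float] = None) -> bool:
        """Stop accepting jobs and wait for in-flight jobs to finish.

        Returns True if nothing is in flight, False if the timeout
        expired first. The gate stays closed either way.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A deploy script could call `drain(timeout=...)` and only restart the API when it returns `True`, falling back to the reconnect path for graders that are still busy when the timeout expires.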

ayushr2 assigned ezhang887 Jan 29, 2020
