[Failover][BugFix] Fix duplicate requests caused by failover #18
Currently, when Llumnix finds the manager or an instance unavailable while generating a request, it regenerates the request, because it cannot know whether the request was successfully dispatched to an instance. However, this can create duplicate requests in instances. With duplicate requests, two requests stream output to the user at the same time, which is unacceptable behavior. Duplicate requests are also hard for the API server to detect and intercept unless it compares the output prompts of requests, and that comparison would add non-negligible cost on the critical path of output processing. Since the manager or an instance is rarely unavailable while a request is being generated, we choose not to regenerate the request. The user finds out that the request failed when it times out, which is consistent with other failover cases.
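The difference between the two behaviors can be illustrated with a minimal, self-contained sketch. This is not Llumnix code: `InstanceUnavailable`, `FlakyInstance`, `generate_with_regeneration`, `generate_once`, and `wait_for_output` are hypothetical names chosen for illustration, and the instance is assumed to fail after it has already accepted the request, which is the case where regeneration produces a duplicate.

```python
import asyncio


class InstanceUnavailable(Exception):
    """Hypothetical error: the manager or instance could not be reached."""


class FlakyInstance:
    """Hypothetical instance that accepts a request, then fails before acknowledging it."""

    def __init__(self):
        self.received = []  # request ids that actually reached the instance

    async def generate(self, request_id, prompt):
        self.received.append(request_id)
        raise InstanceUnavailable("connection lost before acknowledgement")


async def generate_with_regeneration(instance, request_id, prompt):
    """Previous behavior (assumed shape): regenerate once on failure."""
    for _ in range(2):
        try:
            await instance.generate(request_id, prompt)
            return True
        except InstanceUnavailable:
            continue
    return False


async def generate_once(instance, request_id, prompt):
    """New behavior (assumed shape): a single attempt, no regeneration."""
    try:
        await instance.generate(request_id, prompt)
        return True
    except InstanceUnavailable:
        return False


async def wait_for_output(output_queue, timeout_s):
    """If no output ever arrives, the caller sees the failure as a timeout."""
    try:
        return await asyncio.wait_for(output_queue.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def demo():
    old, new = FlakyInstance(), FlakyInstance()
    await generate_with_regeneration(old, "req-0", "hello")
    await generate_once(new, "req-0", "hello")
    # An empty queue models a request that never reached any instance.
    output = await wait_for_output(asyncio.Queue(), timeout_s=0.05)
    return len(old.received), len(new.received), output
```

In this sketch the regenerating path delivers the same request to the instance twice, while the single-attempt path delivers it at most once. If the request was never dispatched at all, no output arrives and the caller observes a timeout.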