
Query scheduler does not remove failed extraction job from its internal queue. #605

Open
haiqi96 opened this issue Nov 20, 2024 · 0 comments
Labels
bug Something isn't working


Bug

The current query scheduler uses two dictionaries to map a stream_id to a job.

When a stream extraction job fails, the query scheduler is supposed to remove the job from the dictionaries and notify other jobs waiting on the same stream_id of the failure.
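The intended bookkeeping can be sketched as below. This is a minimal sketch under an assumption the issue does not confirm: one dictionary maps a stream_id to the list of job IDs waiting on it (matching the `{X: [0]}` entry mentioned later), and the other maps each job ID back to its stream_id. The class `StreamJobTracker` and its methods are hypothetical names, not CLP's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamJobTracker:
    """Hypothetical model of the scheduler's two dictionaries (not CLP's actual code)."""

    # stream_id -> IDs of every job waiting on that stream; the first ID is the
    # job that was actually dispatched to a worker.
    stream_to_jobs: Dict[str, List[int]] = field(default_factory=dict)
    # job ID -> stream_id, so a finished job can locate its stream entry.
    job_to_stream: Dict[int, str] = field(default_factory=dict)

    def submit(self, job_id: int, stream_id: str) -> bool:
        """Registers a job; returns True if it must be dispatched to a worker."""
        self.job_to_stream[job_id] = stream_id
        waiting = self.stream_to_jobs.setdefault(stream_id, [])
        waiting.append(job_id)
        # Only the first job for a stream is dispatched; later jobs wait on it.
        return len(waiting) == 1

    def finish(self, job_id: int, succeeded: bool) -> Dict[int, bool]:
        """Removes the stream's entries (on success OR failure) and returns the
        outcome to report to every job that was waiting on the stream."""
        stream_id = self.job_to_stream.pop(job_id)
        waiting = self.stream_to_jobs.pop(stream_id, [])
        for other in waiting:
            self.job_to_stream.pop(other, None)
        return {jid: succeeded for jid in waiting}
```

The key point is that `finish` runs on both outcomes: if job 0 fails while job 1 waits on the same stream, `finish(0, succeeded=False)` reports the failure to both jobs and leaves both dictionaries empty.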

However, if an exception is thrown by the worker (see here), the scheduler simply continues without cleaning up the entries in the dictionaries. This causes a problem if the following sequence happens:

  1. The WebUI submits stream extraction job 0 with stream ID X.
  2. Job 0 fails due to an exception in the worker.
  3. The WebUI submits another stream extraction job 1 with the same stream ID X.
  4. The query scheduler sees that the entry {X: [0]} is still in the dictionary, so it assumes job 0 is still running (but it has actually failed).
  5. The query scheduler then marks job 1 as running without submitting it to a worker, and keeps waiting on job 0, which will never return because it has already failed.
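The sequence above can be reproduced with a small, synchronous model of the scheduler. This is a sketch under stated assumptions: `ExtractionScheduler`, `cleanup_on_error`, and `crashing_worker` are hypothetical names, the worker is called inline (the real scheduler likely dispatches jobs asynchronously), and only the stream_id-to-waiting-jobs dictionary is modelled. With `cleanup_on_error=False` it mimics the reported behaviour; with `True` it shows one possible fix.

```python
from typing import Callable, Dict, List


class ExtractionScheduler:
    """Hypothetical, synchronous model of the scheduler's bookkeeping (not CLP's code)."""

    def __init__(self, cleanup_on_error: bool) -> None:
        self.cleanup_on_error = cleanup_on_error
        # stream_id -> IDs of jobs waiting on that stream's extraction.
        self.stream_to_jobs: Dict[str, List[int]] = {}
        self.failed_jobs: List[int] = []

    def submit(self, job_id: int, stream_id: str, run_worker: Callable[[str], None]) -> str:
        if stream_id in self.stream_to_jobs:
            # An entry exists, so the scheduler assumes an extraction is still in
            # flight and parks this job instead of dispatching it (steps 4 and 5).
            self.stream_to_jobs[stream_id].append(job_id)
            return "waiting"
        self.stream_to_jobs[stream_id] = [job_id]
        try:
            run_worker(stream_id)
        except Exception:
            if self.cleanup_on_error:
                # Fix: drop the entry and fail every job waiting on this stream.
                self.failed_jobs.extend(self.stream_to_jobs.pop(stream_id))
            # Without the fix, the stale {stream_id: [job_id]} entry survives (step 2).
            return "failed"
        # The worker runs inline here, so no other job can be waiting at this point.
        del self.stream_to_jobs[stream_id]
        return "succeeded"


def crashing_worker(stream_id: str) -> None:
    """Hypothetical worker that always raises, like the failing job 0."""
    raise RuntimeError(f"worker crashed while extracting {stream_id}")
```

Running jobs 0 and 1 for stream X through the buggy model returns `"failed"` and then `"waiting"`, leaving `{"X": [0, 1]}` behind, so job 1 is parked behind a job that will never finish. The fixed model dispatches job 1 to the worker, and both failures are reported.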

CLP version

ee7e493

Environment

Ubuntu jammy

Reproduction steps

Described in the Bug description

haiqi96 added the bug label on Nov 20, 2024.