Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task execution time should have an upper limit in pipeline model #9433

Open
windtalker opened this issue Sep 14, 2024 · 3 comments · May be fixed by #9689
Open

task execution time should have an upper limit in pipeline model #9433

windtalker opened this issue Sep 14, 2024 · 3 comments · May be fixed by #9689
Labels
affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. component/compute severity/major type/bug The issue is confirmed as a bug.

Comments

@windtalker
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

In pipeline model, when task is scheduled, it will be executed in fixed size thread pool, so for each task, the per-round execution time must have an upper limit, otherwise, it could block all the other queries.
Normally, the per-round exectuion time should be predictable and short enough because it is by design that in each round of exection, the task will only process one block of data(normally less than 60k rows), so we can just assume it already has a reasonable upper limit, but still in some special cases, some operators may take very very long time to process one block of the data, we need to handle this explicitly. Special attention should be spent on the following kind of operators:

  • HashJoinProbeTransformOp

This issue aims to identify and improve operators that may seriously affect the normal operation of the system. For other operators, since their timing is predictable, they are not the focus of this issue.

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

@SeaRise
Copy link
Contributor

SeaRise commented Sep 18, 2024

In velox, the operator gets task->execute_time to determine whether it has timed out.
Perhaps we can put the Task pointer in thread_local to get the execution time

@windtalker
Copy link
Contributor Author

In velox, the operator gets task->execute_time to determine whether it has timed out. Perhaps we can put the Task pointer in thread_local to get the execution time

But we can not simply return error if execution time exceed the limit?

@SeaRise
Copy link
Contributor

SeaRise commented Sep 18, 2024

In velox, the operator gets task->execute_time to determine whether it has timed out. Perhaps we can put the Task pointer in thread_local to get the execution time

But we can not simply return error if execution time exceed the limit?

Ah, yes, the need for a join probe that can return after probing a portion of the data (more thorough than before

@ti-chi-bot ti-chi-bot bot added the affects-8.5 This bug affects the 8.5.x(LTS) versions. label Nov 1, 2024
ti-chi-bot bot pushed a commit that referenced this issue Nov 26, 2024
ti-chi-bot bot pushed a commit that referenced this issue Nov 27, 2024
@windtalker windtalker mentioned this issue Nov 28, 2024
12 tasks
ti-chi-bot bot pushed a commit that referenced this issue Dec 3, 2024
@windtalker windtalker linked a pull request Dec 3, 2024 that will close this issue
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. component/compute severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants