avoid long tail tasks due to PrioritySemaphore #11574
Conversation
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
lgtm, with a suggestion on how to avoid an extra TaskContext lookup. Not a must-fix.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/PrioritySemaphore.scala
Nice catch on this issue!
-    new PriorityQueue[ThreadInfo](Ordering.by[ThreadInfo, T](_.priority).reverse)
+    new PriorityQueue[ThreadInfo](
+      // use task id as tie breaker when priorities are equal (both are 0 because never hold lock)
+      Ordering.by[ThreadInfo, T](_.priority).reverse.
nit: could we write this as
Ordering.by[ThreadInfo, T](t => (t.priority, t.taskId)).reverse
(technically this would flip the taskId comparison but I don't think we care)
I can see the argument for wanting to be more explicit with thenComparing, so that's totally fine too.
I think it matters? We want tasks with smaller task ids to have higher priority, so that we can avoid very long tasks spanning from the start of the stage to the end of the stage.
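To make the tie-breaking behavior concrete, here is a minimal, self-contained sketch (a simplified ThreadInfo, not the actual PrioritySemaphore class) showing that the thenComparing comparator dequeues the smaller task id first when priorities are equal, which is the behavior being asked for here:

```scala
import java.util.{Comparator, PriorityQueue}

// Simplified stand-in for the real ThreadInfo; the actual class holds more state.
case class ThreadInfo(priority: Long, taskId: Long)

object TieBreakerSketch {
  def main(args: Array[String]): Unit = {
    // Same shape as the PR's ordering: highest priority first (hence .reverse,
    // because java.util.PriorityQueue polls the *smallest* element under the
    // comparator), then ascending task id as the tie breaker.
    val tieBreaker: Comparator[ThreadInfo] =
      (a: ThreadInfo, b: ThreadInfo) => java.lang.Long.compare(a.taskId, b.taskId)
    val ordering: Comparator[ThreadInfo] =
      Ordering.by[ThreadInfo, Long](_.priority).reverse.thenComparing(tieBreaker)

    val queue = new PriorityQueue[ThreadInfo](ordering)
    Seq(ThreadInfo(0, 7), ThreadInfo(0, 1), ThreadInfo(0, 3)).foreach(queue.add)

    // Prints 1, 3, 7: with equal priorities, the smaller (older) task id wins.
    // Ordering.by(t => (t.priority, t.taskId)).reverse would print 7, 3, 1 instead,
    // which is the flipped comparison mentioned above.
    while (!queue.isEmpty) println(queue.poll().taskId)
  }
}
```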
build
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Force-pushed from 2bebd2b to 9956fd9.
build
I agree with @jlowe that this would be a good fix for 24.10
      // use task id as tie breaker when priorities are equal (both are 0 because never hold lock)
      Ordering.by[ThreadInfo, T](_.priority).reverse.
        thenComparing((a, b) => a.taskId.compareTo(b.taskId))
    )

  def tryAcquire(numPermits: Int, priority: T): Boolean = {
This also needs a taskAttemptId and updates to the ordering comparison below otherwise the algorithm we're using for tryAcquire doesn't match the algorithm being used for waiting queue ordering (although it's very close). For example, a task with priority 0 and task attempt ID 2 with 5 permits will block a task with priority 0 and task attempt ID 1 with 2 permits, even if the semaphore had 4 permits available.
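A rough sketch of the kind of check being described (hypothetical names, not the actual GpuSemaphore/PrioritySemaphore code; the real fix is in the follow-up PR linked below): tryAcquire should apply the same (priority, task attempt id) comparison that orders the wait queue, so a caller that would be ordered ahead of the queue head is not blocked by it:

```scala
// Hypothetical sketch of the consistency being asked for here.
case class Waiter(priority: Long, taskAttemptId: Long, numPermits: Int)

def tryAcquireSketch(available: Int,
                     numPermits: Int,
                     priority: Long,
                     taskAttemptId: Long,
                     queueHead: Option[Waiter]): Boolean = {
  // Blocked only if the head of the wait queue is ordered *ahead* of the caller
  // under the same rule the queue uses: higher priority first, then smaller task
  // attempt id. In the example above (head = priority 0, attempt 2 needing 5
  // permits; caller = priority 0, attempt 1 needing 2 permits; 4 available), the
  // caller is ordered ahead of the head, so it is allowed to acquire.
  val headIsAhead = queueHead.exists { h =>
    h.priority > priority || (h.priority == priority && h.taskAttemptId < taskAttemptId)
  }
  !headIsAhead && available >= numPermits
}
```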
You're right, it's a corner case I didn't pay attention to. Since your comment came after my merge, I have submitted another PR to fix this: https://github.com/NVIDIA/spark-rapids/pull/11587/files. BTW, are there any real cases where we'll have different permits for different threads?
Are there any real cases where we'll have different permits for different threads?
Yes, because the concurrent GPU tasks config can be updated at runtime, and that changes the number of permits for subsequent tasks. See GpuSemaphore.computeNumPermits.
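For readers unfamiliar with that code path, a loose sketch of the idea (the constant, the division, and the names here are assumptions for illustration; see the real GpuSemaphore.computeNumPermits for the actual logic): the permit count handed to each new task is derived from the current value of the concurrent-GPU-tasks config, so changing that config at runtime changes how many permits later tasks request:

```scala
// Loose sketch, not the real implementation: MaxPermits and the division are
// assumptions used only to illustrate why two tasks can hold different permit counts.
object PermitSketch {
  val MaxPermits = 1000

  // Re-evaluated per task, so a runtime change to concurrentGpuTasks only affects
  // tasks that acquire the semaphore afterwards.
  def computeNumPermits(concurrentGpuTasks: Int): Int =
    math.max(MaxPermits / math.max(concurrentGpuTasks, 1), 1)
}
```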
I didn't realize the concurrent GPU tasks config could be updated at runtime, thanks!
* use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* avoid long tail tasks due to PrioritySemaphore (#11574) * use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * addressing jason's comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
…didate' avoid long tail tasks due to PrioritySemaphore (NVIDIA#11574) See merge request nvspark/bd-spark-rapids!42
This PR fixes #11573 by adding a tie breaker using task id.
Long tail tasks disappeared after the fix.