
avoid long tail tasks due to PrioritySemaphore #11574

Merged

Conversation

@binmahone (Collaborator) commented Oct 9, 2024

This PR fixes #11573 by adding a tie breaker using task id.

Long tail tasks disappeared after the fix:

[screenshot: task timeline with the long tail tasks no longer present after the fix]

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
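For context on the failure mode, below is a minimal, self-contained Scala sketch (illustrative only, not the spark-rapids code; Waiter and the drain helper are made up for the example). With equal priorities, a binary-heap PriorityQueue gives no stable wake-up order, so the same waiter can keep losing and turn into a long-tail task; adding the task id as a secondary key makes the wake-up order deterministic, oldest task first.

import java.util.PriorityQueue

object TieBreakerSketch {
  // Priority is always 0 for waiters that have never held the semaphore (per the code
  // comment in the PR), so without a secondary key the heap breaks ties arbitrarily.
  case class Waiter(priority: Long, taskId: Long)

  def main(args: Array[String]): Unit = {
    // Priority only: equal-priority waiters come out in an unspecified order.
    val priorityOnly = new PriorityQueue[Waiter](Ordering.by[Waiter, Long](_.priority).reverse)

    // Priority first, then ascending task id as the tie breaker, mirroring the PR's intent
    // (a negated priority key is used here instead of thenComparing to keep the sketch Scala-only).
    val withTieBreaker = new PriorityQueue[Waiter](
      Ordering.by[Waiter, (Long, Long)](w => (-w.priority, w.taskId)))

    Seq(Waiter(0, 7), Waiter(0, 3), Waiter(0, 5)).foreach { w =>
      priorityOnly.add(w)
      withTieBreaker.add(w)
    }

    // Drain a queue and report the task ids in wake-up order.
    def drain(q: PriorityQueue[Waiter]): List[Long] =
      Iterator.continually(q.poll()).takeWhile(_ != null).map(_.taskId).toList

    println(drain(priorityOnly))   // some permutation of 3, 5, 7 -- order is not guaranteed
    println(drain(withTieBreaker)) // List(3, 5, 7): the oldest task always wakes first
  }
}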
jlowe previously approved these changes Oct 9, 2024

@jlowe (Member) left a comment


LGTM, with a suggestion on how to avoid an extra TaskContext lookup. Not a must-fix.

abellina previously approved these changes Oct 9, 2024

@jlowe (Member) commented Oct 9, 2024

This seems like a fix we may want to consider for 24.10 given it can effectively disable one or more executor cores during the long starvation. cc: @sameerz @revans2

@zpuller (Collaborator) left a comment


Nice catch on this issue!

Before:
new PriorityQueue[ThreadInfo](Ordering.by[ThreadInfo, T](_.priority).reverse)

After:
new PriorityQueue[ThreadInfo](
  // use task id as tie breaker when priorities are equal (both are 0 because never hold lock)
  Ordering.by[ThreadInfo, T](_.priority).reverse.
    thenComparing((a, b) => a.taskId.compareTo(b.taskId))
)
A collaborator commented:


nit: could we write this as

Ordering.by[ThreadInfo, T](t => (t.priority, t.taskId)).reverse

(technically this would flip the taskId comparison but I don't think we care)

A collaborator commented:


I can see the argument for wanting to be more explicit with thenComparing, so that's totally fine too.

@binmahone (Collaborator, Author) replied:


I think it matters. We want tasks with smaller task IDs to have higher priority, so that we can avoid very long tasks that span from the start of the stage to its end.
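To make the trade-off concrete, here is a small illustrative comparison (ThreadInfoSketch is a stand-in for the real ThreadInfo; the first element of each sorted list corresponds to the waiter the queue would release first). Reversing a composite (priority, taskId) ordering also reverses the task-id comparison, while the merged approach keeps highest priority first and then smallest task id first.

object OrderingComparisonSketch {
  case class ThreadInfoSketch(priority: Long, taskId: Long)

  def main(args: Array[String]): Unit = {
    val waiters = List(ThreadInfoSketch(0, 7), ThreadInfoSketch(0, 3), ThreadInfoSketch(0, 5))

    // Suggested one-liner: reversing the tuple ordering also reverses taskId, so the
    // waiter released first (the minimum element) would be the one with the largest task id.
    val tupleReversed =
      Ordering.by[ThreadInfoSketch, (Long, Long)](t => (t.priority, t.taskId)).reverse
    println(waiters.sorted(tupleReversed).map(_.taskId)) // List(7, 5, 3)

    // Merged intent: highest priority first, then smallest task id first (expressed here
    // with a negated priority key rather than thenComparing, purely for brevity).
    val priorityDescThenTaskIdAsc =
      Ordering.by[ThreadInfoSketch, (Long, Long)](t => (-t.priority, t.taskId))
    println(waiters.sorted(priorityDescThenTaskIdAsc).map(_.taskId)) // List(3, 5, 7)
  }
}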

zpuller previously approved these changes Oct 9, 2024
@binmahone dismissed stale reviews from zpuller, abellina, and jlowe via 2bebd2b on October 10, 2024 02:17
@binmahone (Collaborator, Author) commented:

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone force-pushed the 241009_avoid_semaphore_long_tail branch from 2bebd2b to 9956fd9 on October 10, 2024 02:36
@binmahone (Collaborator, Author) commented:

build

@revans2 (Collaborator) left a comment


I agree with @jlowe that this would be a good fix for 24.10

@binmahone merged commit e8ac073 into NVIDIA:branch-24.12 on Oct 10, 2024 (45 checks passed)
new PriorityQueue[ThreadInfo](
  // use task id as tie breaker when priorities are equal (both are 0 because never hold lock)
  Ordering.by[ThreadInfo, T](_.priority).reverse.
    thenComparing((a, b) => a.taskId.compareTo(b.taskId))
)

def tryAcquire(numPermits: Int, priority: T): Boolean = {
A member commented:


This also needs a taskAttemptId, and the ordering comparison below needs to be updated; otherwise the algorithm used for tryAcquire doesn't match the algorithm used for the waiting queue ordering (although it's very close). For example, a task with priority 0 and task attempt ID 2 that needs 5 permits will block a task with priority 0 and task attempt ID 1 that needs 2 permits, even if the semaphore had 4 permits available.
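A hedged sketch of the consistency requirement being described (illustrative only; this is not the code from this PR or from the follow-up fix, and all names are stand-ins): tryAcquire should only take permits when no queued waiter ranks ahead of the caller under the same priority-then-task-id ordering used by the waiting queue.

import java.util.PriorityQueue

// Illustrative sketch, not the spark-rapids implementation.
class PrioritySemaphoreSketch(totalPermits: Int) {
  case class Waiter(priority: Long, taskId: Long, numPermits: Int)

  private var occupied = 0
  // Same ordering as the waiting queue: highest priority first, then smallest task id.
  private val waitingQueue = new PriorityQueue[Waiter](
    Ordering.by[Waiter, (Long, Long)](w => (-w.priority, w.taskId)))

  def enqueue(w: Waiter): Unit = waitingQueue.add(w)

  // Only bypass the queue when permits are available AND the head of the queue would
  // not rank ahead of this caller under the queue's own ordering.
  def tryAcquire(numPermits: Int, priority: Long, taskId: Long): Boolean = {
    val head = waitingQueue.peek()
    val headRanksAhead = head != null &&
      (head.priority > priority || (head.priority == priority && head.taskId < taskId))
    if (!headRanksAhead && occupied + numPermits <= totalPermits) {
      occupied += numPermits
      true
    } else {
      false
    }
  }
}

With a check like this, the task with attempt ID 1 that needs 2 permits is not blocked by the waiting task with attempt ID 2 that needs 5 permits when 4 permits are free, because the waiter does not rank ahead of the caller under the queue's ordering.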

@binmahone (Collaborator, Author) replied:


You're right, it's a corner case I didn't pay attention to. Since your comment came after my merge, I have submitted another PR to fix this: https://github.com/NVIDIA/spark-rapids/pull/11587/files. BTW, are there any real cases where we'll have different permits for different threads?

A member replied:


Are there any real cases where we'll have different permits for different threads?

Yes, because the concurrent GPU tasks config can be updated at runtime, and that changes the number of permits for subsequent tasks. See GpuSemaphore.computeNumPermits.
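As a rough illustration of why permit counts can differ between tasks (the pool size and arithmetic below are assumptions for the sketch, not copied from GpuSemaphore.computeNumPermits): if each task requests a share of a fixed permit pool based on the concurrent-GPU-tasks setting, raising or lowering that setting at runtime changes how many permits subsequent tasks ask for.

// Illustrative only: the pool size and the division below are assumptions for the sketch,
// not the real GpuSemaphore.computeNumPermits implementation.
object PermitSketch {
  val MaxPermits = 1000 // assumed fixed pool size

  def computeNumPermits(concurrentGpuTasks: Int): Int =
    math.max(MaxPermits / math.max(concurrentGpuTasks, 1), 1)

  def main(args: Array[String]): Unit = {
    println(computeNumPermits(2)) // 500 permits per task while concurrency is 2
    println(computeNumPermits(4)) // 250 permits per task after raising concurrency to 4 at runtime
  }
}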

@binmahone (Collaborator, Author) replied:


I didn't realize the concurrent GPU tasks config can be updated at runtime, thanks!

binmahone added a commit to binmahone/spark-rapids that referenced this pull request Oct 11, 2024
* use task id as tie breaker

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* save threadlocal lookup

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
binmahone added a commit that referenced this pull request Oct 11, 2024
* avoid long tail tasks due to PrioritySemaphore (#11574)

* use task id as tie breaker

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* save threadlocal lookup

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* addressing jason's comment

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@sameerz added the bug (Something isn't working) and performance (A performance related task/issue) labels Oct 11, 2024
firestarman pushed a commit to firestarman/spark-rapids that referenced this pull request Oct 21, 2024
* use task id as tie breaker

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* save threadlocal lookup

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
firestarman pushed a commit to firestarman/spark-rapids that referenced this pull request Oct 21, 2024
…didate'

avoid long tail tasks due to PrioritySemaphore (NVIDIA#11574)

See merge request nvspark/bd-spark-rapids!42
Labels: bug (Something isn't working), performance (A performance related task/issue)
Development

Successfully merging this pull request may close these issues:

[BUG] very long tail task is observed when many tasks are contending for PrioritySemaphore (#11573)