[FEA] Kudo shuffle read should be retryable and spillable on Host Memory #12215

Open
binmahone opened this issue Feb 25, 2025 · 0 comments
Labels
? - Needs Triage (Need team to review and classify), feature request (New feature or request)


binmahone commented Feb 25, 2025

Is your feature request related to a problem? Please describe.

Shuffle read is a heavy consumer of host memory. For an executor with 16 cores, it can take up to 16 * avg_shuffle_partition_size * 2 bytes. The factor of 2 comes from the concat step: all the small pieces are concatenated into one big batch, and both the pieces and the concatenated batch reside in host memory at the same time.
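
To make the estimate above concrete, here is a small worked calculation. The 512 MiB average partition size is an assumed figure chosen only for illustration, and the function name is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def peak_shuffle_read_host_bytes(cores, avg_partition_bytes, concat_factor=2):
    """Upper bound from the issue text: cores * avg_shuffle_partition_size * 2.

    concat_factor is 2 because the small pieces and the concatenated batch
    are both resident in host memory during the concat step.
    """
    return cores * avg_partition_bytes * concat_factor


# Assumed example: 16 cores, 512 MiB average shuffle partition size.
estimate = peak_shuffle_read_host_bytes(cores=16, avg_partition_bytes=512 * MIB)
print(f"{estimate / GIB:.1f} GiB")  # prints "16.0 GiB"
```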

After #12169, #12158, and #12184 are all checked in, we should be able to make kudo shuffle read retryable and spillable.
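
As a rough illustration of what "retryable and spillable" could mean for host memory, the toy sketch below retries a failed allocation after spilling a buffer. It is not the plugin's implementation: `HostPool`, `HostOutOfMemory`, and `with_retry` are hypothetical names, and "spilling" here only releases bytes from a counter instead of writing anything to disk.

```python
class HostOutOfMemory(Exception):
    """Hypothetical signal that a host allocation did not fit in the budget."""


class HostPool:
    """Toy host-memory budget with spillable buffers (illustration only)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.spillable = []  # sizes of buffers that are allowed to be spilled

    def alloc(self, size):
        if self.used + size > self.limit:
            raise HostOutOfMemory(f"need {size}, free {self.limit - self.used}")
        self.used += size

    def add_spillable(self, size):
        self.alloc(size)
        self.spillable.append(size)

    def spill_one(self):
        """Pretend to spill the oldest spillable buffer; True if one was freed."""
        if not self.spillable:
            return False
        self.used -= self.spillable.pop(0)
        return True


def with_retry(pool, attempt):
    """Run attempt(); on HostOutOfMemory, spill one buffer and try again."""
    while True:
        try:
            return attempt()
        except HostOutOfMemory:
            if not pool.spill_one():
                raise  # nothing left to spill, so the failure is real


pool = HostPool(limit_bytes=100)
pool.add_spillable(40)  # e.g. an already-read shuffle piece
pool.add_spillable(40)
with_retry(pool, lambda: pool.alloc(50))  # first try fails, one spill, then fits
```

In this run the first attempt fails with 80 of 100 bytes in use, one 40-byte buffer is spilled, and the retry succeeds.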

Another angle for reducing host memory usage is to reconsider whether we really need 16 threads doing shuffle read at the same time. The number of concurrent GPU tasks is typically 2~4, which means most tasks will not be scheduled even after they have finished shuffle read and are waiting to execute. Maybe we should start with 16/2=8 threads doing shuffle read, and the other threads should hold off on shuffle reading to avoid wasting host memory. But this is out of the scope of this issue.
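
The idea of capping concurrent shuffle readers could be sketched with a counting semaphore, as below. This is only an assumption about one possible shape of the limit, not a proposal for the actual code: `run_tasks` and its parameters are hypothetical, and the sleep stands in for reading shuffle data and holding it until it is handed off.

```python
import threading
import time


def run_tasks(num_tasks, max_readers):
    """Start num_tasks threads but let only max_readers hold shuffle data at once.

    Returns the peak number of threads that were inside the read section.
    """
    read_permits = threading.BoundedSemaphore(max_readers)
    lock = threading.Lock()
    active = 0
    peak = 0

    def task():
        nonlocal active, peak
        # A task that cannot get a permit has not started reading yet,
        # so it holds no shuffle data in host memory.
        with read_permits:
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # stand-in for shuffle read plus waiting for hand-off
            with lock:
                active -= 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


# 16 task threads as in the issue, capped at 16/2 = 8 concurrent readers.
peak = run_tasks(num_tasks=16, max_readers=8)
print(peak)  # never exceeds 8
```

For the cap to actually bound host memory, the permit would need to be held until the read data is consumed or spilled, not just until the read finishes.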

binmahone added the "? - Needs Triage" and "feature request" labels on Feb 25, 2025
binmahone changed the title from "[FEA] Kudo shuffle read should be retryable and spillable" to "[FEA] Kudo shuffle read should be retryable and spillable on Host Memory" on Feb 25, 2025