Improve aggressive factor propagation strategy in Shardy. There are two main differences from BasicFactorPropagation. #28

Merged 1 commit on Jul 30, 2024

@copybara-service copybara-service bot commented Jul 25, 2024

Improve aggressive factor propagation strategy in Shardy. There are two main differences from BasicFactorPropagation.

Difference 1

BasicFactorPropagation propagates the same sharding axes to all the tensors
along a factor. This strategy can propagate different sharding axes to
different tensors. For example, suppose tensors T0, T1, and T2 all contain
factor F0, T0/F0 is already sharded along ["a", "b"], and "b" is already used
by T2 (either "b" is explicitly replicated there, or it shards another factor).
BasicFactorPropagation propagates ["a"] to both T1/F0 and T2/F0, while this
strategy propagates ["a", "b"] to T1/F0 and ["a"] to T2/F0.
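The T0/T1/T2 example can be reproduced with a small toy model. This is only a sketch, not Shardy's implementation: the function names `basic_propagate` and `aggressive_propagate` are hypothetical, and it assumes that only a leading prefix of the source axes can be propagated and that an axis is blocked for a tensor as soon as that tensor already uses it.

```python
def usable_prefix(source_axes, used_axes):
    """Longest leading run of source_axes that avoids every axis in used_axes."""
    prefix = []
    for axis in source_axes:
        if axis in used_axes:
            break  # assumption: axes after a blocked axis are not propagated
        prefix.append(axis)
    return prefix


def basic_propagate(source_axes, used_by_tensor):
    """Toy 'basic' strategy: every tensor receives the same axes."""
    # An axis blocked on any one tensor is blocked for all of them.
    blocked = set().union(*used_by_tensor.values())
    common = usable_prefix(source_axes, blocked)
    return {tensor: list(common) for tensor in used_by_tensor}


def aggressive_propagate(source_axes, used_by_tensor):
    """Toy 'aggressive' strategy: each tensor receives its own usable prefix."""
    return {
        tensor: usable_prefix(source_axes, used)
        for tensor, used in used_by_tensor.items()
    }


# T0/F0 is sharded along ["a", "b"]; T2 already uses "b" elsewhere.
source = ["a", "b"]
used = {"T1": set(), "T2": {"b"}}

print(basic_propagate(source, used))       # {'T1': ['a'], 'T2': ['a']}
print(aggressive_propagate(source, used))  # {'T1': ['a', 'b'], 'T2': ['a']}
```

In this model, T1 is no longer held back by T2's constraint on "b".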

Difference 2

BasicFactorPropagation is conservative about conflicts across factors: an
axis that overlaps between factors cannot be propagated. This strategy is
more aggressive: it allows an overlapping axis to be propagated along
different factors, provided that no axis overlaps in the resulting
shardings.
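A second toy model illustrates the cross-factor case. Again, this is a sketch under stated assumptions rather than Shardy's code: `basic_cross_factor` and `aggressive_cross_factor` are hypothetical names, the prefix rule is assumed, and the tie-break used when one tensor contains both competing factors (the first listed factor wins) is purely an assumption made to keep the result free of overlapping axes.

```python
from collections import Counter


def basic_cross_factor(candidates, tensor_factors):
    """Toy 'basic' strategy: an axis wanted by several factors goes to none."""
    demand = Counter(axis for axes in candidates.values() for axis in set(axes))
    allowed = {}
    for factor, axes in candidates.items():
        prefix = []
        for axis in axes:
            if demand[axis] > 1:
                break  # overlapping axis: stop propagating along this factor
            prefix.append(axis)
        allowed[factor] = prefix
    return {
        tensor: {factor: list(allowed[factor]) for factor in factors}
        for tensor, factors in tensor_factors.items()
    }


def aggressive_cross_factor(candidates, tensor_factors):
    """Toy 'aggressive' strategy: check overlap per tensor, not globally."""
    result = {}
    for tensor, factors in tensor_factors.items():
        claimed = set()  # axes already used inside this tensor's sharding
        per_factor = {}
        for factor in factors:  # assumption: earlier factors win ties
            prefix = []
            for axis in candidates[factor]:
                if axis in claimed:
                    break
                prefix.append(axis)
            claimed.update(prefix)
            per_factor[factor] = prefix
        result[tensor] = per_factor
    return result


# Both factors want axis "a".
candidates = {"F0": ["a"], "F1": ["a"]}
tensor_factors = {"T0": ["F0"], "T1": ["F1"], "T2": ["F0", "F1"]}

print(basic_cross_factor(candidates, tensor_factors))
# {'T0': {'F0': []}, 'T1': {'F1': []}, 'T2': {'F0': [], 'F1': []}}
print(aggressive_cross_factor(candidates, tensor_factors))
# {'T0': {'F0': ['a']}, 'T1': {'F1': ['a']}, 'T2': {'F0': ['a'], 'F1': []}}
```

T0 and T1 each contain only one of the two factors, so "a" can shard both without any single result sharding using it twice. T2 contains both factors, so "a" is used for only one of them.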

PiperOrigin-RevId: 657641564
@copybara-service copybara-service bot merged commit 2eeee20 into main Jul 30, 2024
@copybara-service copybara-service bot deleted the test_655675663 branch July 30, 2024 17:24