Improve aggressive factor propagation strategy in Shardy. There are two main differences from BasicFactorPropagation.
#28
Improve aggressive factor propagation strategy in Shardy. There are two main differences from BasicFactorPropagation.

Difference 1
BasicFactorPropagation propagates the same sharding axes to all the tensors along a factor. This strategy can propagate different sharding axes to different tensors. For example, tensors T0, T1, and T2 contain factor F0. T0/F0 is already sharded along ["a", "b"], and "b" is already used by T2 ("b" can be explicitly replicated, or it is used to shard another factor). BasicFactorPropagation propagates ["a"] to both T1/F0 and T2/F0, while this strategy propagates ["a", "b"] to T1/F0 and ["a"] to T2/F0, respectively.
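
The example above can be sketched in a few lines of Python. This is only an illustrative model, not Shardy's implementation: the function names and the `unavailable` sets (axes a tensor cannot accept for this factor) are hypothetical, and the sketch assumes a tensor receives the longest prefix of the source axes that it can still use.

```python
# Illustrative model only; none of these names come from Shardy.

def longest_usable_prefix(axes, unavailable):
    """Return the longest prefix of `axes` with no axis in `unavailable`."""
    prefix = []
    for axis in axes:
        if axis in unavailable:
            break
        prefix.append(axis)
    return prefix


def basic_propagation(source_axes, unavailable_per_tensor):
    """Same axes for every tensor: stop at the first axis any tensor rejects."""
    blocked = set().union(*unavailable_per_tensor.values())
    shared = longest_usable_prefix(source_axes, blocked)
    return {tensor: list(shared) for tensor in unavailable_per_tensor}


def aggressive_propagation(source_axes, unavailable_per_tensor):
    """Per-tensor axes: each tensor takes the longest prefix it can use."""
    return {
        tensor: longest_usable_prefix(source_axes, unavailable)
        for tensor, unavailable in unavailable_per_tensor.items()
    }


# T0/F0 is sharded along ["a", "b"]; "b" is already used by T2.
source = ["a", "b"]
unavailable = {"T1": set(), "T2": {"b"}}

basic_result = basic_propagation(source, unavailable)
aggressive_result = aggressive_propagation(source, unavailable)
```

With these inputs, `basic_result` gives `["a"]` to both T1 and T2, while `aggressive_result` gives `["a", "b"]` to T1 and `["a"]` to T2, matching the description.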
Difference 2
BasicFactorPropagation is conservative in terms of conflicts across factors: an axis that overlaps between factors cannot be propagated. This strategy is more aggressive: it allows the overlapped axis to be propagated along different factors if there is no overlapped axis in the result shardings.
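
One possible reading of the "no overlapped axis in the result shardings" condition is sketched below. This is an assumption made for illustration, not Shardy's code: the helper name `results_have_no_overlap` and the dictionary layout are hypothetical. The check accepts a set of per-factor candidate axes only if no tensor would end up using the same axis in more than one of its factors.

```python
# Illustrative model only; names and data layout are hypothetical.

def results_have_no_overlap(candidate_axes, tensor_factors):
    """Check that no tensor would use one axis for two different factors.

    candidate_axes: factor name -> list of axes proposed for that factor.
    tensor_factors: tensor name -> list of factors that tensor contains.
    """
    for factors in tensor_factors.values():
        seen = set()
        for factor in factors:
            for axis in candidate_axes.get(factor, []):
                if axis in seen:
                    return False  # same axis would appear twice in one tensor
                seen.add(axis)
    return True


# Axis "a" is a candidate for both F0 and F1 (an overlap across factors).
candidates = {"F0": ["a"], "F1": ["a"]}

# No tensor contains both factors, so no result sharding repeats "a".
disjoint_ok = results_have_no_overlap(
    candidates, {"T0": ["F0"], "T1": ["F1"]}
)

# T2 contains both factors, so its result sharding would repeat "a".
shared_ok = results_have_no_overlap(
    candidates, {"T0": ["F0"], "T2": ["F0", "F1"]}
)
```

Under this reading, the first case is allowed (`disjoint_ok` is true) and the second is not (`shared_ok` is false). How the real strategy resolves the second case is not stated in the description.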