Reduction: Enhance reduction kernel to support dynamic data type casting #685

@fengyuan14

Description

🚀 The feature, motivation and pitch

This is a performance requirement.
The existing CUDA reduction implementation in PyTorch supports dynamic data type casting, so no extra kernel is needed to align the data types of the input and output.
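
To illustrate the idea, here is a minimal host-side C++ sketch of a sum reduction that casts on load and on store inside the reduction itself, so no separate conversion pass is needed. It is not the PyTorch or XPU implementation: `DType`, `element_size`, `fetch_and_cast`, `cast_and_store`, `reduce_sum`, and `example` are all hypothetical names chosen for this example, and the `double` accumulator is an assumption made for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical runtime dtype tag (illustrative only, not PyTorch's ScalarType).
enum class DType { Int32, Float32, Float64 };

// Size in bytes of one element of the given runtime dtype.
inline std::size_t element_size(DType t) {
  switch (t) {
    case DType::Int32:   return sizeof(std::int32_t);
    case DType::Float32: return sizeof(float);
    case DType::Float64: return sizeof(double);
  }
  return 0;
}

// Load one element whose type is only known at run time and convert it to Dest.
template <typename Dest>
Dest fetch_and_cast(DType src, const void* p) {
  switch (src) {
    case DType::Int32:   return static_cast<Dest>(*static_cast<const std::int32_t*>(p));
    case DType::Float32: return static_cast<Dest>(*static_cast<const float*>(p));
    case DType::Float64: return static_cast<Dest>(*static_cast<const double*>(p));
  }
  return Dest{};
}

// Convert a value to the runtime output dtype and store it.
template <typename Src>
void cast_and_store(DType dst, void* p, Src v) {
  switch (dst) {
    case DType::Int32:   *static_cast<std::int32_t*>(p) = static_cast<std::int32_t>(v); break;
    case DType::Float32: *static_cast<float*>(p) = static_cast<float>(v); break;
    case DType::Float64: *static_cast<double*>(p) = static_cast<double>(v); break;
  }
}

// Sum-reduce n input elements of dtype in_t into a single output of dtype out_t.
// The casts happen inside the reduction loop and at the final store, so the
// input never has to be converted by a separate pass beforehand.
inline void reduce_sum(const void* in, DType in_t, std::size_t n,
                       void* out, DType out_t) {
  const char* base = static_cast<const char*>(in);
  const std::size_t stride = element_size(in_t);
  double acc = 0.0;  // assumed accumulator type for this sketch
  for (std::size_t i = 0; i < n; ++i) {
    acc += fetch_and_cast<double>(in_t, base + i * stride);
  }
  cast_and_store(out_t, out, acc);
}

// Usage: int32 input reduced directly into a float32 output.
inline float example() {
  std::vector<std::int32_t> in = {1, 2, 3, 4};
  float out = 0.0f;
  reduce_sum(in.data(), DType::Int32, in.size(), &out, DType::Float32);
  return out;
}
```

Without the in-kernel casts, the same result would need an extra pass (an extra kernel launch on a GPU) to convert the input to the output type before reducing, which is the overhead this request aims to avoid.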

Alternatives

No response

Additional context

No response
