Evaluate non-alignment of kernel implementation between IPEX and stock CUDA #126

@fengyuan14

Description

🚀 The feature, motivation and pitch

We have some kernels whose implementations are not aligned with the stock CUDA implementation, because:

  1. A functionality extension was added in the stock CUDA implementation, but we have no sustainable rebase process to pick it up.
  2. General memory layout support was added in the stock CUDA implementation, but we have no sustainable rebase process to pick it up.
  3. We have a specialized implementation for performance in some cases, but stock CUDA does not cover these cases.

Type-1 is functionality related; Type-2 and Type-3 are performance related.

  • For Type-1, we should fix the kernels and align them with stock CUDA while porting from IPEX to torch-xpu-ops.
  • For Type-2, we should align with the CUDA implementation at an appropriate priority; a sketch of what general memory layout support means follows this list.
  • For Type-3, we need to trade off performance against the feasibility of an in-tree implementation.

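In practice, "general memory layout support" roughly means a kernel accepts strided (non-contiguous) inputs directly instead of requiring a contiguous copy first. A minimal sketch, assuming a CUDA device is available (substitute `"xpu"` under a torch-xpu-ops build); `torch.tril` here stands in for any affected op:

```python
import torch

# A transposed view is non-contiguous, so the kernel must handle
# arbitrary strides rather than assuming a dense row-major input.
x = torch.randn(64, 128, device="cuda")
y = x.t()                          # non-contiguous view, strides (1, 128)
assert not y.is_contiguous()

out = torch.tril(y)                # strided input goes straight to the kernel
ref = torch.tril(y.contiguous())   # dense copy as a reference
torch.testing.assert_close(out, ref)
```
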
Here is the list (a parity-check sketch follows it). We will add items gradually as ops are ported.

  • aten::bernoulli_ // Type-2
  • aten::cumsum // Type-3
  • aten::cat @xytintel // Type-3
  • aten::tril/triu @AlienLiang23 // Type-2 CUDA optimization commit: 1462d72904cb81917b9355d6a58916f389e9084c
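
A minimal parity-check sketch for the deterministic ops above, assuming a torch-xpu-ops build that exposes an `"xpu"` device and using the CPU output as the reference; `aten::bernoulli_` is stochastic, so only shape/dtype or statistical checks would apply to it. This is illustrative only, not the project's actual test harness:

```python
import torch

device = "xpu"                     # assumes torch-xpu-ops is installed

x = torch.randn(32, 32)

# Deterministic ops from the list; the CPU result serves as the reference.
checks = {
    "aten::cumsum": lambda t: torch.cumsum(t, dim=1),
    "aten::cat":    lambda t: torch.cat([t, t], dim=0),
    "aten::tril":   lambda t: torch.tril(t),
    "aten::triu":   lambda t: torch.triu(t),
}

for name, fn in checks.items():
    ref = fn(x)                    # CPU reference
    out = fn(x.to(device)).cpu()   # device under test
    torch.testing.assert_close(out, ref, msg=name)
```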
