🚀 The feature, motivation and pitch
We have some kernels that are not aligned with the stock CUDA implementation because:
- A functionality extension was added in the stock CUDA implementation, but we have no sustainable rebase.
- General memory layout support was added in the stock CUDA implementation, but we have no sustainable rebase.
- We have a specific implementation for performance in some cases, but stock CUDA does not cover these cases.
Type-1 is functionality related; Type-2 and Type-3 are performance related.
- For Type-1, we should fix the kernel and align it with stock CUDA while porting from IPEX to torch-xpu-ops.
- For Type-2, we should align with the CUDA implementation with proper priority.
- For Type-3, we need to trade off performance against the feasibility of an in-tree implementation.
Here is the list; items will be added gradually as ops are ported.
- aten::bernoulli_ // Type-2
- aten::cumsum // Type-3
- aten::cat @xytintel // Type-3
- aten::tril/triu @AlienLiang23 // Type-2 CUDA optimization commit: 1462d72904cb81917b9355d6a58916f389e9084c
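
As a rough illustration of what "aligned with stock CUDA" means in practice, below is a minimal parity-check sketch (not part of this issue or the test suite) that compares a ported op on the XPU backend against the CPU reference. The `check_parity` helper, the tolerances, and the use of the `"xpu"` device string are assumptions for illustration only.

```python
import torch

def check_parity(fn, *cpu_args, device="xpu", rtol=1e-5, atol=1e-6):
    # Run the op on CPU as the reference, then on the XPU backend, and compare results.
    ref = fn(*cpu_args)
    out = fn(*(a.to(device) if isinstance(a, torch.Tensor) else a for a in cpu_args))
    torch.testing.assert_close(out.cpu(), ref, rtol=rtol, atol=atol)

if __name__ == "__main__":
    x = torch.randn(4, 5)
    check_parity(torch.tril, x)                            # aten::tril
    check_parity(torch.cumsum, x, 1)                       # aten::cumsum along dim=1
    check_parity(lambda a: torch.cat([a, a], dim=0), x)    # aten::cat
```

A similar check against a CUDA device (when available) can be used to confirm behavioral alignment for the Type-2 items above; the performance-oriented Type-3 items additionally need benchmarking, which this sketch does not cover.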