
Multi Scale Deformable Attention Support #112827

@d4l3k

Description

🚀 The feature, motivation and pitch

Multi-scale deformable attention has gained traction in many recent bird's-eye-view and 3D perception papers. It offers a substantial performance improvement over full attention: each query attends to a small, learned set of sampling points across feature levels rather than to every key. A pure-PyTorch sketch of the sampling core follows.
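
For reference, the sampling core can be expressed in pure PyTorch via `F.grid_sample`, in the style of the Deformable DETR reference implementation. This is a minimal sketch, not a proposed API; the tensor names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def ms_deform_attn_pytorch(value, spatial_shapes, sampling_locations, attention_weights):
    """value: (N, S, M, D) with S = sum(H_l * W_l) over L levels, M heads, head dim D.
    spatial_shapes: list of (H_l, W_l) per level.
    sampling_locations: (N, Lq, M, L, P, 2), normalized to [0, 1].
    attention_weights: (N, Lq, M, L, P), softmaxed over the L*P samples.
    """
    N, S, M, D = value.shape
    _, Lq, _, L, P, _ = sampling_locations.shape
    value_per_level = value.split([H * W for H, W in spatial_shapes], dim=1)
    grids = 2 * sampling_locations - 1  # grid_sample expects coordinates in [-1, 1]
    sampled = []
    for lvl, (H, W) in enumerate(spatial_shapes):
        # (N, H*W, M, D) -> (N*M, D, H, W)
        v = value_per_level[lvl].flatten(2).transpose(1, 2).reshape(N * M, D, H, W)
        # (N, Lq, M, P, 2) -> (N*M, Lq, P, 2)
        g = grids[:, :, :, lvl].transpose(1, 2).flatten(0, 1)
        # bilinear sampling of P points per query: (N*M, D, Lq, P)
        sampled.append(F.grid_sample(v, g, mode="bilinear",
                                     padding_mode="zeros", align_corners=False))
    # (N, Lq, M, L, P) -> (N*M, 1, Lq, L*P)
    w = attention_weights.transpose(1, 2).reshape(N * M, 1, Lq, L * P)
    out = (torch.stack(sampled, dim=-2).flatten(-2) * w).sum(-1)  # (N*M, D, Lq)
    return out.view(N, M * D, Lq).transpose(1, 2).contiguous()    # (N, Lq, M*D)
```

A fused CUDA kernel avoids materializing the per-level sampled tensors, which is where most of the speedup over this fallback comes from.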

Example Papers:

There are a handful of fragmented implementations available. It would be great to have this upstreamed to PyTorch core given the number of papers now using it.

These existing implementations have a number of issues, such as not supporting different data types and not registering their kernels through the torch ops registration mechanism.
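
For context, recent PyTorch versions (2.4+) expose `torch.library.custom_op` for exactly this kind of registration, which makes the op visible to `torch.compile`, meta tensors, and the dispatcher. Below is a minimal sketch of what registering the op could look like; the `myops` namespace and the signature are hypothetical:

```python
import torch

@torch.library.custom_op("myops::ms_deform_attn", mutates_args=())
def ms_deform_attn(value: torch.Tensor,
                   sampling_locations: torch.Tensor,
                   attention_weights: torch.Tensor) -> torch.Tensor:
    # A real registration would dispatch to CPU/CUDA kernels here; this
    # placeholder only returns a correctly shaped output for illustration.
    N, _, M, D = value.shape
    Lq = sampling_locations.shape[1]
    return value.new_zeros(N, Lq, M * D)

@ms_deform_attn.register_fake
def _(value, sampling_locations, attention_weights):
    # Shape-only "fake" implementation so the op traces under torch.compile.
    N, _, M, D = value.shape
    Lq = sampling_locations.shape[1]
    return value.new_empty(N, Lq, M * D)
```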

Example Implementations

The existing implementations are licensed under Apache 2.0. Is it possible to upstream them as-is, or would that require a complete rewrite or relicensing under BSD to match PyTorch core?

Alternatives

Installing a third-party library from source, or using mmcv, which pulls in a large number of dependencies.

Additional context

No response

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @bhosmer @cpuhrsch @erichan1 @drisspg

Labels

feature, module: nn, module: sdpa, triaged
