OTX D-Fine Detection Algorithm Integration #4142
base: develop
Conversation
Thank you, Eugene, for your great contribution!
I will try D-Fine from your branch with Intel GPUs.
return output.permute(0, 2, 1)

class MSDeformableAttentionV2(nn.Module):
Can we use this for RTDetr as well? Maybe it would be an upgrade for RTDetrV2.
Secondly, I would rather put it in otx/src/otx/algo/common/layers/transformer_layers.py, as done for RTDetr.
@kprokofi Yes, we can use it for RTDetrV2. I moved it to otx/src/otx/algo/common/layers/transformer_layers.py.
PRETRAINED_ROOT: str = "https://github.com/Peterande/storage/releases/download/dfinev1.0/"

PRETRAINED_WEIGHTS: dict[str, str] = {
I wonder whether we need all of these variants. We are currently overwhelmed with detection recipes. Could we choose maybe two models to expose and omit the others? The largest one shows the best performance and is a candidate for the Geti largest-template revamp, but the other templates seem less beneficial compared with the already-introduced models.
So I would consider cleaning up some model versions here (the same concern applies to RTDetr and YOLOX, but that is another story).
I suggest removing the three recipes (D-Fine tiny/small/medium) but keeping their configurations in d_fine.py. This way we can reintroduce those models based on user requests, or if there are future improvements to the pre-trained models. Also, removing the recipes will reduce the load on our CI pipeline.
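To illustrate the registry pattern shown in the diff, here is a minimal sketch of how the checkpoint lookup could work. Only PRETRAINED_ROOT comes from the diff above; the variant keys, file names, and the `pretrained_url` helper are hypothetical.

```python
PRETRAINED_ROOT: str = "https://github.com/Peterande/storage/releases/download/dfinev1.0/"

# Hypothetical subset of variants; the review above suggests exposing
# only one or two of them as recipes.
PRETRAINED_WEIGHTS: dict[str, str] = {
    "dfine_x": PRETRAINED_ROOT + "dfine_x_coco.pth",
    "dfine_l": PRETRAINED_ROOT + "dfine_l_coco.pth",
}


def pretrained_url(model_name: str) -> str:
    """Return the checkpoint URL for a registered model variant."""
    try:
        return PRETRAINED_WEIGHTS[model_name]
    except KeyError as err:
        msg = f"No pretrained weights registered for {model_name!r}"
        raise ValueError(msg) from err
```

Keeping the full dict while exposing fewer recipes, as suggested above, means unlisted variants only need a new YAML file to come back.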
)

def distance2bbox(points: Tensor, distance: Tensor, reg_scale: Tensor) -> Tensor:
Maybe put this in utils?
I moved D-Fine utility functions under: src/otx/algo/detection/utils/utils.py
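For readers unfamiliar with the helper being moved, this is a simplified sketch of the distance-to-box decoding idea. It omits the reg_scale argument of the actual D-Fine `distance2bbox` (which re-weights the predicted distances), so the function name and body here are illustrative only.

```python
import torch
from torch import Tensor


def distance2bbox_sketch(points: Tensor, distance: Tensor) -> Tensor:
    """Decode (left, top, right, bottom) distances around anchor points
    into xyxy boxes.

    Simplified sketch: the real D-Fine distance2bbox additionally takes
    a reg_scale tensor that rescales the distances before decoding.
    """
    x1 = points[..., 0] - distance[..., 0]  # left edge
    y1 = points[..., 1] - distance[..., 1]  # top edge
    x2 = points[..., 0] + distance[..., 2]  # right edge
    y2 = points[..., 1] + distance[..., 3]  # bottom edge
    return torch.stack((x1, y1, x2, y2), dim=-1)
```

A point at (5, 5) with distances (1, 2, 3, 4) decodes to the box (4, 3, 8, 9).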
class HybridEncoderModule(nn.Module):
    """HybridEncoder for DFine.

    TODO(Eugene): Merge with current rtdetr.HybridEncoderModule in next PR.
👍
@@ -3921,3 +3921,44 @@ def _dispatch_transform(cls, cfg_transform: DictConfig | dict | tvt_v2.Transform
        raise TypeError(msg)

    return transform


class RandomIoUCrop(tvt_v2.RandomIoUCrop):
I used torchvision.RandomIoUCrop to align with the original implementation. I also tested it against mmdet.MinIoURandomCrop and observed no significant differences in accuracy or performance. I suggest removing mmdet.MinIoURandomCrop and using torchvision.RandomIoUCrop to reduce the code maintenance overhead.
Summary
OTX D-Fine Detection Algorithm Integration: https://github.com/Peterande/D-FINE
Next phase
How to test
otx train --config src/otx/recipe/detection/dfine_x.yaml --data_root DATA_ROOT
pytest tests/unit/algo/detection/test_dfine.py
Checklist
License
Feel free to contact the maintainers if that's a concern.