
OTX D-Fine Detection Algorithm Integration #4142

Open
wants to merge 56 commits into base: develop
Conversation

eugene123tw
Contributor

@eugene123tw eugene123tw commented Dec 4, 2024

Summary

OTX D-Fine Detection Algorithm Integration: https://github.com/Peterande/D-FINE

  • Introduced five variants of the D-Fine detection algorithm.
  • Integrated the HGNetv2 backbone from PaddleDetection.
  • Cleaned and optimized the original codebase by:
    • Reducing code duplication where possible.
    • Adding docstrings for all methods and functions.
    • Benchmarking OpenVINO/PyTorch detection results for accuracy and performance.

Next phase

  • Validate potential module combinations that could be unified in future iterations, such as:
    • D-Fine Decoder and RT-DETR Decoder.
    • D-Fine Hybrid Encoder and RT-DETR Decoder.
    • D-Fine Criterion and RT-DETR Criterion.
  • Validate Post-Training Optimization Tool (POT) results and assess potential accuracy drops.
  • Validate XAI feature.

How to test

  • otx train --config src/otx/recipe/detection/dfine_x.yaml --data_root DATA_ROOT
  • pytest tests/unit/algo/detection/test_dfine.py

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0


Verified

This commit was signed with the committer’s verified signature.
skosito skosito

@eugene123tw eugene123tw changed the title [Draft] D-Fine PoC D-Fine Detection Algorithm Dec 20, 2024
@eugene123tw eugene123tw marked this pull request as ready for review December 20, 2024 15:32
@eugene123tw eugene123tw changed the title D-Fine Detection Algorithm OTX D-Fine Detection Algorithm Integration Dec 20, 2024
@github-actions github-actions bot added the DOC Improvements or additions to documentation label Dec 20, 2024
Collaborator

@kprokofi kprokofi left a comment


Thank you, Eugene, for your great contribution!
I will try D-Fine from your branch on Intel GPUs.

return output.permute(0, 2, 1)


class MSDeformableAttentionV2(nn.Module):
Collaborator

Can we use this for RTDetr as well? Maybe it will be an upgrade for RTDetrV2.

Collaborator

Secondly, I would rather put it in otx/src/otx/algo/common/layers/transformer_layers.py, as done for RTDetr.

Contributor Author
@eugene123tw eugene123tw Jan 2, 2025

@kprokofi yes, we can use it for RTDetrV2. I moved it to otx/src/otx/algo/common/layers/transformer_layers.py
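For context, the core idea behind (multi-scale) deformable attention is that each query bilinearly samples a small set of points from the value feature map and mixes them with learned weights, instead of attending to every location. The following is a single-level, single-head sketch with a hypothetical helper name, not the actual MSDeformableAttentionV2 implementation:

```python
import torch
import torch.nn.functional as F


def deform_sample(value: torch.Tensor, sampling_grid: torch.Tensor,
                  attn_weights: torch.Tensor) -> torch.Tensor:
    """Illustrative core of deformable attention (single level, single head).

    value: (N, C, H, W) feature map.
    sampling_grid: (N, Q, P, 2) sampling locations in [-1, 1] (Q queries,
        P points per query), typically reference points plus learned offsets.
    attn_weights: (N, Q, P), softmaxed over the P sampling points.
    Returns (N, Q, C).
    """
    # Bilinearly sample P points per query from the value map -> (N, C, Q, P)
    sampled = F.grid_sample(value, sampling_grid, mode="bilinear",
                            align_corners=True)
    # Weighted sum over the sampling points -> (N, Q, C)
    return torch.einsum("ncqp,nqp->nqc", sampled, attn_weights)
```

The real module additionally predicts the offsets and weights from the queries and repeats this per head and per feature level.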


PRETRAINED_ROOT: str = "https://github.com/Peterande/storage/releases/download/dfinev1.0/"

PRETRAINED_WEIGHTS: dict[str, str] = {
Collaborator

I wonder whether we need all of these variants. We are currently overwhelmed with detection recipes. Could we choose maybe two models to expose and omit the others? The largest one shows the best performance and is a candidate for the Geti largest-template revamp, but the other templates seem less beneficial compared with the already introduced models.
So I would consider cleaning up some model versions here (the same concerns RTDetr and YOLOX, but that is another story).

Contributor Author

I suggest removing the three recipes (D-Fine tiny/small/medium) but keeping their configurations in d_fine.py. This way we can reintroduce those models based on user requests, or if there are future improvements to the pre-trained models. Also, removing the recipes will reduce the load on our CI pipeline.

)


def distance2bbox(points: Tensor, distance: Tensor, reg_scale: Tensor) -> Tensor:
Collaborator

Maybe put this in utils?

Contributor Author
@eugene123tw eugene123tw Jan 2, 2025

I moved D-Fine utility functions under: src/otx/algo/detection/utils/utils.py
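For readers unfamiliar with the distance-to-box parameterization, a minimal FCOS-style sketch might look like the following. It is illustrative only: the actual D-Fine utility also takes a `reg_scale` tensor that rescales the predicted distances before decoding.

```python
import torch
from torch import Tensor


def distance2bbox_sketch(points: Tensor, distance: Tensor) -> Tensor:
    """Decode (left, top, right, bottom) distances around anchor points into
    (x1, y1, x2, y2) boxes. Illustrative sketch; the D-Fine version applies an
    additional reg_scale weighting to the distances."""
    x1 = points[..., 0] - distance[..., 0]
    y1 = points[..., 1] - distance[..., 1]
    x2 = points[..., 0] + distance[..., 2]
    y2 = points[..., 1] + distance[..., 3]
    return torch.stack((x1, y1, x2, y2), dim=-1)
```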

src/otx/algo/detection/heads/dfine_decoder.py (outdated)
class HybridEncoderModule(nn.Module):
"""HybridEncoder for DFine.

TODO(Eugene): Merge with current rtdetr.HybridEncoderModule in next PR.
Collaborator

👍

@@ -3921,3 +3921,44 @@ def _dispatch_transform(cls, cfg_transform: DictConfig | dict | tvt_v2.Transform
raise TypeError(msg)

return transform


class RandomIoUCrop(tvt_v2.RandomIoUCrop):
Collaborator

If we use the RandomIoUCrop already defined in this file, do performance issues occur?

Contributor Author

I used torchvision.RandomIOUCrop to align with the original implementation. I also tested it with mmdet.MinIoURandomCrop and observed no significant differences in accuracy or performance.

I suggest removing mmdet.MinIoURandomCrop and using torchvision.RandomIOUCrop to reduce the code maintenance overhead.
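For reference, both crop transforms resample candidate crops until the overlap with the ground-truth boxes passes a threshold, and that check reduces to a pairwise IoU computation. A minimal sketch (torchvision's `ops.box_iou` provides the same functionality):

```python
import torch


def box_iou_sketch(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes.
    Illustrative sketch of the overlap check that IoU-constrained crop
    transforms repeat until a sampled crop passes the threshold."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    # Intersection corners, broadcast to (len(boxes1), len(boxes2), 2)
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)
```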

@github-actions github-actions bot added the BUILD label Jan 2, 2025
@eugene123tw eugene123tw requested a review from kprokofi January 2, 2025 12:12
Labels
BUILD · DOC (Improvements or additions to documentation) · TEST (Any changes in tests)

2 participants