add deformable detr repo #89
base: main
Conversation
long8v
commented
Nov 21, 2022
- code : https://github.com/fundamentalvision/Deformable-DETR.git
- The HuggingFace implementation has many incorrect parts, so re-reading the original repo
That's new to me.
# ------------------------------------------------------------------------
# Deformable DETR
# Copyright (c) 2020 SenseTime. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Modified from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# ------------------------------------------------------------------------
The main function:
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
Detail: the projection parts (`reference_points`, `sampling_offsets`) get lr × 1/10.
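These name lists are typically used to split parameters into optimizer groups with different learning rates. A minimal pure-Python sketch of that matching logic (the parameter names and base lr below are made up for illustration, not the repo's exact values):

```python
def build_param_groups(named_params, base_lr, proj_names, proj_mult):
    """Split parameters into a normal group and a projection group
    whose learning rate is scaled by proj_mult (0.1 by default here)."""
    def is_proj(name):
        return any(key in name for key in proj_names)

    normal = [p for name, p in named_params if not is_proj(name)]
    proj = [p for name, p in named_params if is_proj(name)]
    return [
        {"params": normal, "lr": base_lr},
        {"params": proj, "lr": base_lr * proj_mult},
    ]

# Hypothetical parameter names for illustration
params = [("decoder.sampling_offsets.weight", "w1"),
          ("decoder.reference_points.bias", "w2"),
          ("backbone.conv1.weight", "w3")]
groups = build_param_groups(params, base_lr=2e-4,
                            proj_names=["reference_points", "sampling_offsets"],
                            proj_mult=0.1)
# groups[1] holds the two projection tensors at one tenth the base lr
```

In the real code the groups are handed to `torch.optim.AdamW`, which accepts exactly this list-of-dicts format.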
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
There's gradient clipping? I don't think I saw it in the paper.
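Clipping by global norm (what `torch.nn.utils.clip_grad_norm_` does with this `max_norm`) can be sketched in plain Python; this is a simplified scalar stand-in, not the repo's code:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Return gradients rescaled so their combined L2 norm
    does not exceed max_norm (the repo's default is 0.1)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

clipped, norm = clip_by_global_norm([0.3, 0.4], max_norm=0.1)
# the original norm is 0.5; after scaling, the clipped norm is exactly 0.1
```

All gradients are scaled by the same factor, so the update direction is preserved and only its magnitude is capped.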
# Variants of Deformable DETR
parser.add_argument('--with_box_refine', default=False, action='store_true')
parser.add_argument('--two_stage', default=False, action='store_true')
bbox refinement / two-stage.
store_true: use action="store_true" when the option takes no value and you only care whether the flag is present or absent.
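The flag behavior can be checked directly with argparse (the argument list passed to `parse_args` here is a toy example):

```python
import argparse

parser = argparse.ArgumentParser()
# store_true flags take no value; they default to False and
# become True only when the flag appears on the command line
parser.add_argument('--with_box_refine', default=False, action='store_true')
parser.add_argument('--two_stage', default=False, action='store_true')

args = parser.parse_args(['--two_stage'])
# args.with_box_refine -> False, args.two_stage -> True
```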
parser.add_argument('--dilation', action='store_true',
                    help="If true, we replace stride with dilation in the last convolutional block (DC5)")
Whether to turn the DC5 option on or off.
class DeformableTransformerDecoder(nn.Module):
    def __init__(self, decoder_layer, num_layers, return_intermediate=False):
        super().__init__()
        self.layers = _get_clones(decoder_layer, num_layers)
        self.num_layers = num_layers
        self.return_intermediate = return_intermediate
Let's look at the decoder.
        # hack implementation for iterative bounding box refinement and two-stage Deformable DETR
        self.bbox_embed = None
        self.class_embed = None
I think I can see why they call it a "hack": bbox_embed and class_embed are defined outside, then wired in and used inside the decoder.
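The pattern the comment describes can be sketched with hypothetical classes: the decoder holds `None` slots that the owning model fills in after construction, so the decoder can call prediction heads it does not own (the classes and the `+ 1` head below are illustrative only):

```python
class Decoder:
    def __init__(self):
        # "hack": slot filled in later by the owning model
        self.bbox_embed = None

    def forward(self, x):
        if self.bbox_embed is not None:
            return self.bbox_embed(x)  # refinement path uses the external head
        return x  # plain path when no head was wired in

class Model:
    def __init__(self):
        self.decoder = Decoder()
        self.bbox_embed = lambda x: x + 1  # stand-in for an nn.Module head
        # wire the externally defined head into the decoder
        self.decoder.bbox_embed = self.bbox_embed
```

The upside is that the same head weights can be shared between the model's output layer and the decoder's per-layer refinement; the downside is exactly what makes it feel hacky, a hidden dependency set from outside.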
    def forward(self, tgt, reference_points, src, src_spatial_shapes, src_level_start_index, src_valid_ratios,
                query_pos=None, src_padding_mask=None):
        output = tgt

        intermediate = []
        intermediate_reference_points = []
        for lid, layer in enumerate(self.layers):
            if reference_points.shape[-1] == 4:
                reference_points_input = reference_points[:, :, None] \
                                         * torch.cat([src_valid_ratios, src_valid_ratios], -1)[:, None]
            else:
                assert reference_points.shape[-1] == 2
                reference_points_input = reference_points[:, :, None] * src_valid_ratios[:, None]
            output = layer(output, query_pos, reference_points_input, src, src_spatial_shapes, src_level_start_index, src_padding_mask)
Takes the reference points, handles things like masking (scaling by the valid ratios), and passes everything through the DecoderLayer.
            # hack implementation for iterative bounding box refinement
            if self.bbox_embed is not None:
                tmp = self.bbox_embed[lid](output)
                if reference_points.shape[-1] == 4:
                    new_reference_points = tmp + inverse_sigmoid(reference_points)
                    new_reference_points = new_reference_points.sigmoid()
                else:
                    assert reference_points.shape[-1] == 2
                    new_reference_points = tmp
                    new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points)
                    new_reference_points = new_reference_points.sigmoid()
                reference_points = new_reference_points.detach()
If bbox_embed is given, it predicts a bounding box from the output of the DecoderLayer and uses it to slightly adjust the reference points.
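The adjustment happens in logit space: `inverse_sigmoid` maps the point back to unbounded coordinates, the predicted delta is added there, and `sigmoid` maps the result back into [0, 1]. A scalar sketch (this `inverse_sigmoid` is re-derived here with the same clamping idea as the repo's helper, not copied from it):

```python
import math

def inverse_sigmoid(x, eps=1e-5):
    # clamp away from 0 and 1 so the log is finite
    x = min(max(x, eps), 1 - eps)
    return math.log(x / (1 - x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def refine(reference_point, delta):
    # new point = sigmoid(delta + inverse_sigmoid(old point))
    return sigmoid(delta + inverse_sigmoid(reference_point))

p = refine(0.5, 0.0)  # a zero delta leaves the point unchanged: 0.5
q = refine(0.5, 1.0)  # a positive delta nudges it toward 1
```

Doing the update in logit space keeps the refined point inside the image no matter how large the predicted delta is, which is why the code round-trips through `inverse_sigmoid`/`sigmoid` instead of adding the delta directly.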
        if self.return_intermediate:
            return torch.stack(intermediate), torch.stack(intermediate_reference_points)

        return output, reference_points
Returns the DecoderLayer output and the reference points. Without two-stage or refinement, the reference points are the same at the first layer and the last.