nvidia_deeplearningexamples_ssd.md

File metadata and controls

126 lines (103 loc) ยท 5.81 KB
layout: hub_detail
background-class: hub-background
body-class: hub
title: SSD
summary: Single Shot MultiBox Detector model for object detection
category: researchers
image: nvidia_logo.png
author: NVIDIA
tags: vision
github-link:
github-id: NVIDIA/DeepLearningExamples
featured_image_1: ssd_diagram.png
featured_image_2: ssd.png
accelerator: cuda
order: 10
demo-model-link:

Model Description

The SSD300 model is based on the paper SSD: Single Shot MultiBox Detector, which describes SSD as "a method for detecting objects in images using a single deep neural network". The input size is fixed at 300x300.

The main difference between this model and the one described in the paper is the backbone. Specifically, the VGG model used in the paper is obsolete and has been replaced by ResNet-50.

Following the paper Speed/accuracy trade-offs for modern convolutional object detectors, these enhancements were made to the backbone:

  • The conv5_x, avgpool, fc, and softmax layers were removed from the original classification model.
  • All strides in conv4_x are set to 1x1.
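
The effect of the stride change can be checked with a little arithmetic. The sketch below applies the standard convolution output-size formula to a 300x300 input. The kernel, stride, and padding values are assumptions taken from the usual ResNet-50 layout and are not stated on this page; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv4_output_size(input_size=300, conv4_stride=2):
    # Assumed ResNet-50 layout, arguments are (kernel, stride, padding).
    size = conv_out(input_size, 7, 2, 3)        # conv1
    size = conv_out(size, 3, 2, 1)              # max pool
    size = conv_out(size, 3, 1, 1)              # conv2_x keeps the resolution
    size = conv_out(size, 3, 2, 1)              # conv3_x downsamples once
    size = conv_out(size, 3, conv4_stride, 1)   # conv4_x
    return size


print(conv4_output_size(conv4_stride=2))  # 19
print(conv4_output_size(conv4_stride=1))  # 38
```

Under these assumptions, setting the conv4_x strides to 1 doubles the side of the feature map that the backbone hands on, from 19 to 38.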

The backbone is followed by 5 additional convolutional layers. In addition to the convolutional layers, 6 detection heads are attached:

  • The first detection head is attached to the last conv4_x layer.
  • The other five detection heads are attached to the corresponding 5 additional convolutional layers.
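
As a rough illustration of what six heads produce together, the sketch below counts the default boxes across them. The feature-map sizes and boxes-per-cell values are assumptions based on the SSD300 configuration in the SSD paper; they are not stated on this page.

```python
# Assumed SSD300 configuration: (feature-map side, default boxes per cell),
# one entry per detection head.
HEADS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def total_default_boxes(heads=HEADS):
    """Sum the default boxes predicted by all detection heads."""
    return sum(side * side * boxes for side, boxes in heads)


print(total_default_boxes())  # 8732
```

The total agrees with the 8732 boxes per image that the Example section reports for the raw network output.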

The detection heads are similar to those described in the paper, but they are enhanced by an additional BatchNorm layer after each convolutional layer.

Example

In the example below, we use a pretrained SSD model to detect objects in sample images and visualize the results.

To run the example, you need a few extra Python packages installed. They are required for image preprocessing and visualization.

pip install numpy scipy scikit-image matplotlib

Load an SSD model pretrained on the COCO dataset, along with a set of utility methods for convenient and comprehensive formatting of the model's input and output.

import torch
ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')

Prepare the loaded model for inference.

ssd_model.to('cuda')
ssd_model.eval()

Prepare the input images for object detection. (The example links below point to a few images from the COCO val2017 set, but you can also specify paths to local images.)

uris = [
    'http://images.cocodataset.org/val2017/000000397133.jpg',
    'http://images.cocodataset.org/val2017/000000037777.jpg',
    'http://images.cocodataset.org/val2017/000000252219.jpg'
]

Format the images to match the network input and convert them to a tensor.

inputs = [utils.prepare_input(uri) for uri in uris]
tensor = utils.prepare_tensor(inputs)

Run the SSD network to detect objects.

with torch.no_grad():
    detections_batch = ssd_model(tensor)

By default, the raw output of the SSD network contains, for each input image, 8732 boxes with localization and a class probability distribution. Let's filter this output to keep only the more meaningful detections (confidence > 40%).

results_per_input = utils.decode_results(detections_batch)
best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
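
Confidence filtering itself is easy to picture. The following is a minimal sketch of the idea only, not the implementation of `utils.pick_best`; the function name `keep_confident` and the sample detections are hypothetical.

```python
def keep_confident(detections, threshold=0.40):
    """Keep (box, class_id, confidence) triples whose confidence exceeds the threshold."""
    return [d for d in detections if d[2] > threshold]


# Hypothetical detections: box is (x_min, y_min, x_max, y_max) in the 0..1 range.
sample = [
    ((0.10, 0.20, 0.50, 0.90), 1, 0.92),
    ((0.55, 0.30, 0.80, 0.70), 18, 0.41),
    ((0.00, 0.00, 0.20, 0.10), 3, 0.07),
]

kept = keep_confident(sample)
print(len(kept))  # 2
```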

The model was trained on the COCO dataset, which we need to access in order to translate class IDs into human-readable object names. The first download of the annotations may take a while.

classes_to_labels = utils.get_coco_object_dictionary()
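
The visualization code later in this example subtracts 1 from each class ID before the lookup. A small sketch of that mapping follows. It assumes class IDs are 1-based, with 0 reserved for background; the three-entry label list and the `label_for` helper are hypothetical stand-ins for the full COCO dictionary.

```python
# Hypothetical, shortened stand-in for the COCO label list.
labels = ['person', 'bicycle', 'car']


def label_for(class_id, labels):
    """Map a 1-based class ID to its name (0 is assumed to mean background)."""
    if class_id < 1 or class_id > len(labels):
        raise ValueError(f"class ID {class_id} has no label")
    return labels[class_id - 1]


print(label_for(1, labels))  # person
print(label_for(3, labels))  # car
```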

Finally, let's visualize the detections.

from matplotlib import pyplot as plt
import matplotlib.patches as patches

for image_idx in range(len(best_results_per_input)):
    fig, ax = plt.subplots(1)
    # Show original, denormalized image...
    image = inputs[image_idx] / 2 + 0.5
    ax.imshow(image)
    # ...with detections
    bboxes, classes, confidences = best_results_per_input[image_idx]
    for idx in range(len(bboxes)):
        left, bot, right, top = bboxes[idx]
        x, y, w, h = [val * 300 for val in [left, bot, right - left, top - bot]]
        rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        ax.text(x, y, "{} {:.0f}%".format(classes_to_labels[classes[idx] - 1], confidences[idx]*100), bbox=dict(facecolor='white', alpha=0.5))
plt.show()
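
The coordinate arithmetic inside the loop can be isolated as follows. This is a sketch assuming, as the code above does, that boxes are given as fractions of a 300x300 image and that pixel values were normalized to the range [-1, 1]; the helper names are hypothetical.

```python
def to_pixel_rect(box, image_size=300):
    """Convert a normalized (x_min, y_min, x_max, y_max) box to (x, y, w, h) in pixels."""
    x_min, y_min, x_max, y_max = box
    return (
        x_min * image_size,
        y_min * image_size,
        (x_max - x_min) * image_size,
        (y_max - y_min) * image_size,
    )


def denormalize(value):
    """Map a pixel value from [-1, 1] back to [0, 1] for display."""
    return value / 2 + 0.5


print(to_pixel_rect((0.25, 0.5, 0.75, 1.0)))  # (75.0, 150.0, 150.0, 150.0)
print(denormalize(-1.0), denormalize(1.0))    # 0.0 1.0
```

The (x, y, w, h) tuple is the form that `patches.Rectangle` expects: an anchor corner followed by width and height.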

Details

For detailed information on model input and output, training recipes, inference, and performance, see GitHub and NGC.

References