---
layout: hub_detail
background-class: hub-background
body-class: hub
category: researchers
title: MiDaS
summary: MiDaS models for computing relative depth from a single image.
image: intel-logo.png
author: Intel ISL
tags:
github-link:
github-id: intel-isl/MiDaS
featured_image_1: midas_samples.png
featured_image_2: no-image
accelerator: cuda-optional
demo-model-link:
---
### Model Description

MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases, ranging from a small, high-speed model to a very large model that delivers the highest accuracy. The models have also been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs.
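As a quick way to see which variants the repository exposes, you can enumerate its Torch Hub entry points. This is a minimal sketch using the standard `torch.hub.list` API; the exact set of names returned depends on the version of the MiDaS repository that gets pulled.

```python
import torch

# List the entry points published by the MiDaS hub repository; the output
# includes the model variants (e.g. DPT_Large) plus the "transforms" helper.
print(torch.hub.list("intel-isl/MiDaS"))
```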
### Example Usage

MiDaS models depend on timm. Install it with the following command:

```shell
pip install timm
```
Download an image from the PyTorch homepage.
```python
import cv2
import torch
import urllib.request

import matplotlib.pyplot as plt

url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)
```
Load a model (see https://github.com/intel-isl/MiDaS/#Accuracy for an overview).
model_type = "DPT_Large" # MiDaS v3 - Large (highest accuracy, slowest inference speed)
#model_type = "DPT_Hybrid" # MiDaS v3 - Hybrid (medium accuracy, medium inference speed)
#model_type = "MiDaS_small" # MiDaS v2.1 - Small (lowest accuracy, highest inference speed)
midas = torch.hub.load("intel-isl/MiDaS", model_type)
Move the model to GPU if available.
```python
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
midas.to(device)
midas.eval()
```
Load the transforms that resize and normalize the input image for each of the model variants.
```python
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

if model_type == "DPT_Large" or model_type == "DPT_Hybrid":
    transform = midas_transforms.dpt_transform
else:
    transform = midas_transforms.small_transform
```
Load an image and apply the transforms.
```python
img = cv2.imread(filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

input_batch = transform(img).to(device)
```
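If you want to confirm what the transform produced before running the network, an illustrative sanity check of the batch tensor can help; the exact spatial size depends on the aspect ratio of the input image.

```python
# Illustrative check (not part of the original example): the MiDaS transforms
# return a batched float tensor of shape (1, 3, H, W), ready for the network.
print(input_batch.shape, input_batch.dtype)
```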
Predict and resize to the original resolution.
```python
with torch.no_grad():
    prediction = midas(input_batch)

    # Upsample the low-resolution prediction back to the input image size.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

output = prediction.cpu().numpy()
```
Show the result.
```python
plt.imshow(output)
# plt.show()
```
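The prediction is relative inverse depth, defined only up to an unknown scale and shift, so it must be normalized before it can be written out as an ordinary image file. A minimal sketch follows; the output filename is just an example.

```python
import numpy as np

# Normalize the relative inverse depth map to [0, 1]; the small epsilon
# guards against division by zero for a constant prediction.
depth_min, depth_max = output.min(), output.max()
normalized = (output - depth_min) / (depth_max - depth_min + 1e-8)

# Write an 8-bit grayscale PNG ("dog_depth.png" is a hypothetical filename).
cv2.imwrite("dog_depth.png", (normalized * 255.0).astype(np.uint8))
```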
### References

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

Vision Transformers for Dense Prediction

If you use MiDaS models, please cite our papers:
```bibtex
@article{Ranftl2020,
    author  = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
    title   = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
    year    = {2020},
}

@article{Ranftl2021,
    author  = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
    title   = {Vision Transformers for Dense Prediction},
    journal = {ArXiv preprint},
    year    = {2021},
}
```