Commit
* Fill out the blanked-out code. I haven't tested it yet, but I wrote most of the logical flow the homework requires: the draw function and the main logic. However, I didn't understand the 4th requirement.
* Change the indentation unit to tabs. The previous commit mixed two-space and four-space indentation, so I changed everything to tabs.
* Create the layers, not yet the weights. I created the YOLO-v2-tiny model, but the pretrained model is delivered in pickle format and I haven't yet worked out how to use it. I'll load it soon.
* Refactor duplicated code into a function.
* Fix bugs in opening the video file and in the resizing function.
* Set the batch size to 1 because we only run inference.
* (Incomplete) Consumes too much memory (more than 100 GB).
* Fix a typo.
* Fix a memory allocation error. I tried to build filters from 16 up to 1024, but I accidentally went from 16 up to pow(16, 9) ~ 6e10 ~ 60G. That was why my code could not allocate enough memory. Plus, I fixed the professor's wrong code: sess.run evaluates only the last layer.
* Set fixed precision on the throughput output.
* Add some information about the yolov2tiny architecture.
* (Incomplete) Find a bottleneck in non-max suppression. To find it, I made a tracer and attached it to some functions.
* Fix wrongly indented code in the nms function.
* Reshape bounding boxes from (416, 416) to the original video resolution.
* Add parameters.
* Show attributes.
* Store every layer of the first frame into the intermediate folder.
* WIP: Make room for bias, not yet implemented because I'm still figuring out how to use the weights.
* Change the video codec to mp4v.
* WIP: Create layers hierarchically. YOLO-v2-tiny consists of nine composite layers, and each of those consists of smaller layers such as conv, batch_norm, bias, maxpool, and leaky ReLU. Therefore, I mimicked its hierarchy.
* Infer objects correctly.
* Add explicit bias layers right after each conv layer.
* Load weights by using tf.Variable and the corresponding layers in tf.nn.
* Use the left-upper and right-bottom coordinates in the draw function, since coordinates are already reshaped in the restore_shape function.
* Remove unused comments and debug lines.
* Use the original image in the draw function.
* Measure inference time, end-to-end time, FPS, and total time.
* Update for LaTeX.
* Add a report template.
* Limit YOLO to only 70% of GPU VRAM. In a small-VRAM environment, the allow_growth option is not enough to prevent out-of-memory errors, so, based on some references, I forced it not to take more than 70% of VRAM.
* Clarify which values we save.
* Update yolov2tiny.py. Delete out_chan and the default value of stride. For consistency, how about just using max_pool2d for maxpool as well? (Commented out for now.)
* Update yolov2tiny.py.
* Update __init__.py. Add the start time of the "end-to-end time", beg_start, and rename the previous beg to beg_infer. Does renaming beg in obj_detection affect the measure function? (I'm not sure.)
* Update yolov2tiny.py. Put the n_... values back into postprocessing.
* Create consider.txt.
* Update yolov2tiny.py. Confirm that "tf.nn.max_pool2d" works well.
* Update consider.txt.
* Update __init__.py. Move the part that saves the first frame's intermediate result (tensors) down, to measure the necessary time.
* Update consider.txt.
* Update __init__.py. Add printing of the total time.
* Add some details.
* Add a comment about inference FPS.
* Write detailed info on why I chose the tf.nn functions.
* Upload the whole-model visualization. I visualized the whole TF graph by using tf.train.Saver. The only catch is that it is too verbose to see the main logic, but I decided to save the visualized graph just in case.
* Add a GPU benchmark.
* Add a CPU benchmark.
* Add the first draft of the report.
* Update report.tex. Some changes.
* Update report.tex.
* Change the code location and the table.
* Edit figures.

Co-authored-by: jehoon315 <[email protected]>
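Regarding the sess.run fix mentioned above, here is a minimal sketch of the difference on a toy graph (TF 1.x, with illustrative names; the real model builds nine composite layers). Fetching only the last tensor returns just the final activation, while fetching the whole list returns every layer's output from a single run, which is what saving the intermediate tensors requires.

import numpy as np
import tensorflow as tf  # TF 1.x, as used in this assignment

# A toy two-layer graph standing in for the real model.
input_ph = tf.placeholder(tf.float32, shape=(1, 4))
layer1 = tf.nn.relu(input_ph)
layer2 = layer1 * 2.0
layer_outputs = [layer1, layer2]

with tf.Session() as sess:
    x = np.ones((1, 4), dtype=np.float32)
    # Returns only the final activation:
    final = sess.run(layer_outputs[-1], feed_dict={input_ph: x})
    # Runs the graph once and returns every layer's activation:
    all_acts = sess.run(layer_outputs, feed_dict={input_ph: x})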
Showing 15 changed files with 8,679 additions and 250 deletions.
.gitignore
@@ -127,3 +127,9 @@ dmypy.json

# Pyre type checker
.pyre/

# LaTeX
**/*.aux
**/*-eps-converted-to.pdf
**/*.log
__init__.py
@@ -1,84 +1,171 @@

Removed: the original homework skeleton, reproduced below (main() was already provided and carries over with only formatting changes).

import os
import sys
import numpy as np
import cv2 as cv2
import time
import yolov2tiny


def open_video_with_opencv(in_video_path, out_video_path):
    # This function takes the input and output video paths and opens them.
    # Your code from here. You may clear the comments.
    print('open_video_with_opencv is not yet implemented')
    sys.exit()

    # Open an object of input video using cv2.VideoCapture.
    # Open an object of output video using cv2.VideoWriter.
    # Return the video objects and anything you want for further process.


def resize_input(im):
    imsz = cv2.resize(im, (416, 416))
    imsz = imsz / 255.
    imsz = imsz[:, :, ::-1]
    return np.asarray(imsz, dtype=np.float32)


def video_object_detection(in_video_path, out_video_path, proc="cpu"):
    # This function runs the inference for each frame and creates the output video.
    # Your code from here. You may clear the comments.
    print('video_object_detection is not yet implemented')
    sys.exit()

    # Open video using open_video_with_opencv.
    # Check if video is opened. Otherwise, exit.
    # Create an instance of the YOLO_V2_TINY class. Pass the dimension of
    # the input, a path to the weight file, and which device you will use as arguments.
    # Start the main loop. For each frame of the video, the loop must do the following:
    #  1. Do the inference.
    #  2. Run postprocessing using the inference result and accumulate the frames
    #     through the video writer object. The coordinates from postprocessing are
    #     calculated for the resized input; you must adjust them to fit the
    #     original video.
    #  3. Measure the end-to-end time and the time spent only for inferencing.
    #  4. Save the intermediate values for the first layer.
    # Note that your input must be adjusted to fit into the algorithm,
    # including resizing the frame and changing the dimension.
    # Check the inference performance: the end-to-end elapsed time and the
    # inferencing time. Check how many frames are processed per second, respectively.
    # Release the opened videos.

Added: the implementation below.

import os
import sys
from datetime import datetime
from functools import reduce, wraps
from typing import List, Tuple

import cv2
import numpy as np

import yolov2tiny


def measure(func):
    """Measure how long a function takes."""
    @wraps(func)
    def impl(*args, **kargs):
        beg = datetime.now()
        ret = func(*args, **kargs)
        time = (datetime.now() - beg).total_seconds()
        print("{}: {}s".format(func.__name__, time))
        return ret

    return impl


def open_video_with_opencv(
        in_video_path: str,
        out_video_path: str) -> (cv2.VideoCapture, cv2.VideoWriter):

    reader = cv2.VideoCapture(in_video_path)
    if not reader.isOpened():
        raise Exception("Failed to open '{}'".format(in_video_path))

    # Keep the output attributes as close to the input as possible.
    fps = reader.get(cv2.CAP_PROP_FPS)
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))

    writer = cv2.VideoWriter(out_video_path, fourcc, fps, (width, height))
    if not writer.isOpened():
        raise Exception(
            "Failed to create video named '{}'".format(out_video_path))

    return reader, writer


def resize_input(im: np.ndarray) -> np.ndarray:
    imsz = cv2.resize(im, (416, 416), interpolation=cv2.INTER_AREA)
    imsz = imsz / 255.
    imsz = imsz[:, :, ::-1]  # BGR -> RGB
    imsz = np.asarray(imsz, dtype=np.float32)
    return imsz.reshape((1, *imsz.shape))  # Prepend the batch dimension.


color_t = Tuple[float, float, float]
coord_t = Tuple[int, int]
proposal_t = Tuple[str, coord_t, coord_t, color_t]


def restore_shape(proposals: List[proposal_t], restore_width: int,
                  restore_height: int) -> List[proposal_t]:
    """
    Read the proposal list and rescale the proposal coordinates to the
    original video's resolution.
    """
    def reshape(record: proposal_t) -> proposal_t:
        """
        Take a record and rescale its coordinates to the original ratio.
        cf) lu means left-upper and rb means right-bottom.
        """
        calc_coord = lambda x, new_d: np.clip(int(x / 416 * new_d), 0, new_d)
        name, (lux, luy), (rbx, rby), color = record
        lux, rbx = map(lambda x: calc_coord(x, restore_width), [lux, rbx])
        luy, rby = map(lambda y: calc_coord(y, restore_height), [luy, rby])
        return (name, (lux, luy), (rbx, rby), color)

    return [reshape(it) for it in proposals]


def draw(image: np.ndarray, proposals: List[proposal_t]) -> np.ndarray:
    """
    Draw bounding boxes onto the image and return it.
    proposals contains a list of (best_class_name, lefttop, rightbottom, color).
    """
    for name, lefttop, rightbottom, color in proposals:
        cv2.rectangle(image, lefttop, rightbottom, color, 2)
        cv2.putText(image, name, (lefttop[0], max(0, lefttop[1] - 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    return image


def store_tensors(tensors: List[np.ndarray]):
    os.makedirs("intermediate", exist_ok=True)
    for i, tensor in enumerate(tensors):
        path = os.path.join("intermediate", "layer_{}.npy".format(i))
        np.save(path, tensor)


@measure
def video_object_detection(in_video_path: str,
                           out_video_path: str,
                           proc="cpu"):
    """
    Read a video file, scan each frame, and draw objects using the pretrained
    yolo_v2_tiny model. Finally, store the drawn frames into 'out_video_path'.
    """
    reader, writer = open_video_with_opencv(in_video_path, out_video_path)
    yolo = yolov2tiny.YOLO_V2_TINY((416, 416, 3), "./y2t_weights.pickle", proc)

    width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))

    acc, first_time = [], True
    while reader.isOpened():
        okay, original_image = reader.read()
        if not okay:
            break
        beg_start = datetime.now()
        image = resize_input(original_image)
        beg_infer = datetime.now()
        batched_tensors_list = yolo.inference(image)
        inference_time = (datetime.now() - beg_infer).total_seconds()

        tensor = batched_tensors_list[-1][0]

        proposals = yolov2tiny.postprocessing(tensor)
        proposals = restore_shape(proposals, width, height)
        out_image = draw(original_image, proposals)
        writer.write(out_image)

        end_to_end_time = (datetime.now() - beg_start).total_seconds()
        acc.append((inference_time, end_to_end_time))
        print("#{} inference: {:.3f}\tend-to-end: {:.3f}".format(
            len(acc), inference_time, end_to_end_time))

        if first_time:
            # Strip the batch dimension before saving each layer's output.
            store_tensors(map(lambda x: x[0], batched_tensors_list))
            first_time = False

    reader.release()
    writer.release()
    inference_sum, end_to_end_sum = reduce(
        lambda x, y: (x[0] + y[0], x[1] + y[1]), acc)
    size = len(acc)
    print("Total inference: {:.3f}s\ttotal end-to-end: {:.3f}s".format(
        inference_sum, end_to_end_sum))
    print("Average inference: {:.3f}s\taverage end-to-end: {:.3f}s".format(
        inference_sum / size, end_to_end_sum / size))
    print("Throughput: {:.3f}fps".format(size / end_to_end_sum))


def main():
    if len(sys.argv) < 3:
        print(
            "Usage: python3 __init__.py [in_video.mp4] [out_video.mp4] ([cpu|gpu])"
        )
        sys.exit()

    in_video_path = sys.argv[1]
    out_video_path = sys.argv[2]

    if len(sys.argv) == 4:
        proc = sys.argv[3]
    else:
        proc = "cpu"

    video_object_detection(in_video_path, out_video_path, proc)


if __name__ == "__main__":
    main()
@@ -0,0 +1,54 @@
Type: <class 'list'>
Length: 9
Element type: <class 'collections.OrderedDict'>
conv0
conv0[kernel]: (3, 3, 3, 16)
conv0[biases]: (16,)
conv0[moving_variance]: (16,)
conv0[gamma]: (16,)
conv0[moving_mean]: (16,)
conv1
conv1[kernel]: (3, 3, 16, 32)
conv1[biases]: (32,)
conv1[moving_variance]: (32,)
conv1[gamma]: (32,)
conv1[moving_mean]: (32,)
conv2
conv2[kernel]: (3, 3, 32, 64)
conv2[biases]: (64,)
conv2[moving_variance]: (64,)
conv2[gamma]: (64,)
conv2[moving_mean]: (64,)
conv3
conv3[kernel]: (3, 3, 64, 128)
conv3[biases]: (128,)
conv3[moving_variance]: (128,)
conv3[gamma]: (128,)
conv3[moving_mean]: (128,)
conv4
conv4[kernel]: (3, 3, 128, 256)
conv4[biases]: (256,)
conv4[moving_variance]: (256,)
conv4[gamma]: (256,)
conv4[moving_mean]: (256,)
conv5
conv5[kernel]: (3, 3, 256, 512)
conv5[biases]: (512,)
conv5[moving_variance]: (512,)
conv5[gamma]: (512,)
conv5[moving_mean]: (512,)
conv6
conv6[kernel]: (3, 3, 512, 1024)
conv6[biases]: (1024,)
conv6[moving_variance]: (1024,)
conv6[gamma]: (1024,)
conv6[moving_mean]: (1024,)
conv7
conv7[kernel]: (3, 3, 1024, 1024)
conv7[biases]: (1024,)
conv7[moving_variance]: (1024,)
conv7[gamma]: (1024,)
conv7[moving_mean]: (1024,)
conv8
conv8[kernel]: (1, 1, 1024, 125)
conv8[biases]: (125,)
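The dump above suggests the pickle deserializes to a plain list of nine OrderedDicts keyed by parameter name. A minimal loading sketch follows (the path matches the one used in __init__.py). Note that the 125 output channels of conv8 would match 5 anchor boxes x (4 coordinates + 1 objectness + 20 classes), assuming the standard VOC configuration of YOLO-v2.

import pickle

with open("./y2t_weights.pickle", "rb") as f:
    weights = pickle.load(f)  # list of 9 OrderedDicts, one per conv block

print(len(weights))                # 9
print(weights[0]["kernel"].shape)  # (3, 3, 3, 16)
print(weights[8]["kernel"].shape)  # (1, 1, 1024, 125), the detection head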
consider.txt

@@ -0,0 +1,62 @@
Write a report one or two pages long. Your report must include:

1. How you implemented it (if there are any more important points, feel free to add them)

1) Opening the video with OpenCV

Video reading/writing uses OpenCV. The output video attributes are copied from the reader via the reader.get method to keep them as close to the original as possible; however, since each OS supports different codecs by default, mp4v is used as the fourcc.
2) Building the yolov2tiny tensor graph

GPU memory fraction 0.7 -> added because memory blew up; the allow_growth option alone was not enough.
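A minimal sketch of that VRAM cap using the TF 1.x session options (the 0.7 fraction comes from the commit; the surrounding setup is illustrative):

import tensorflow as tf  # TF 1.x

# Cap this process at 70% of GPU memory; allow_growth alone was not
# enough to prevent out-of-memory errors on small-VRAM machines.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7,
                            allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)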
Implemented the 40 layers of YOLO-v2-tiny using TensorFlow.
The goal of this assignment is to set the weight parameters directly from the given weight ndarray values and run inference, so instead of the tf.contrib module, which is convenient but does not let you change the weight values, we used the functions in the tf.nn module, which let you assign the weight parameters manually.
While creating each layer, the matching weights were applied to it. From the provided pickle file, kernel was used to initialize the conv_2d layer, biases the bias_add layer, and (moving_variance, gamma, moving_mean) the batch_normalization layer.
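A sketch of how one such composite layer could be assembled from tf.nn functions and one OrderedDict from the pickle (names and the epsilon value are illustrative; the actual builder lives in yolov2tiny.py, and the final 1x1 conv has only kernel and biases, with no batch norm, activation, or pooling). The pickle carries gamma but no beta, hence offset=None:

import tensorflow as tf  # TF 1.x

def composite_layer(x, w):
    # w is one OrderedDict from the pickle: kernel, biases,
    # moving_mean, moving_variance, gamma (see shapes above).
    kernel = tf.Variable(w["kernel"])
    biases = tf.Variable(w["biases"])
    x = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding="SAME")
    x = tf.nn.bias_add(x, biases)
    x = tf.nn.batch_normalization(x,
                                  mean=w["moving_mean"],
                                  variance=w["moving_variance"],
                                  offset=None,
                                  scale=w["gamma"],
                                  variance_epsilon=1e-5)
    x = tf.nn.leaky_relu(x, alpha=0.1)
    return tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME")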
3) Object detection

On every loop iteration, the input frame is resized to fit yolov2tiny and fed in for inference. The inferred tensor is passed to the postprocessing function to extract bounding boxes. In this step, bounding boxes whose confidence is below a threshold are removed, and non-max suppression keeps only the single bounding box with the highest confidence for each object; a generic sketch of that step follows below.
The boxes are resized back to the original size, combined with the input frame, and the output frame is stored.
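A compact NumPy sketch of greedy non-max suppression as described (a generic version, assuming boxes given as [x1, y1, x2, y2]; the project's actual implementation is inside yolov2tiny's postprocessing):

import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    # Keep the highest-confidence box, drop boxes overlapping it by
    # more than iou_thresh, and repeat; returns the kept indices.
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the best box with the remaining ones.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep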
2. Execution time and how many FPS were processed (end-to-end and inference-only)

Frames #1-2 take much longer; it would be good to describe the reason. Why? Due to tensor initialization? Or caching?

I suspected it was because of saving the first frame's tensors, but moving that code made no difference. (There was a slight difference, but it is not the main cause.)

         #1      #2      #3    ...
CPU :  0.157   0.083   0.078   ...
GPU :  1.352   0.104   0.011   ...
        total / inference (per frame) / end-to-end (per frame) / FPS
CPU :  43.591 / 0.058 / 0.096 / 10.392
GPU :  24.778 / 0.016 / 0.055 / 18.282

Total: I thought this was the last value the program prints, but that one times the whole function; total here is end_to_end_sum (total / 453 = average end-to-end).

The values above should be averaged over several runs; for now they come from a single run.
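A quick recomputation (as a sketch) confirms the table is internally consistent, using the frame count given at the end of this file:

frames = 453
cpu_total, gpu_total = 43.591, 24.778

print(cpu_total / frames)  # ~0.096 s, the CPU average end-to-end time
print(frames / cpu_total)  # ~10.392 fps (CPU)
print(frames / gpu_total)  # ~18.282 fps (GPU)
print(0.058 / 0.016)       # 3.625x inference speedup, as listed below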
I'm not sure whether inference FPS is needed as well. Lecture #5 mentioned "FPS measurement exclusively for DNN computation".
If we try to improve FPS later, the resizing part will stay the same anyway, so inference FPS also seems like the clearer metric.

-> In the analysis, how about comparing the inference FPS values and writing that inference benefited from GPU acceleration, while postprocessing, implemented sequentially on the CPU only, acts as the bottleneck when running in GPU mode?
3. Comparison of the execution times on CPU and GPU, with analysis

I guessed the times would be similar once inference is excluded -> correct, they are similar.
end-to-end - inference
CPU : 0.038
GPU : 0.039

Improvement from using the GPU over the CPU (is that grammatically correct?)
Inference : 3.625x
Total : 1.760x
The purpose of the report is to show your understanding. Please keep the answers short and clear.

Video frame size : 540 x 540
Video fps : 30
Video length : 15 s
Video frame count : 453