Releases: snowphone/CS492-Sys_for_ML
Implement Tiny YOLOv2 in C/CUDA/BLAS
HW2: Implement with numpy
Implement with numpy (#2)

* Add skeleton codes for hw2
* Apply revised skeleton code
* Fill out LeakyReLU
* Fill out BiasAdd
* Change input type: ndarray -> DnnNode. I misread the code, so I changed the input type to DnnNode. Conv2d and MaxPool2d are still WIP.
* Change exception type to a custom `DNNException`
* Refactor the BiasAdd class. In the previous commit, I forgot that biases are shared among the elements of a channel, so I edited the test cases for the revised purpose. The result is actually still the same as in the previous commit, but I added some comments to justify the code.
* Fill out the padding auxiliary function
* Fill out a sequential version of _stride, though I think there might be a better approach using flatten and dot products
* Fill out conv using the im2col method, not yet tested
* Unify expression: n_filter -> ksize
* Fix a bug in the _stride method. Now _stride works properly when the matrix is divided by ksize and stride (e.g. 3 x 3 matrix, ksize: 2, stride: 1).
* Run maxpool under a simple condition: if the matrix is evenly divisible by ksize, maxpool works properly
* Pass test_stride3
* Pass test_stride_all_not_even
* Rename test functions for better readability
* Pass test_conv
* Pass pad tests with batch support
* Pass every test with batch support
* Use swapaxes instead of moveaxis to save memory
* Fix a bug in reshaping after conv
* Edit code for the current working directory
* Read weights from the current folder
* Neutralize the python3 version check
* Fix test_maxpool_valid
* Fix a bug and pass test_maxpool_same. In the previous commit, I set the column-wise index bound to `row` while striding the tensor.
* Infer objects from an image correctly. I tested on the cluster and it detected objects well. The code is not clean yet since debugging code remains; furthermore, the stride method is not yet parallelized and the code does not provide any exception handling.
* Refactor codes
* Add current status
* Add revised requirements
* Update README.md
* Update each layer's result in init. For conv and pool layers, take strides and padding into account.
* Refactor and clean up the working directory
* Update README.md
* Update dnn.py
* Update dnn.py
* Update dnn.py
* Update dnn.py

Co-authored-by: jehoon315 <[email protected]>
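The LeakyReLU and BiasAdd commits above note that a bias is shared among all elements of a channel. A minimal numpy sketch of both operations (free functions for illustration only; the repo's actual code wraps these in DnnNode classes, and the leaky slope `alpha=0.1` is an assumption taken from Tiny YOLOv2's usual configuration):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Elementwise: pass positive values through, scale negatives by alpha.
    return np.where(x > 0, x, alpha * x)

def bias_add(x, biases):
    # x: (batch, height, width, channels). One bias per channel,
    # broadcast to every spatial position in that channel.
    return x + biases.reshape(1, 1, 1, -1)
```

Broadcasting the `(channels,)` bias vector against the NHWC tensor is what makes the "shared among elements in a channel" semantics fall out for free.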
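Several commits above refer to the im2col method for conv: each sliding window is flattened into a row, so the whole convolution collapses into one matrix multiply. A sketch under assumed NHWC layout and VALID padding (function names and signatures are hypothetical, not the repo's API):

```python
import numpy as np

def im2col(x, ksize, stride):
    # x: (batch, H, W, C) -> (batch, out_h, out_w, ksize*ksize*C),
    # where each innermost row is one flattened receptive field.
    n, h, w, c = x.shape
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    cols = np.empty((n, out_h, out_w, ksize * ksize * c))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+ksize, j*stride:j*stride+ksize, :]
            cols[:, i, j, :] = patch.reshape(n, -1)
    return cols

def conv2d(x, kernel, stride=1):
    # kernel: (ksize, ksize, in_c, out_c); VALID padding.
    ksize, _, _, out_c = kernel.shape
    cols = im2col(x, ksize, stride)
    # Flattening the kernel's first three axes matches im2col's
    # (ksize, ksize, C) flattening order, so a single matmul suffices.
    return cols @ kernel.reshape(-1, out_c)
```

The matmul at the end broadcasts over the batch and spatial axes, which is also where a BLAS-backed numpy recovers most of the performance a naive striding loop loses.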
Fix typos and add team number
v1.0.1 Add team number 11
HW1: Implement Tiny YOLOv2 using the TensorFlow API
Hw1 (#1)

* Fill out blanked code. I haven't tested it yet, but I wrote most of the logical flow that the homework requires, including the draw function and the main logic. However, I didn't understand the 4th requirement.
* Change the indentation unit to tabs. In the previous commit, two spaces and four spaces were mixed for indentation, so I changed everything to tabs.
* Create layers, but not yet weights. I created the YOLO-v2-tiny model, but the pretrained model is delivered in pickle format and I haven't figured out how to use it yet. I'll load it soon.
* Refactor duplicated code into a function
* Fix bugs in opening the video file and in the resizing function
* Set batch to 1 because it only performs inference
* (Incomplete) Consumes too much (more than 100 GB) memory
* Fix a typo
* Fix a memory allocation error. I tried to create filters from 16 up to 1024, but I accidentally went from 16 to pow(16, 9) ~ 6e10 ~ 60G; that was why my code could not allocate enough memory. I also fixed the professor's incorrect code: sess.run evaluated only the last layer.
* Set fixed precision on throughput
* Add some information about the yolov2tiny architecture
* (Incomplete) Find a bottleneck in non-max-suppression. To find it, I made a tracer and set it on some functions.
* Fix wrongly indented code in the nms function
* Reshape bounding boxes from (416, 416) to the original video resolution
* Add parameters
* Show attributes
* Store every layer of the first frame into the intermediate folder
* WIP: Make room for bias, but not yet implemented because I am still figuring out how to use the weights
* Change the video codec to mp4v
* WIP: Create layers hierarchically. YOLO-v2-tiny consists of nine composite layers, and each layer consists of smaller layers such as conv, batch_norm, bias, maxpool, and leakyReLU, so I mimicked its hierarchy.
* Infer objects correctly
* Add explicit bias layers right after each conv layer
* Load weights by using tf.Variable and the corresponding layers in tf.nn
* Use the upper-left and lower-right coordinates in the draw function, since the coordinates are already shaped in the restore_shape function
* Remove unused comments and debug lines
* Use the original image in the draw function
* Measure inference time, end-to-end time, FPS, and total time
* Update for LaTeX
* Add a report template
* Limit YOLO to only 70% of GPU VRAM. In a small-VRAM environment, the allow_growth option is not enough to prevent out-of-memory errors, so after consulting some references I forced it not to take more than 70% of VRAM.
* Clarify which values we save
* Update yolov2tiny.py. Delete out_chan and the default value of stride. For consistency, how about just renaming maxpool to max_pool2d as well? (Commented out for now.)
* Update yolov2tiny.py
* Update __init__.py. Add the start time of the "end-to-end time" (beg_start) and rename the previous beg -> beg_infer. Does renaming "beg" in obj_detection affect the measure function? (I'm not sure.)
* Update yolov2tiny.py. Put the n_... values back into post-processing.
* Create consider.txt
* Update yolov2tiny.py. Confirm that "tf.nn.max_pool2d" works well.
* Update consider.txt
* Update __init__.py. Move the part that saves the first frame's intermediate result (tensor) down, to measure the necessary time.
* Update consider.txt
* Update __init__.py. Add printing of the total time.
* Add some details
* Add a comment about inference FPS
* Write detailed info on why I chose the tf.nn functions
* Upload the whole model visualization. I visualized the whole tf graph by using tf.train.Saver. The only catch is that it is too verbose to see the main logic, but I decided to save the visualized graph just in case.
* Add GPU benchmark
* Add CPU benchmark
* Add the first draft of the report
* Update report.tex. Some changes.
* Update report.tex
* Change the code location and table
* Edit figures

Co-authored-by: jehoon315 <[email protected]>
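One commit above identifies non-max-suppression as the bottleneck. A common way to speed it up is to vectorize the per-box IoU computation in numpy instead of looping pairwise; a sketch assuming corner-format boxes (x1, y1, x2, y2), which is not necessarily the repo's box representation:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) as (x1, y1, x2, y2); returns indices of kept boxes,
    # greedily keeping the highest-scoring box and suppressing overlaps.
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box against all remaining boxes, in one shot.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```

The inner loop still iterates once per surviving box, but the IoU against every remaining candidate is a single vectorized pass, which is typically what removes an NMS hotspot a tracer would flag.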
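The 70% VRAM cap described above maps onto TensorFlow 1.x-style session options. A configuration sketch, assuming the `tf.compat.v1` API (the 0.7 fraction comes from the note above; combining it with allow_growth is this sketch's choice, since the note says allow_growth alone was insufficient):

```python
import tensorflow as tf

# Cap this process at ~70% of GPU VRAM; allow_growth alone was not
# enough to prevent out-of-memory errors on small-VRAM machines.
gpu_options = tf.compat.v1.GPUOptions(
    per_process_gpu_memory_fraction=0.7,
    allow_growth=True,
)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
sess = tf.compat.v1.Session(config=config)
```

per_process_gpu_memory_fraction sets a hard upper bound on the allocator, while allow_growth keeps TensorFlow from grabbing that whole bound up front.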