Releases: snowphone/CS492-Sys_for_ML

Implement Tiny YOLOv2 in C/CUDA/BLAS

02 Jun 11:10
v3.0

Update to reflect current status

HW2: Implement with numpy

12 May 11:39
a30c5eb
Implement with numpy (#2)

* Add skeleton codes for hw2

* Apply revised skeleton code

* Fill out LeakyReLU
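
For reference, a minimal numpy sketch of such a LeakyReLU (the slope alpha = 0.1 commonly used for YOLOv2 is an assumption here, not taken from the skeleton):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Elementwise: keep positive values, scale negative ones by alpha.
    return np.where(x > 0, x, alpha * x)
```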

* Fill out BiasAdd

* Change input type: ndarray -> DnnNode

I misread the code, so I changed the input type to DnnNode.
Conv2d and MaxPool2d are still WIP.

* Change exception type to a custom `DNNException`

* Refactor biasAdd class

In the previous commit, I forgot that biases are shared among all elements
in a channel, so I edited the test cases accordingly. The result is still
the same as in the previous commit, but I added some comments to justify
the code.
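
A minimal sketch of the per-channel broadcast described above, assuming NHWC tensors (the actual class interface in dnn.py may differ):

```python
import numpy as np

def bias_add(x, biases):
    # x: (batch, height, width, channels); biases: (channels,)
    # One bias per channel, shared by every element of that channel.
    return x + biases.reshape(1, 1, 1, -1)
```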

* Fill out padding auxiliary function
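
A sketch of what such a padding helper could look like, assuming NHWC input and TensorFlow's SAME rule (the skeleton's actual signature may differ):

```python
import numpy as np

def pad_same(x, ksize, stride):
    # Choose total padding so that out = ceil(in / stride), as in SAME padding.
    _, h, w, _ = x.shape
    pad_h = max((int(np.ceil(h / stride)) - 1) * stride + ksize - h, 0)
    pad_w = max((int(np.ceil(w / stride)) - 1) * stride + ksize - w, 0)
    top, left = pad_h // 2, pad_w // 2
    return np.pad(x, ((0, 0), (top, pad_h - top), (left, pad_w - left), (0, 0)),
                  mode='constant')
```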

* Fill out sequential version of _stride

But I think there might be a better approach using flatten and dot products.

* Fill out conv using the im2col method, not yet tested
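
The idea of the im2col approach, as a hedged sketch (NHWC layout and VALID padding assumed; the real conv layer also handles padding and bias):

```python
import numpy as np

def conv2d_im2col(x, kernel, stride=1):
    # x: (batch, h, w, c_in), kernel: (k, k, c_in, c_out), VALID padding.
    n, h, w, c_in = x.shape
    k, _, _, c_out = kernel.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    # Gather every k x k x c_in patch into one row ("im2col") ...
    cols = np.empty((n, out_h, out_w, k * k * c_in), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k, :]
            cols[:, i, j, :] = patch.reshape(n, -1)
    # ... so the convolution reduces to a single matrix multiplication.
    out = cols.reshape(-1, k * k * c_in) @ kernel.reshape(-1, c_out)
    return out.reshape(n, out_h, out_w, c_out)
```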

* Unify naming: n_filter -> ksize

* Fix bug on _stride method

Now _stride works properly when the matrix divides evenly by ksize and
stride (e.g. a 3 x 3 matrix with ksize 2 and stride 1).

* Run maxpool on a simple condition

If the matrix is evenly divisible by ksize, then maxpool works properly
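
A sketch of this "simple condition" case, assuming NHWC input whose spatial dimensions are evenly divisible by ksize:

```python
import numpy as np

def max_pool2d(x, ksize=2, stride=2):
    # x: (batch, h, w, c); VALID pooling over strided windows.
    n, h, w, c = x.shape
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    out = np.empty((n, out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[:, i * stride:i * stride + ksize,
                          j * stride:j * stride + ksize, :]
            out[:, i, j, :] = window.max(axis=(1, 2))
    return out
```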

* Pass test_stride3

* Pass test_stride_all_not_even

* Rename test function names for better understanding

* Pass test_conv

* Pass pad tests with batch support

* Pass every test with batch support

* Use swapaxes instead of moveaxis to save memory

* Fix a bug on reshaping after conv

* Edit code to run from the current working directory

* Read weights from the current folder
* Neutralize python3 version check

* Fix test_maxpool_valid

* Fix a bug and pass test_maxpool_same

In the previous commit, I mistakenly set the column-wise index bound to `row`
while striding the tensor.

* Infer objects from an image correctly

I tested on the cluster and it detected objects well.
The code is not clean yet since there is still debugging code.
Furthermore, the stride method is not yet parallelized and the code does not
provide any exception handling.

* Refactor code

* Add current status

* Add revised requirements

* Update README.md

* Update layer's result in init

For the conv and pool layers, take strides and padding into account.
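
The shape bookkeeping follows the usual output-size formulas; a small sketch (the SAME/VALID naming is borrowed from TensorFlow, which the skeleton mirrors):

```python
import math

def output_size(in_size, ksize, stride, padding):
    # SAME: out = ceil(in / stride); VALID: out = ceil((in - ksize + 1) / stride)
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    return math.ceil((in_size - ksize + 1) / stride)
```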

* Refactor and clean up the working directory

* Update README.md

* Update dnn.py

* Update dnn.py

* Update dnn.py

* Update dnn.py

Co-authored-by: jehoon315 <[email protected]>

Fix typos and add team number

21 Apr 11:10
v1.0.1

Add team number 11

HW1: Implement Tiny YOLOv2 using the TensorFlow API

20 Apr 08:04
94de1fc
Hw1 (#1)

* Fill out the blanked-out code

I haven't tested yet, but I wrote most of the logical flow that the homework requires.
I wrote the draw function and the main logic. However, I didn't understand the 4th requirement.

* Change indentation unit to tab

In the previous commit, two-space and four-space indentation were mixed,
so I changed everything to tabs.

* Create layers, not yet weights

I created the YOLO-v2-tiny model, but the pretrained model is delivered in pickle format and I haven't figured out how to use it yet. I'll load it soon.
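
Loading the pickled weights is straightforward once the layout is known; a hedged sketch (the file name and the dict/list structure below are assumptions, not the actual skeleton format):

```python
import pickle

# Hypothetical file name and layout.
with open('y2t_weights.pickle', 'rb') as f:
    weights = pickle.load(f)  # e.g. a list of per-layer dicts of numpy arrays

# Each array can then be wrapped into a tf.Variable when building the graph,
# e.g. kernel = tf.Variable(weights[0]['kernel'], dtype=tf.float32)
```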

* Refactor duplicated codes into a function

* Fix bugs in opening video file and resizing function

* Set the batch size to 1 because the code only performs inference

* (Incomplete) Consumes too much (more than 100 GB) memory

* Fix a typo

* Fix memory allocation error

I tried to make the filter counts go from 16 up to 1024, but I accidentally
went from 16 to pow(16, 9) ~ 6e10 ~ 60G. That was why my code could not
allocate enough memory.

Plus, I fixed an error in the professor's code: sess.run evaluates only the last layer.
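
For illustration, the intended progression doubles the channel count per block, while the buggy version grew the base exponentially (a sketch of the mistake described above, not the original code):

```python
# Intended: double the filter count each block.
filters = [16 * 2 ** i for i in range(7)]       # [16, 32, 64, 128, 256, 512, 1024]

# Buggy: exponential in the base, ending around 16 ** 9 ~ 6e10 filters.
# filters = [pow(16, i) for i in range(1, 10)]
```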

* Set fixed precision on throughput

* Add some information about yolov2tiny architecture

* (Incomplete) Find a bottleneck in non-max suppression

To find it, I wrote a tracer and attached it to some functions.
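
A minimal sketch of such a tracer as a timing decorator (the function name in the usage line is hypothetical):

```python
import functools
import time

def trace(fn):
    # Wrap a function and report how long each call takes,
    # which is enough to spot a hot spot such as non-max suppression.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = fn(*args, **kwargs)
        print('%s: %.4f s' % (fn.__name__, time.time() - start))
        return result
    return wrapper

# Usage (hypothetical): nms = trace(nms)
```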

* Fix wrongly indented codes in nms function

* Reshape bounding boxes from (416, 416) to the original video resolution
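
The rescaling itself is a per-axis scale factor; a sketch assuming corner-format boxes (the real restore_shape signature may differ):

```python
def restore_shape(boxes, orig_w, orig_h, net_size=416):
    # boxes: iterable of (x1, y1, x2, y2) in the 416 x 416 network frame.
    sx, sy = orig_w / net_size, orig_h / net_size
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]
```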

* Add parameters

* Show attributes

* Store every layer on first frame into intermediate folder

* WIP: Make room for bias; not yet implemented because I'm still figuring out how to use the weights

* Change video codec to mp4v

* WIP: create layers hierarchically

YOLO-v2-tiny consists of nine composite layers, and each one consists of
smaller layers such as conv, batch_norm, bias, maxpool, and leakyReLU.
Therefore, I mimicked this hierarchy.
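
As an illustration only (TensorFlow 1.x API assumed; the exact sub-layer order and arguments follow the pretrained model rather than this sketch), one such composite block could look like:

```python
import tensorflow as tf  # TensorFlow 1.x style, as used in this homework

def composite_layer(x, kernel, bias, gamma, beta, mean, var, pool=True):
    # One block: conv -> batch norm -> bias -> leaky ReLU -> (optional) maxpool.
    x = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
    x = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-5)
    x = tf.nn.bias_add(x, bias)
    x = tf.nn.leaky_relu(x, alpha=0.1)
    if pool:
        x = tf.nn.max_pool2d(x, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME')
    return x
```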

* Infer objects correctly

* Add explicit bias layers right after each conv layer.
* Load weights with tf.Variable and the appropriate layers in tf.nn.
* Use the upper-left and lower-right coordinates in the draw function, since the coordinates are already reshaped in the restore_shape function.
* Remove unused comments and debug lines.
* Use the original image in the draw function.

* Measure inference time, end-to-end time, FPS, and total time
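
A sketch of the split between the two measurements (the names beg_start and beg_infer follow the commits below; sess, last_layer, and n_frames are hypothetical):

```python
import time

beg_start = time.time()                       # start of the end-to-end timer
# frame = read_and_preprocess(...)            # hypothetical helper
beg_infer = time.time()                       # start of the inference-only timer
# out = sess.run(last_layer, feed_dict={...})
inference_time = time.time() - beg_infer
# ... post-process, draw boxes, write the frame ...
end_to_end_time = time.time() - beg_start

# Inference FPS over the whole video would then be n_frames / total_inference_time.
```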

* Update for LaTeX

* Add report template

* Limit YOLO to at most 70% of GPU VRAM

In a small-VRAM environment, the allow_growth option is not enough to prevent out-of-memory errors.
So, based on some references, I forced the session not to take more than 70% of VRAM.
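
The cap can be expressed through the TF 1.x session config; a minimal sketch:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Cap this process at roughly 70% of GPU memory, in addition to allow_growth.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7,
                            allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```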

* Clarify which values we save

* Update yolov2tiny.py

Delete out_chan and the default value of stride.
For consistency, how about just using max_pool2d for maxpool as well? (Commented out for now.)

* Update yolov2tiny.py

* Update __init__.py

Add a start time for the "end-to-end time", beg_start, and rename the previous beg to beg_infer.
Does renaming "beg" in obj_detection affect the measure function? (I'm not sure.)

* Update yolov2tiny.py

Put the n_... values back into post-processing.

* Create consider.txt

* Update yolov2tiny.py

confirm "tf.nn.max_pool2d" working well

* Update consider.txt

* Update __init__.py

Move the part that saves the first frame's intermediate results (tensors) further down, to measure the necessary time.

* Update consider.txt

* Update __init__.py

Add printing of the total time.

* Add some details

* Add comment about inference FPS

* Write detailed information about why I chose the tf.nn functions

* Upload whole model visualization

I visualized the whole TF graph by using tf.train.Saver.
The only catch is that it is too verbose to follow the main logic,
but I decided to save the visualized graph just in case.

* Add GPU benchmark

* Add CPU benchmark

* Add first draft of the report

* Update report.tex

Some changes

* Update report.tex

* Change code location and table

* Edit figures

Co-authored-by: jehoon315 <[email protected]>