It can be used to build a spatial accelerator or an accelerator for a specific mapping.
- PE_new.v
Each PE has register files for activations, weights, and partial sums.
PE_array_controller must send the correct addresses and enable/select signals to PE_array to advance the MAC operations.
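
As a rough illustration of the description above, here is a minimal behavioral sketch in Python. It is not the Verilog in PE_new.v. The class name `PE`, the register-file depth, and the signal names `en` and `accumulate` are all hypothetical and chosen only for this example. It shows how a controller could drive the addresses and enable/select signals, one MAC per step.

```python
class PE:
    """Behavioral model of one processing element (all names are hypothetical)."""

    def __init__(self, depth=4):
        # One small register file each for activations, weights, and partial sums.
        self.act_rf = [0] * depth
        self.w_rf = [0] * depth
        self.psum_rf = [0] * depth

    def mac(self, act_addr, w_addr, psum_addr, en=True, accumulate=True):
        """Perform one MAC step; the controller supplies every address and select."""
        if not en:
            # Enable is low: the partial-sum register file keeps its value.
            return self.psum_rf[psum_addr]
        product = self.act_rf[act_addr] * self.w_rf[w_addr]
        # The select signal chooses between accumulating and starting a new sum.
        base = self.psum_rf[psum_addr] if accumulate else 0
        self.psum_rf[psum_addr] = base + product
        return self.psum_rf[psum_addr]


def run_dot_product(pe, acts, weights):
    """Stand-in for the controller: load the register files, then issue MACs."""
    for i, (a, w) in enumerate(zip(acts, weights)):
        pe.act_rf[i] = a
        pe.w_rf[i] = w
    for i in range(len(acts)):
        # The first step overwrites psum_rf[0]; later steps accumulate into it.
        pe.mac(i, i, 0, en=True, accumulate=(i > 0))
    return pe.psum_rf[0]
```

For example, `run_dot_product(PE(), [1, 2, 3], [4, 5, 6])` issues three MAC steps into `psum_rf[0]`. A wrong address or a missing enable on any step would corrupt the result, which is why the controller's sequencing matters.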
- psum_su_irrel_new.v
Because the global buffers use a high bandwidth (512 bits/cycle), partial sums from the PE array should be accumulated with a shorter clock period than the one used by top_module.
However, you should consider the relationship between the latency of the overall temporal mapping on the register files and the latency of psum_accumulator.
In this case, psum_su_irrel_new.v tries to finish the partial-sum calculation in one cycle, at a high hardware cost.
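
The trade-off above can be sketched with a small Python model. This is an illustration under stated assumptions, not the logic in psum_su_irrel_new.v: it assumes the one-cycle accumulation is done with a balanced adder tree, and it assumes the accumulator "keeps up" when its latency does not exceed the temporal-mapping latency of the PE array. The function names are hypothetical.

```python
def adder_tree_sum(psums):
    """Reduce partial sums with a balanced adder tree.

    Returns (total, levels, adders). In a single-cycle design, every level
    sits on the same combinational path and every adder is a separate
    hardware unit.
    """
    level = list(psums)
    levels = 0
    adders = 0
    while len(level) > 1:
        # Add neighbouring pairs; an odd leftover element passes through.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(level) // 2
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    total = level[0] if level else 0
    return total, levels, adders


def accumulator_keeps_up(temporal_cycles, accumulate_cycles):
    """Assumed condition: accumulation must finish before new partial sums arrive."""
    return accumulate_cycles <= temporal_cycles
```

With eight partial sums, `adder_tree_sum(range(1, 9))` uses seven adders across three levels. Finishing in one cycle always satisfies the assumed condition but costs N-1 adders on one combinational path. If the temporal mapping on the register files takes several cycles, a slower and cheaper accumulator may be sufficient.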
3)Top_module.v
The detailed structure varies with the target accelerator.
The following structure is for a programmable/flexible accelerator.