It can be used to build a spatial accelerator or an accelerator for a specific mapping.
- PE_new.v
Each PE has register files for activations, weights, and partial sums.
PE_array_controller must send the correct addresses and enable/selection signals to PE_array to advance the MAC operation.
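
As a rough illustration of the description above, the following Python sketch models one PE behaviorally. It is not the RTL in PE_new.v: the class `PE`, the helper `run_dot_product`, the register file depth, and the signal names `act_addr`, `wgt_addr`, `psum_addr`, and `mac_en` are all hypothetical and chosen only for this example.

```python
class PE:
    """Behavioral sketch of one processing element (hypothetical, not PE_new.v)."""

    def __init__(self, depth=4):
        # One register file each for activations, weights, and partial sums.
        # The depth of 4 is an assumption made for this sketch.
        self.act_rf = [0] * depth
        self.wgt_rf = [0] * depth
        self.psum_rf = [0] * depth

    def step(self, act_addr, wgt_addr, psum_addr, mac_en):
        """Model one cycle: accumulate act * wgt into the addressed psum entry."""
        if mac_en:
            product = self.act_rf[act_addr] * self.wgt_rf[wgt_addr]
            self.psum_rf[psum_addr] += product
        return self.psum_rf[psum_addr]


def run_dot_product(pe, length):
    """Hypothetical controller loop: issue one address set per cycle."""
    for i in range(length):
        pe.step(act_addr=i, wgt_addr=i, psum_addr=0, mac_en=True)
    return pe.psum_rf[0]


# Usage: a 4-element dot product accumulated into psum entry 0.
pe = PE(depth=4)
pe.act_rf = [1, 2, 3, 4]
pe.wgt_rf = [5, 6, 7, 8]
result = run_dot_product(pe, 4)  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

The sketch shows why the controller matters: the PE only computes whatever the addresses and enable signal select in each cycle, so a wrong address sequence silently produces a wrong partial sum.
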
- psum_su_irrel_new.v
Since the global buffers use high bandwidth (512 bits/cycle), partial sums from the PE array should be accumulated using a shorter clock period than top_module.
However, you should consider the relation between the latency of the overall temporal mapping on the register files and the latency of psum_accumulator.
In this case, psum_su_irrel_new.v tries to finish the partial-sum calculation in one cycle, at the price of a high hardware cost.
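
The trade-off above can be sketched in Python under some stated assumptions. The sketch assumes the accumulator sums its inputs with a pairwise adder tree and that "keeping up" means finishing one batch before the PEs deliver the next. Both functions, `adder_tree_sum` and `accumulator_keeps_up`, and their parameters are hypothetical and do not describe the actual RTL in psum_su_irrel_new.v.

```python
def adder_tree_sum(psums):
    """Sum partial sums pairwise, as a combinational adder tree would.

    Returns (total, levels), where levels is the number of adder stages.
    """
    values = list(psums)
    levels = 0
    while len(values) > 1:
        # Add neighbouring pairs; an odd leftover passes to the next level.
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])
        values = nxt
        levels += 1
    return (values[0] if values else 0), levels


def accumulator_keeps_up(rf_loop_cycles, acc_cycles):
    """Check that accumulation finishes before the next batch arrives.

    rf_loop_cycles: cycles the PEs spend on one temporal loop over the
                    register files (assumed to produce one batch of psums).
    acc_cycles:     cycles the accumulator needs per batch, expressed in
                    the same clock units.
    """
    return acc_cycles <= rf_loop_cycles


# Usage: eight partial sums need three adder levels.
total, levels = adder_tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

Under these assumptions, a single-cycle accumulator (`acc_cycles = 1`) keeps up with any temporal mapping, but every adder level must then fit inside one clock period, which is where the hardware cost comes from. A multi-cycle accumulator would be cheaper but only works when the temporal loop on the register files is at least as long.
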
The detailed structure can vary with the target accelerator.
The following structure is for the programmable/flexible accelerator.