Project Motivation and Goals
Efficiency in hardware is vital as neural network models grow more complex to tackle challenging problems, and optimizing ML hardware architectures has become a crucial research area. Scientists around the world, such as particle physicists at CERN, need to accelerate their ML models on FPGAs or custom ASICs for various applications, including compressing the gigantic amount of data generated by the detectors at the Large Hadron Collider (LHC).
However, implementing a DNN in hardware involves painstaking RTL design and verification, which takes time and effort. To address this, HLS4ML, a user-friendly Python library, was developed at CERN, enabling physicists to write ML code in Python and build synthesizable hardware for an FPGA through HLS (high-level synthesis). It is being widely adopted in the scientific community.
Yet, HLS4ML has limitations, particularly in supporting large-scale deep neural networks (DNNs), which are crucial for many applications. Our project aims to address this limitation by creating a backend framework that supports large-scale DNNs, enabling efficient implementation of optimized hardware architectures on FPGAs and ASICs for modern applications.
To be more specific, HLS4ML currently infers a new hardware module for each layer of the DNN on the FPGA, so resource consumption scales quickly with the depth of the network. As a result, it is not possible to implement anything like a ResNet using HLS4ML. In real applications, it is therefore desirable to process multiple layers in a single hardware engine. Our project seeks to make these hardware modules reusable, i.e., to let data flow through the same engine multiple times, with each pass representing a different layer (see the sketch below). This allows large, complex ML models to be built with limited hardware resources for various applications.
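The following is a minimal, hypothetical sketch in plain C++ (not actual HLS4ML output or its API) illustrating the reuse idea: a single dense engine function is invoked once per layer with that layer's weights, instead of instantiating a separate module for every layer. The names LayerConfig, dense_engine, and run_network are illustrative assumptions, not part of HLS4ML.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-layer description: dimensions, weights, and biases.
struct LayerConfig {
    std::size_t in_dim;
    std::size_t out_dim;
    std::vector<float> weights;  // row-major, out_dim x in_dim
    std::vector<float> biases;   // out_dim
};

// Single compute engine: one matrix-vector product followed by ReLU.
// In an HLS flow, this one function body would correspond to one hardware engine.
std::vector<float> dense_engine(const std::vector<float>& x, const LayerConfig& cfg) {
    std::vector<float> y(cfg.out_dim, 0.0f);
    for (std::size_t o = 0; o < cfg.out_dim; ++o) {
        float acc = cfg.biases[o];
        for (std::size_t i = 0; i < cfg.in_dim; ++i) {
            acc += cfg.weights[o * cfg.in_dim + i] * x[i];
        }
        y[o] = acc > 0.0f ? acc : 0.0f;  // ReLU activation
    }
    return y;
}

// The whole network is the same engine applied repeatedly, so the hardware
// cost stays roughly constant as the network gets deeper.
std::vector<float> run_network(std::vector<float> x, const std::vector<LayerConfig>& layers) {
    for (const LayerConfig& layer : layers) {
        x = dense_engine(x, layer);  // data flows through the same engine each time
    }
    return x;
}
```

In contrast, the current per-layer approach is analogous to generating a separate copy of dense_engine for every layer, which is what makes deep models such as ResNets impractical on a single FPGA.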