Structural Reparameterization of convolutional layers is a very useful trick first introduced in RepVGG: Making VGG-style ConvNets Great Again (arxiv) and later used in MobileOne: An Improved One millisecond Mobile Backbone (arxiv) among others. The basic idea is as follows:
- Train an overparameterized, branching network. This network is larger and more complicated than necessary, in order to improve gradient flow during training.
- At inference time, reparameterize: because convolution and (inference-mode) batchnorm are both linear operations, each block's parallel branches can be folded into a single convolution, producing a plain feed-forward network with identical outputs.

Here's an example of a very simple, toy architecture with two stages (the first stage has `in_channels != out_channels`, and the second stage has `in_channels == out_channels`, which is the segment with the third batchnorm operation):
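To make the branch-merging step concrete, here's a minimal NumPy sketch of the underlying algebra (a toy illustration, not the code from this repo): a block with a 3x3 branch, a 1x1 branch (zero-padded to 3x3), and an identity branch (a centered delta kernel), each followed by batchnorm, gets fused into one 3x3 convolution.

```python
import numpy as np

def conv3x3(x, w, b):
    """Naive 3x3 'same' convolution on a single image x of shape (C_in, H, W)."""
    C_out, C_in, _, _ = w.shape
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    y = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(C_in):
            for u in range(3):
                for v in range(3):
                    y[o] += w[o, i, u, v] * xp[i, u:u + H, v:v + W]
        y[o] += b[o]
    return y

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batchnorm: just a per-channel affine transform."""
    scale = (gamma / np.sqrt(var + eps))[:, None, None]
    return (y - mean[:, None, None]) * scale + beta[:, None, None]

def fuse_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold a batchnorm into the preceding (bias-free) conv's weights and bias."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale

rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, 8, 8))

# Three parallel branches, all expressed as 3x3 kernels:
w3 = rng.standard_normal((C, C, 3, 3))       # 3x3 conv branch
w1 = np.zeros((C, C, 3, 3))                  # 1x1 conv branch, zero-padded to 3x3
w1[:, :, 1, 1] = rng.standard_normal((C, C))
wid = np.zeros((C, C, 3, 3))                 # identity branch as a centered delta kernel
wid[np.arange(C), np.arange(C), 1, 1] = 1.0

# One batchnorm (gamma, beta, running mean, running var) per branch.
bns = [(rng.standard_normal(C), rng.standard_normal(C),
        rng.standard_normal(C), rng.random(C) + 0.5) for _ in range(3)]

# Training-time forward pass: sum of the three conv + BN branches.
y_train = sum(batchnorm(conv3x3(x, w, np.zeros(C)), *p)
              for w, p in zip([w3, w1, wid], bns))

# Deploy-time forward pass: fold each BN into its branch, then sum the fused
# kernels and biases into a single 3x3 conv.
fw = np.zeros((C, C, 3, 3))
fb = np.zeros(C)
for w, p in zip([w3, w1, wid], bns):
    wf, bf = fuse_bn(w, *p)
    fw += wf
    fb += bf

y_deploy = conv3x3(x, fw, fb)
print(np.allclose(y_train, y_deploy))  # True: one conv reproduces all three branches
```

The equivalence is exact (up to floating point) because everything involved is linear: batchnorm scales and shifts channels, so it folds into the conv weights, and the sum of convolutions equals a convolution with the summed kernels.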
This kind of optimization is critical for edge deployments, where multiply-accumulate operations (MACs) and memory come at a premium. For more reading on efficient model design I recommend ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (arxiv). The earlier papers mentioned borrow a lot of concepts from this one.
`example.py` contains everything you need to get started. Here I'll use a small RepVGG architecture and train for a little bit. After 30 epochs (only 30 because my personal GPU is precious), we get:
```
Final Original Accuracy: 0.839, Mean Inference Time: 0.007736
Final Reparameterized Accuracy: 0.845, Mean Inference Time: 0.003202
```
That's roughly a 2.4x speed-up (inference time cut by more than half), plus a small boost in accuracy. Nice.