PyTorch frontend for fpgaConvNet, providing emulated accuracy results for features such as quantization and sparsity.
- models/: general interfaces for model creation, inference, and ONNX export.
- quantization/: emulation of fixed-point and block floating-point representations.
- sparsity/: post-activation sparsity and a ReLU with a tunable threshold.
- optimiser_interface/: Python interface that launches the fpgaConvNet optimiser and collects its prediction results.
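
As a rough illustration of what fixed-point emulation involves, the sketch below rounds values to a signed fixed-point grid and saturates anything outside the representable range. It is written in plain Python on lists for clarity. The function name `quantize_fixed` and the `total_bits` / `frac_bits` split are assumptions made for this example, not the API of the quantization/ module.

```python
def quantize_fixed(values, total_bits=8, frac_bits=4):
    """Emulate signed fixed-point: round to the nearest step, then saturate.

    Hypothetical helper for illustration only; the real module may choose
    the word length and binary point differently (e.g. per layer).
    """
    scale = 2 ** frac_bits                 # number of steps per unit
    qmin = -(2 ** (total_bits - 1))        # most negative integer code
    qmax = 2 ** (total_bits - 1) - 1       # most positive integer code
    out = []
    for v in values:
        code = round(v * scale)            # nearest integer code
        code = max(qmin, min(qmax, code))  # saturate instead of wrapping
        out.append(code / scale)           # back to a real value
    return out


# 8-bit word with 4 fractional bits: step 1/16, range [-8.0, 7.9375]
coarse = quantize_fixed([0.3, 100.0, -100.0], total_bits=8, frac_bits=4)

# 16-bit word with 8 fractional bits: step 1/256, much wider range
fine = quantize_fixed([0.3], total_bits=16, frac_bits=8)
```

With the narrow 8-bit format, large values saturate and small values land on a coarse grid, while the 16-bit format keeps both range and resolution.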
python quantization_example.py
python activation_sparsity_example.py
python threshold_relu_example.py
python encoding_example.py
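
The scripts above are the place to see the actual behaviour. As a minimal sketch of the idea behind post-activation sparsity and a tunable-threshold ReLU, the plain-Python example below zeroes every activation that does not exceed a threshold and then measures the fraction of zeros. The names `threshold_relu` and `sparsity` are hypothetical and chosen for this illustration. `torch.nn.Threshold` has the same element-wise behaviour, although whether the repository uses it is not stated here.

```python
def threshold_relu(values, threshold=0.0):
    """Keep values strictly above the threshold, output zero otherwise.

    With threshold=0.0 this is the ordinary ReLU.
    """
    return [v if v > threshold else 0.0 for v in values]


def sparsity(values):
    """Fraction of elements that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Made-up pre-activation values, for illustration only
acts = [-1.0, 0.05, 0.2, 0.0, 0.8, 0.02, -0.3, 1.5]

plain = threshold_relu(acts, threshold=0.0)   # ordinary ReLU
tuned = threshold_relu(acts, threshold=0.1)   # also drops small positives

plain_sparsity = sparsity(plain)
tuned_sparsity = sparsity(tuned)
```

Raising the threshold trades accuracy for more zeros in the activation stream, which is the quantity a sparsity-aware accelerator can exploit.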
- imagenet: resnet18, resnet50, mobilenet_v2, repvgg_a0
- coco: yolov8n
- camvid: unet
- cityscapes: unet
- llgmri: unet
- ucf101: x3d_s, x3d_m
- brats2020: unet3d

imagenet:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
resnet18 | torchvision | 69.76 | 69.76 | 1.03 | 68.48 | 69.26 |
resnet50 | torchvision | 76.13 | 76.10 | 0.36 | 74.38 | 75.75 |
mobilenet_v2 | torchvision | 71.87 | 71.76 | 0.10 | 53.68 | 69.51 |
repvgg_a0 | timm | 72.41 | 72.40 | 0.21 | 0.21 | 66.08 |

coco:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
yolov8n | ultralytics | 37.1 | 37.1 | 0.0 | 29.6 | 35.1 |
yolov8s | ultralytics | 39.2 | 39.1 | 0.0 | 38.7 | 36.8 |

camvid:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
unet | nncf | 71.95 | 71.95 | 61.02 | 71.60 | 71.85 |
unet-bilinear | nncf | 71.67 | 71.67 | 60.62 | 71.40 | 71.75 |

cityscapes:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
unet | mmsegmentation | 69.10 | 69.10 | 1.98 | 61.74 | 68.43 |

llgmri:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
unet | brain-segmentation-pytorch | 90.89 | 90.88 | 80.98 | 90.95 | 90.85 |
unet-bilinear | brain-segmentation-pytorch | 91.05 | 91.05 | 77.51 | 91.04 | 91.03 |

ucf101:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
x3d_s | mmaction2 | 93.68 | 93.57 | 1.13 | 90.21 | 93.57 |
x3d_m | mmaction2 | 96.40 | 96.40 | 0.81 | 95.24 | 96.29 |

brats2020:

Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
---|---|---|---|---|---|---|
unet3d | BraTS20_3dUnet_3dAutoEncoder | 85.34 | 85.23 | 1.15 | 85.14 | 85.34 |
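
The BFP8 (Layer) and BFP8 (Channel) columns differ in how widely an exponent is shared. The sketch below shows one plausible reading: each block of values shares a single power-of-two exponent and stores 8-bit signed mantissas, with the block being either the whole layer or a single channel. The function names and the choice of an 8-bit mantissa are assumptions for illustration. The repository's exact BFP format is not described in this section.

```python
import math


def quantize_bfp(block, mantissa_bits=8):
    """Block floating point: one shared exponent, integer mantissas.

    Hypothetical helper; rounding and exponent selection are assumptions.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0:
        return list(block)
    # Shared exponent chosen so that max_abs < 2**exp
    exp = math.floor(math.log2(max_abs)) + 1
    # Step size such that the largest value fits in a signed mantissa
    step = 2.0 ** (exp - (mantissa_bits - 1))
    qmin = -(2 ** (mantissa_bits - 1))
    qmax = 2 ** (mantissa_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / step))) * step for v in block]


def bfp_per_layer(channels, mantissa_bits=8):
    """One exponent shared by every value in the layer."""
    flat = [v for ch in channels for v in ch]
    q = quantize_bfp(flat, mantissa_bits)
    out, i = [], 0
    for ch in channels:                    # restore the channel structure
        out.append(q[i:i + len(ch)])
        i += len(ch)
    return out


def bfp_per_channel(channels, mantissa_bits=8):
    """One exponent per channel."""
    return [quantize_bfp(ch, mantissa_bits) for ch in channels]


# Made-up layer: one channel with large values, one with small values
layer = [[1.0, 0.5], [0.01, 0.02]]
per_layer = bfp_per_layer(layer)
per_channel = bfp_per_channel(layer)
```

In this toy layer the small-valued channel collapses onto a single coarse step under per-layer sharing, while per-channel sharing gives it its own finer step. That effect may contribute to the per-channel column usually sitting closer to Float32, though the tables alone do not establish the cause.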
- Q: Fixed16 quantization
- AS: activation sparsity
- WS: weight sparsity, applying a global pruning threshold (the value in parentheses)
- All experiments are post-training, without fine-tuning

Model | Experiment | Accuracy | Sparsity |
---|---|---|---|
resnet18 | Q+AS | 69.74 | 50.75 |
resnet18 | Q+AS+WS(0.005) | 69.42 | 56.33 |
resnet18 | Q+AS+WS(0.010) | 67.36 | 61.47 |
resnet18 | Q+AS+WS(0.015) | 58.38 | 65.91 |
resnet18 | Q+AS+WS(0.020) | 27.91 | 69.63 |
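
To make the WS(threshold) notation concrete, here is a minimal sketch of global magnitude pruning: a single threshold is applied to the weights of every layer, and weights whose magnitude falls below it are set to zero. The helper names and the toy weights are made up for illustration. How the Sparsity column in the table is aggregated across weights and activations is not specified here, so the sketch reports weight sparsity only.

```python
def prune_global(layers, threshold):
    """Zero every weight with |w| < threshold, using one threshold for all layers."""
    return {
        name: [0.0 if abs(w) < threshold else w for w in weights]
        for name, weights in layers.items()
    }


def weight_sparsity(layers):
    """Fraction of zero weights across all layers."""
    total = sum(len(weights) for weights in layers.values())
    zeros = sum(1 for weights in layers.values() for w in weights if w == 0)
    return zeros / total


# Toy weights, not taken from any real model
layers = {
    "conv1": [0.5, -0.004, 0.012, -0.3],
    "conv2": [0.003, 0.018, -0.009, 0.1],
}

light = prune_global(layers, threshold=0.010)
heavy = prune_global(layers, threshold=0.020)

light_sparsity = weight_sparsity(light)
heavy_sparsity = weight_sparsity(heavy)
```

Because the threshold is global, layers with many small weights lose proportionally more of them than layers with large weights.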
- Quantization: BFP8 (Channel)
- RLE-8: run-length encoding that uses 8 bits to encode each run length (maximum run length 2^8)
- Compression ratio: averaged over all weights and activations

Dataset | Model | Experiment | Avg Compression Ratio |
---|---|---|---|
coco | yolov8n (onnx) | RLE-8 | 1.753 |
camvid | unet-bilinear (onnx) | RLE-8 | 1.175 |
cityscapes | unet (onnx) | RLE-8 | GPU TIMEOUT |
ucf101 | x3d_s (onnx) | RLE-8 | 1.737 |
ucf101 | x3d_m (onnx) | RLE-8 | 1.721 |
brats2020 | unet3d (onnx) | RLE-8 | 0.821 |
coco | yolov8n (onnx) | RLE-4 | 1.317 |
camvid | unet-bilinear (onnx) | RLE-4 | 0.717 |
coco | yolov8n (onnx) | RLE-2 | 1.112 |
camvid | unet-bilinear (onnx) | RLE-2 | 0.838 |
coco | yolov8n (onnx) | Huffman | 0.824 |
coco | yolov8s (onnx) | Huffman | 0.805 |
camvid | unet-bilinear (onnx) | Huffman | 0.684 |
cityscapes | unet (onnx) | Huffman | 0.692 |
ucf101 | x3d_s (onnx) | Huffman | 0.835 |
ucf101 | x3d_m (onnx) | Huffman | 0.833 |
brats2020 | unet3d (onnx) | Huffman | 0.718 |
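
The run-length scheme described above can be sketched as follows. This is a generic (value, run length) encoder, which is an assumption: the repository may encode only runs of zeros or pack the stream differently. The ratio is computed here as encoded bits divided by raw bits, so values below 1 mean the data got smaller. The direction of the ratio used in the table is not stated in this section, so treat that definition as an assumption too.

```python
def rle_encode(values, run_bits=8):
    """Encode as (value, run_length) pairs, capping each run at 2**run_bits."""
    max_run = 2 ** run_bits
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v and pairs[-1][1] < max_run:
            pairs[-1][1] += 1          # extend the current run
        else:
            pairs.append([v, 1])       # start a new run
    return [(v, n) for v, n in pairs]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out


def compression_ratio(values, run_bits=8, value_bits=8):
    """Encoded size / raw size (assumed direction: below 1 is smaller)."""
    pairs = rle_encode(values, run_bits)
    encoded_bits = len(pairs) * (value_bits + run_bits)
    raw_bits = len(values) * value_bits
    return encoded_bits / raw_bits


# Made-up 8-bit streams: one with long zero runs, one with no repeats
sparse = [0] * 12 + [5] + [0] * 3
dense = [1, 2, 3, 4]

sparse_ratio_8 = compression_ratio(sparse, run_bits=8)
dense_ratio_8 = compression_ratio(dense, run_bits=8)
sparse_ratio_2 = compression_ratio(sparse, run_bits=2)
```

Under this definition, data without repeats grows by the cost of the run-length field, while long runs shrink it. A shorter run-length field lowers the per-pair overhead but forces long runs to be split.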