Model | Framework | Support | Example |
---|---|---|---|
ResNet50 V1.5 | TensorFlow | Yes | Link |
ResNet50 V1.5 | PyTorch | Yes | Link |
DLRM | PyTorch | Yes | Link |
BERT large | TensorFlow | Yes | Link |
BERT large | PyTorch | Yes | Link |
SSD ResNet34 | TensorFlow | Yes | Link |
SSD ResNet34 | PyTorch | Yes | Link |
RNN-T | PyTorch | Yes | Link |
3D-UNet | TensorFlow | WIP | |
3D-UNet | PyTorch | Yes | Link |

Performance results were tested on 06/07/2022 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 10 instances, and batch size 1.
Performance varies by use, configuration, and other factors. See the platform configuration for details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
---|---|---|---|---|---|---|---|
BERT large SQuAD | 92.39 | 92.99 | -0.64% | 25.32 | 12.53 | 2.02x | pb |
DenseNet121 | 73.57% | 72.89% | 0.93% | 370.52 | 329.74 | 1.12x | pb |
DenseNet161 | 76.24% | 76.29% | -0.07% | 219.46 | 180.75 | 1.21x | pb |
DenseNet169 | 74.40% | 74.65% | -0.33% | 301.33 | 259.88 | 1.16x | pb |
Faster R-CNN Inception ResNet V2 | 37.98% | 38.33% | -0.91% | 3.96 | 2.34 | 1.69x | pb |
Faster R-CNN Inception ResNet V2 | 37.84% | 38.33% | -1.28% | 3.98 | 2.31 | 1.72x | SavedModel |
Faster R-CNN ResNet101 | 30.28% | 30.39% | -0.36% | 70 | 19.98 | 3.50x | pb |
Faster R-CNN ResNet101 | 30.37% | 30.39% | -0.07% | 70.26 | 16.98 | 4.14x | SavedModel |
Inception ResNet V2 | 80.44% | 80.40% | 0.05% | 281.79 | 137.91 | 2.04x | pb |
Inception V1 | 70.48% | 69.74% | 1.06% | 2193.17 | 975.6 | 2.25x | pb |
Inception V2 | 74.36% | 73.97% | 0.53% | 1835.35 | 838.82 | 2.19x | pb |
Inception V3 | 77.28% | 76.75% | 0.69% | 973.42 | 376.3 | 2.59x | pb |
Inception V4 | 80.40% | 80.27% | 0.16% | 575.9 | 200.55 | 2.87x | pb |
Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.51 | 50.3 | 2.63x | pb |
Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.89 | 50.97 | 2.61x | ckpt |
MobileNet V1 | 71.79% | 70.96% | 1.17% | 3545.79 | 1191.94 | 2.97x | pb |
MobileNet V2 | 71.89% | 71.76% | 0.18% | 2431.66 | 1420.11 | 1.71x | pb |
ResNet101 | 77.50% | 76.45% | 1.37% | 877.91 | 355.49 | 2.47x | pb |
ResNet50 Fashion | 77.80% | 78.12% | -0.41% | 3977.5 | 2150.68 | 1.85x | pb |
ResNet50 V1.0 | 74.11% | 74.27% | -0.22% | 1509.64 | 472.66 | 3.19x | pb |
ResNet50 V1.5 | 76.82% | 76.46% | 0.47% | 1260.01 | 415.83 | 3.03x | pb |
ResNet V2 101 | 72.67% | 71.87% | 1.11% | 436.52 | 318.3 | 1.37x | pb |
ResNet V2 152 | 73.03% | 72.37% | 0.91% | 306.82 | 221.4 | 1.39x | pb |
ResNet V2 50 | 70.33% | 69.64% | 0.99% | 749.85 | 574.19 | 1.31x | pb |
SSD MobileNet V1 | 22.97% | 23.13% | -0.69% | 952.9 | 582.87 | 1.63x | pb |
SSD MobileNet V1 | 22.99% | 23.13% | -0.61% | 954.92 | 413.24 | 2.31x | ckpt |
SSD ResNet34 | 21.69% | 22.09% | -1.81% | 44.46 | 11.81 | 3.76x | pb |
SSD ResNet50 V1 | 37.86% | 38.00% | -0.37% | 69.5 | 26.04 | 2.67x | pb |
SSD ResNet50 V1 | 37.81% | 38.00% | -0.50% | 69.27 | 21.17 | 3.27x | ckpt |
VGG16 | 72.66% | 70.89% | 2.50% | 660.46 | 177.85 | 3.71x | pb |
VGG19 | 72.72% | 71.01% | 2.41% | 562.04 | 147.61 | 3.81x | pb |
Wide & Deep | 77.62% | 77.67% | -0.07% | 21332.47 | 19714.08 | 1.08x | pb |
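
The two ratio columns can be recomputed from the INT8 and FP32 columns next to them. The Python sketch below does this for the DenseNet121 and DenseNet161 rows; the function names are illustrative, not part of any library. Because the published accuracies and throughputs are rounded, a recomputed ratio can differ from the table in the last digit for some rows (BERT large SQuAD, for example, gives -0.65% rather than -0.64%).

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change of the INT8 model in percent: (INT8 - FP32) / FP32 * 100."""
    return (int8_acc - fp32_acc) / fp32_acc * 100.0


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """INT8 speedup over FP32 as a plain ratio of samples/sec: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# (model, INT8 acc, FP32 acc, INT8 samples/sec, FP32 samples/sec), copied from the table above.
rows = [
    ("DenseNet121", 73.57, 72.89, 370.52, 329.74),
    ("DenseNet161", 76.24, 76.29, 219.46, 180.75),
]

for name, acc_i8, acc_f32, tput_i8, tput_f32 in rows:
    acc = accuracy_ratio(acc_i8, acc_f32)
    perf = performance_ratio(tput_i8, tput_f32)
    print(f"{name}: {acc:.2f}% {perf:.2f}x")
```

Both rows reproduce the published values (0.93% / 1.12x and -0.07% / 1.21x).
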

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
---|---|---|---|---|---|---|---|
ALBERT base MRPC | 88.06% | 88.50% | -0.50% | 34.28 | 29.54 | 1.16x | eager |
Barthez MRPC | 82.99% | 83.81% | -0.97% | 166.84 | 89.56 | 1.86x | eager |
BERT base COLA | 58.80% | 58.84% | -0.07% | 260 | 126.47 | 2.06x | fx |
BERT base MRPC | 90.28% | 90.69% | -0.45% | 251.79 | 126.46 | 1.99x | fx |
BERT base RTE | 69.31% | 69.68% | -0.52% | 252.14 | 126.45 | 1.99x | fx |
BERT base SST2 | 91.97% | 91.86% | 0.12% | 258.98 | 126.42 | 2.05x | fx |
BERT base STSB | 89.13% | 89.75% | -0.68% | 249.57 | 126.39 | 1.97x | fx |
BERT large COLA | 62.88% | 62.57% | 0.49% | 88.75 | 36.7 | 2.42x | fx |
BERT large MRPC | 89.93% | 90.38% | -0.49% | 89.43 | 36.62 | 2.44x | fx |
BERT large QNLI | 90.96% | 91.82% | -0.94% | 91.27 | 37 | 2.47x | fx |
BERT large RTE | 71.84% | 72.56% | -1.00% | 77.62 | 36.01 | 2.16x | fx |
CamemBERT base MRPC | 86.56% | 86.82% | -0.30% | 241.39 | 124.77 | 1.93x | eager |
DeBERTa MRPC | 91.17% | 90.91% | 0.28% | 152.09 | 85.13 | 1.79x | eager |
DistilBERT base MRPC | 88.66% | 89.16% | -0.56% | 415.09 | 246.9 | 1.68x | eager |
DistilBERT base MRPC | 88.74% | 89.16% | -0.47% | 459.93 | 245.33 | 1.87x | fx |
FlauBERT MRPC | 81.01% | 80.19% | 1.01% | 644.05 | 457.32 | 1.41x | eager |
Inception V3 | 69.43% | 69.52% | -0.13% | 454.3 | 213.7 | 2.13x | eager |
Longformer MRPC | 90.59% | 91.46% | -0.95% | 21.51 | 17.45 | 1.23x | eager |
Mask R-CNN | 37.70% | 37.80% | -0.26% | 17.61 | 5.76 | 3.06x | eager |
mBart WNLI | 56.34% | 56.34% | 0.00% | 65.05 | 31.26 | 2.08x | eager |
MobileNet V2 | 70.54% | 71.84% | -1.81% | 740.97 | 535.54 | 1.38x | eager |
lvwerra/pegasus-samsum | 42.21 | 42.67 | -1.09% | 3.89 | 1.14 | 3.41x | eager |
PeleeNet | 71.64% | 72.10% | -0.64% | 502.01 | 391.31 | 1.28x | eager |
ResNet18 | 69.57% | 69.76% | -0.27% | 800.43 | 381.27 | 2.10x | eager |
ResNet18 | 69.57% | 69.76% | -0.28% | 811.09 | 389.36 | 2.08x | fx |
ResNet50 | 75.98% | 76.15% | -0.21% | 507.55 | 200.52 | 2.53x | eager |
ResNeXt101_32x8d | 79.08% | 79.31% | -0.29% | 203.54 | 73.85 | 2.76x | eager |
RNN-T | 92.45 | 92.55 | -0.10% | 79.21 | 20.47 | 3.87x | eager |
RoBERTa base MRPC | 87.88% | 88.18% | -0.34% | 250.21 | 124.92 | 2.00x | eager |
Se_ResNeXt50_32x4d | 78.98% | 79.08% | -0.13% | 358.63 | 173.03 | 2.07x | eager |
SqueezeBERT MRPC | 87.77% | 87.65% | 0.14% | 249.89 | 207.43 | 1.20x | eager |
Transfo-xl MRPC | 81.97% | 81.20% | 0.94% | 11.25 | 8.34 | 1.35x | eager |
YOLOv3 | 24.60% | 24.54% | 0.21% | 108.09 | 40.02 | 2.70x | eager |

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
---|---|---|---|---|---|---|---|
ResNet18 | 69.74% | 69.76% | -0.03% | 804.76 | 388.67 | 2.07x | eager |
ResNet18 | 69.73% | 69.76% | -0.04% | 806.44 | 386.59 | 2.09x | fx |
BERT base MRPC QAT | 89.60% | 89.50% | 0.11% | 258.89 | 125.79 | 2.06x | fx |
ResNet50 | 76.04% | 76.15% | -0.14% | 490.64 | 203.49 | 2.41x | eager |

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
---|---|---|---|---|---|---|---|
bert-large-uncased-whole-word-masking-finetuned-squad | 92.9 | 93.16 | -0.28% | 37.13 | 11.45 | 3.24x | ipex |
ResNeXt101_32x16d_wsl | 84.02% | 84.17% | -0.18% | 163.45 | 28.9 | 5.66x | ipex |
ResNet50 | 76.00% | 76.15% | -0.20% | 707.86 | 202.02 | 3.51x | ipex |
SSD ResNet34 | 19.97% | 20.00% | -0.15% | 30.84 | 8.55 | 3.61x | ipex |

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
---|---|---|---|---|---|---|---|
AlexNet | 54.74% | 54.79% | -0.09% | 1518.97 | 676.74 | 2.24x | qlinearops |
AlexNet | 54.74% | 54.79% | -0.09% | 1411.3 | 652.6 | 2.16x | qdq |
BERT base MRPC DYNAMIC | 85.54% | 86.03% | -0.57% | 379.71 | 156.16 | 2.43x | qlinearops |
BERT base MRPC STATIC | 85.29% | 86.03% | -0.86% | 756.33 | 316.36 | 2.39x | qlinearops |
BERT SQuAD | 80.44 | 80.67 | -0.29% | 115.58 | 64.71 | 1.79x | qlinearops |
BERT SQuAD | 80.44 | 80.67 | -0.29% | 115.4 | 64.68 | 1.78x | qdq |
CaffeNet | 56.19% | 56.30% | -0.20% | 2786.79 | 802.7 | 3.47x | qlinearops |
CaffeNet | 56.19% | 56.30% | -0.20% | 2726.86 | 819.41 | 3.33x | qdq |
DenseNet | 60.20% | 60.96% | -1.25% | 404.83 | 340.63 | 1.19x | qlinearops |
DistilBERT base MRPC | 84.56% | 84.56% | 0.00% | 1630.41 | 596.68 | 2.73x | qlinearops |
EfficientNet | 77.58% | 77.70% | -0.15% | 1985.35 | 1097.33 | 1.81x | qlinearops |
Faster R-CNN | 33.99% | 34.37% | -1.11% | 10.02 | 4.32 | 2.32x | qlinearops |
Faster R-CNN | 33.94% | 34.37% | -1.25% | 10.41 | 4.28 | 2.43x | qdq |
FCN | 64.66% | 64.98% | -0.49% | 44.31 | 14.2 | 3.12x | qlinearops |
FCN | 64.66% | 64.98% | -0.49% | 18.11 | 14.19 | 1.28x | qdq |
GoogleNet | 67.61% | 67.79% | -0.27% | 1165.84 | 810.65 | 1.44x | qlinearops |
GoogleNet | 67.61% | 67.79% | -0.27% | 1165.73 | 809.98 | 1.44x | qdq |
Inception V1 | 67.23% | 67.24% | -0.01% | 1205.89 | 838.71 | 1.44x | qlinearops |
Inception V1 | 67.23% | 67.24% | -0.01% | 1204.93 | 843.16 | 1.43x | qdq |
Mask R-CNN | 33.40% | 33.72% | -0.95% | 8.56 | 3.76 | 2.27x | qlinearops |
Mask R-CNN | 33.33% | 33.72% | -1.16% | 8.4 | 3.81 | 2.20x | qdq |
MobileBERT MRPC | 86.03% | 86.27% | -0.28% | 790.11 | 686.35 | 1.15x | qlinearops |
MobileBERT SQuAD MLPerf | 89.84 | 90.03 | -0.20% | 102.92 | 95.19 | 1.08x | qlinearops |
MobileNet V2 | 65.47% | 66.89% | -2.12% | 5133.84 | 3394.73 | 1.51x | qlinearops |
MobileNet V2 | 65.47% | 66.89% | -2.12% | 5066.31 | 3386.3 | 1.50x | qdq |
MobileNet V3 MLPerf | 75.59% | 75.74% | -0.20% | 4133.22 | 2132.92 | 1.94x | qlinearops |
MobileNetV2 (ONNX Model Zoo) | 68.30% | 69.48% | -1.70% | 5349.42 | 3373.29 | 1.59x | qlinearops |
ResNet50 V1.5 MLPerf | 76.13% | 76.46% | -0.43% | 1139.56 | 549.88 | 2.07x | qlinearops |
ResNet50 V1.5 | 72.28% | 72.29% | -0.01% | 1165.35 | 556.02 | 2.10x | qlinearops |
ResNet50 V1.5 | 72.28% | 72.29% | -0.01% | 1319.32 | 543.44 | 2.43x | qdq |
ResNet50 V1.5 (ONNX Model Zoo) | 74.76% | 74.99% | -0.31% | 1363.39 | 573.1 | 2.38x | qlinearops |
RoBERTa base MRPC | 90.44% | 89.95% | 0.54% | 811.05 | 312.71 | 2.59x | qlinearops |
ShuffleNet V2 | 66.13% | 66.36% | -0.35% | 4948.77 | 2847.66 | 1.74x | qlinearops |
SqueezeNet | 56.55% | 56.87% | -0.56% | 6296.79 | 4340.51 | 1.45x | qlinearops |
SqueezeNet | 56.55% | 56.87% | -0.56% | 6227.76 | 4383.8 | 1.42x | qdq |
SSD MobileNet V1 | 22.20% | 23.10% | -3.90% | 917.64 | 709.48 | 1.29x | qlinearops |
SSD MobileNet V1 | 22.20% | 23.10% | -3.90% | 840.99 | 655.99 | 1.28x | qdq |
SSD MobileNet V1 (ONNX Model Zoo) | 22.88% | 23.03% | -0.65% | 845.17 | 666.25 | 1.27x | qlinearops |
SSD MobileNet V1 (ONNX Model Zoo) | 22.88% | 23.03% | -0.65% | 790.06 | 624.2 | 1.27x | qdq |
SSD MobileNet V2 | 23.83% | 24.68% | -3.44% | 703.55 | 506.6 | 1.39x | qlinearops |
SSD | 18.68% | 18.98% | -1.58% | 41.99 | 11.12 | 3.78x | qdq |
Tiny YOLOv3 | 12.08% | 12.43% | -2.82% | 836.21 | 659.69 | 1.27x | qlinearops |
VGG16 | 66.60% | 66.69% | -0.13% | 312.48 | 128.98 | 2.42x | qlinearops |
VGG16 (ONNX Model Zoo) | 72.28% | 72.40% | -0.17% | 446.13 | 131.04 | 3.40x | qlinearops |
YOLOv3 | 26.88% | 28.74% | -6.47% | 157.39 | 66.72 | 2.36x | qlinearops |
YOLOv4 | 33.18% | 33.71% | -1.57% | 58.55 | 38.09 | 1.54x | qlinearops |
ZFNet | 55.89% | 55.96% | -0.13% | 664.37 | 358.62 | 1.85x | qlinearops |
ZFNet | 55.89% | 55.96% | -0.13% | 666.99 | 354.38 | 1.88x | qdq |
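
In the Example column of this table, qlinearops and qdq refer to the two ONNX representations of an INT8 model: dedicated quantized operators (such as QLinearConv and QLinearMatMul), or ordinary operators surrounded by QuantizeLinear/DequantizeLinear (QDQ) pairs. Both rely on the same per-tensor affine mapping between FP32 and uint8. The Python sketch below follows the uint8 formulas from the ONNX operator specification. The input values are arbitrary, and deriving the scale and zero point from the value range mirrors ONNX DynamicQuantizeLinear; it is an assumption for illustration, not the calibration Neural Compressor performed for these models.

```python
def dynamic_qparams(x_min: float, x_max: float) -> tuple[float, int]:
    """uint8 scale and zero point from a value range, as in ONNX DynamicQuantizeLinear.

    The range is widened to include 0 so that 0.0 maps exactly onto the zero point.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:  # all-zero input (assumed fallback): any positive scale works
        scale = 1.0
    zero_point = max(0, min(255, round(-x_min / scale)))
    return scale, zero_point


def quantize_linear(x: float, scale: float, zero_point: int) -> int:
    """QuantizeLinear to uint8: saturate(round(x / scale) + zero_point).

    Python's round() breaks ties to even, the rounding mode the ONNX spec uses.
    """
    return max(0, min(255, round(x / scale) + zero_point))


def dequantize_linear(q: int, scale: float, zero_point: int) -> float:
    """DequantizeLinear: (q - zero_point) * scale."""
    return (q - zero_point) * scale


values = [-1.0, -0.25, 0.0, 0.6, 2.0]
scale, zero_point = dynamic_qparams(min(values), max(values))
for v in values:
    q = quantize_linear(v, scale, zero_point)
    print(f"{v:+.3f} -> {q:3d} -> {dequantize_linear(q, scale, zero_point):+.4f}")
```

Each value survives the round trip to within half a quantization step (scale / 2); that rounding error is the source of the small accuracy ratios in the table.
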

Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|
Inception V3 | 77.80% | 77.65% | 0.20% | 920.74 | 276.73 | 3.33x |
MobileNet V1 | 71.60% | 72.23% | -0.86% | 6585.19 | 2529.21 | 2.60x |
MobileNet V2 | 70.80% | 70.87% | -0.10% | 5230.32 | 1996.47 | 2.62x |
ResNet V1 152 | 78.28% | 78.54% | -0.33% | 574.85 | 156.2 | 3.68x |
ResNet50 V1.0 | 75.91% | 76.33% | -0.55% | 1567.9 | 427.99 | 3.66x |
SqueezeNet | 56.80% | 56.97% | -0.28% | 4704.51 | 1332.29 | 3.53x |
SSD MobileNet V1 | 74.94% | 75.54% | -0.79% | 769.26 | 193.03 | 3.99x |
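
Tasks | Framework | Model | FP32 Baseline | Gradient Sensitivity, 20% Sparsity: Accuracy% | Gradient Sensitivity, 20% Sparsity: Drop (%) | Gradient Sensitivity, 20% Sparsity: Perf Gain (throughput ratio) | + ONNX Dynamic Quantization on Pruned Model: Accuracy% | + ONNX Dynamic Quantization on Pruned Model: Drop (%) | + ONNX Dynamic Quantization on Pruned Model: Perf Gain (throughput ratio) |
---|---|---|---|---|---|---|---|---|---|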
SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x |
QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x |

Tasks | Framework | Model | FP32 Baseline | Pattern Lock, 70% Unstructured Sparsity: Accuracy% | Pattern Lock, 70% Unstructured Sparsity: Drop (%) | Pattern Lock, 50% 1:2 Structured Sparsity: Accuracy% | Pattern Lock, 50% 1:2 Structured Sparsity: Drop (%) |
---|---|---|---|---|---|---|---|
MNLI | PyTorch | BERT base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80] |
SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13 |
QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31] |
QNLI | PyTorch | BERT base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73 |
QnA | PyTorch | BERT base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69] |
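
A 1:2 pattern means that in every group of two consecutive weights at most one is non-zero, which fixes sparsity at exactly 50%. Pattern Lock, roughly, keeps the zero pattern of an already-sparse model fixed while it is fine-tuned on the downstream task. The Python sketch below illustrates both ideas on a flat list: it builds a 1:2 mask by keeping the larger-magnitude weight in each pair (an assumed selection rule) and re-applies that mask after a simulated update so pruned positions stay zero. The helper names are hypothetical, and this is not Neural Compressor's implementation.

```python
def one_of_two_mask(weights: list[float]) -> list[int]:
    """1:2 structured mask: in each consecutive pair, keep the larger-magnitude weight."""
    if len(weights) % 2:
        raise ValueError("length must be a multiple of 2")
    mask = []
    for i in range(0, len(weights), 2):
        keep_first = abs(weights[i]) >= abs(weights[i + 1])  # ties keep the first element
        mask.extend([1, 0] if keep_first else [0, 1])
    return mask


def apply_locked_mask(weights: list[float], mask: list[int]) -> list[float]:
    """Re-zero pruned positions (e.g. after every fine-tuning step) so the pattern never changes."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


w = [0.9, -0.1, 0.05, -0.7, 0.3, 0.3]
mask = one_of_two_mask(w)            # [1, 0, 0, 1, 1, 0]
sparse = apply_locked_mask(w, mask)  # [0.9, 0.0, 0.0, -0.7, 0.3, 0.0]

# Simulate a gradient step that would revive pruned weights; the locked mask undoes it.
updated = [x + 0.01 for x in sparse]
locked = apply_locked_mask(updated, mask)
print(mask, locked)
```
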

Framework | Model | FP32 Baseline | Compression | Dataset | Accuracy% (Drop) |
---|---|---|---|---|---|
PyTorch | ResNet18 | 69.76 | 30% Sparsity on Magnitude | ImageNet | 69.47 (-0.42) |
PyTorch | ResNet18 | 69.76 | 30% Sparsity on Gradient Sensitivity | ImageNet | 68.85 (-1.30) |
PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude | ImageNet | 76.11 (-0.03) |
PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Post Training Quantization | ImageNet | 76.01 (-0.16) |
PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Quantization Aware Training | ImageNet | 75.90 (-0.30) |
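
Magnitude pruning, in its simplest unstructured form, zeroes the weights with the smallest absolute values until the target sparsity is reached. The Python sketch below applies that rule once to a flat list at 30% sparsity. The flat-list representation, one-shot pruning, and tie handling are simplifying assumptions; it is not the pruning schedule Neural Compressor used for the ResNet results above, which also fine-tune the remaining weights.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Unstructured one-shot magnitude pruning.

    Zeroes the round(sparsity * n) weights with the smallest absolute value.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(sparsity * len(weights))
    # Indices ordered by |w|, smallest first; sorted() is stable, so ties keep index order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.5, -0.02, 0.3, 0.01, -0.8, 0.07, -0.04, 0.6, 0.2, -0.9]
sparse = magnitude_prune(weights, 0.3)
print(sparse)
print(f"sparsity: {sparse.count(0.0) / len(sparse):.0%}")
```

Using round() rather than ceil() for the prune count matters here: 0.3 * 10 evaluates to 3.0000000000000004 in floating point, which ceil() would turn into 4.
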

Example Name | Dataset | Student (Accuracy) | Teacher (Accuracy) | Student With Distillation (Accuracy Improvement) |
---|---|---|---|---|
ResNet example | ImageNet | ResNet18 (0.6739) | ResNet50 (0.7399) | 0.6845 (0.0106) |
BlendCNN example | MRPC | BlendCNN (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
BiLSTM example | SST-2 | BiLSTM (0.7913) | RoBERTa-Base (0.9404) | 0.8085 (0.0172) |
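
Accuracy Improvement is the absolute gain over the student trained alone (for ResNet18, 0.6845 - 0.6739 = 0.0106). The table does not state which loss each example used. A common formulation, from Hinton et al., trains the student on a mix of the hard-label cross-entropy and a temperature-softened KL divergence to the teacher's outputs. The Python sketch below computes that loss for a single example; temperature=4.0, alpha=0.5, and the T² scaling of the soft term are conventional defaults assumed here, not settings taken from these examples.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float], label: int,
                      temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])  # hard-label term at temperature 1
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


student = [1.0, 0.5, -0.5]
teacher = [3.0, 0.2, -1.0]
print(f"loss = {distillation_loss(student, teacher, label=0):.4f}")
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains.
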