Updated with fresh results + OpenCL note
Updated with fresh results and notice on OpenCL implementation
ekondis committed Nov 21, 2015
1 parent 1d743cc commit 997d76d
Showing 1 changed file with 52 additions and 54 deletions.
106 changes: 52 additions & 54 deletions README.md
@@ -1,5 +1,5 @@
# mixbench
The purpose of this benchmark tool is to evaluate performance bounds of GPUs on mixed operational intensity kernels. The executed kernel is customized on a range of different operational intensity values. Modern GPUs are able to hide memory latency by switching execution to compute operations. Using this tool one can assess the practical optimum balance in both types of operations for a GPU. It is based on the CUDA programming platform, so it can be executed only on NVidia GPUs. A long-term goal is to develop an OpenCL port.
The purpose of this benchmark tool is to evaluate performance bounds of GPUs on mixed operational intensity kernels. The executed kernel is customized on a range of different operational intensity values. Modern GPUs are able to hide memory latency by switching execution to compute operations. Using this tool one can assess the practical optimum balance in both types of operations for a GPU. Both CUDA and OpenCL implementations have been developed.

Kernel types
--------------
@@ -25,73 +25,71 @@ Afterwards, just do make:
make
```

Two executables will be produced: "mixbench-cuda" & "mixbench-cuda-bs". The former applies grid strides between accesses of the same thread where the latter applies block size strides.
Three executables will be produced: "mixbench-cuda", "mixbench-cuda-bs" and "mixbench-ocl". The first two are the CUDA implementations and the last one is the OpenCL implementation. "mixbench-cuda" applies grid strides between accesses of the same thread, whereas "mixbench-cuda-bs" applies block size strides. Both methods are supported by the OpenCL implementation and can be selected using a command line option.
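
As a rough illustration of the two striding schemes, the sketch below lists which element indices a single thread would touch under each one. The index formulas, function names and launch dimensions are assumptions made for illustration and are not taken from the mixbench kernels.

```python
def grid_stride_indices(global_thread_id, total_threads, accesses_per_thread):
    """Indices touched by one thread when consecutive accesses are one whole grid apart."""
    return [global_thread_id + k * total_threads for k in range(accesses_per_thread)]


def block_stride_indices(block_id, thread_in_block, block_size, accesses_per_thread):
    """Indices touched by one thread when consecutive accesses are one block size apart.

    Assumption: each block owns a contiguous tile of block_size * accesses_per_thread elements.
    """
    tile_start = block_id * block_size * accesses_per_thread
    return [tile_start + thread_in_block + k * block_size for k in range(accesses_per_thread)]


# Hypothetical launch: 4 blocks of 8 threads, 3 accesses per thread.
BLOCKS, BLOCK_SIZE, ACCESSES = 4, 8, 3

# Thread 1 of block 1 has global id 9 in this launch.
grid = grid_stride_indices(9, BLOCKS * BLOCK_SIZE, ACCESSES)
block = block_stride_indices(1, 1, BLOCK_SIZE, ACCESSES)
print("grid stride: ", grid)
print("block stride:", block)
```

Under these assumptions both schemes cover every element exactly once; they differ only in how far apart the accesses of a single thread land in memory.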

Execution results
--------------

A typical execution output on a GTX480 GPU is:
A typical execution output on an NVidia GTX660 GPU is:
```
mixbench (compute & memory balancing GPU microbenchmark)
------------------------ Device specifications ------------------------
Device: GeForce GTX 480
CUDA driver version: 5.50
GPU clock rate: 1401 MHz
Memory clock rate: 924 MHz
Memory bus width: 384 bits
Device: GeForce GTX 660
CUDA driver version: 7.50
GPU clock rate: 1097 MHz
Memory clock rate: 1502 MHz
Memory bus width: 192 bits
WarpSize: 32
L2 cache size: 768 KB
Total global mem: 1535 MB
L2 cache size: 384 KB
Total global mem: 2042 MB
ECC enabled: No
Compute Capability: 2.0
Total SPs: 480 (15 MPs x 32 SPs/MP)
Compute throughput: 1344.96 GFlops (theoretical single precision FMAs)
Memory bandwidth: 177.41 GB/sec
Compute Capability: 3.0
Total SPs: 960 (5 MPs x 192 SPs/MP)
Compute throughput: 2107.20 GFlops (theoretical single precision FMAs)
Memory bandwidth: 144.19 GB/sec
-----------------------------------------------------------------------
Total GPU memory 1610285056, free 1195106304
Total GPU memory 2141913088, free 2106126336
Buffer size: 256MB
Trade-off type:compute with global memory (block strided)
----------------------------------------- EXCEL data -----------------------------------------
Operations ratio, Single Precision ops,,, Double precision ops,,, Integer operations
compute/memory, Time, GFLOPS, GB/sec, Time, GFLOPS, GB/sec, Time, GIOPS, GB/sec
0/32, 240.46, 0.00, 142.89, 475.48, 0.00, 144.53, 240.35, 0.00, 142.96
1/31, 233.58, 9.19, 142.50, 460.28, 4.67, 144.63, 233.64, 9.19, 142.47
2/30, 225.32, 19.06, 142.96, 445.09, 9.65, 144.75, 225.31, 19.06, 142.97
3/29, 218.79, 29.45, 142.32, 430.34, 14.97, 144.72, 218.59, 29.47, 142.45
4/28, 210.23, 40.86, 143.01, 415.24, 20.69, 144.81, 210.31, 40.84, 142.95
5/27, 203.21, 52.84, 142.66, 400.51, 26.81, 144.77, 203.09, 52.87, 142.75
6/26, 194.33, 66.31, 143.66, 385.32, 33.44, 144.91, 194.37, 66.29, 143.63
7/25, 187.40, 80.21, 143.24, 370.81, 40.54, 144.78, 187.47, 80.19, 143.19
8/24, 175.16, 98.08, 147.12, 355.76, 48.29, 144.87, 175.16, 98.08, 147.12
9/23, 171.80, 112.50, 143.75, 341.41, 56.61, 144.67, 171.87, 112.45, 143.69
10/22, 163.28, 131.52, 144.67, 326.14, 65.85, 144.86, 163.18, 131.61, 144.77
11/21, 155.63, 151.78, 144.88, 311.48, 75.84, 144.79, 155.79, 151.63, 144.73
12/20, 146.60, 175.78, 146.48, 296.45, 86.93, 144.88, 146.69, 175.68, 146.40
13/19, 138.97, 200.89, 146.80, 281.73, 99.09, 144.83, 139.08, 200.73, 146.69
14/18, 129.64, 231.92, 149.09, 266.31, 112.90, 145.15, 130.01, 231.26, 148.67
15/17, 121.12, 265.96, 150.71, 251.39, 128.14, 145.22, 121.41, 265.31, 150.34
16/16, 120.11, 286.07, 143.04, 235.84, 145.69, 145.69, 120.05, 286.22, 143.11
17/15, 111.36, 327.82, 144.63, 219.47, 166.34, 146.77, 111.63, 327.05, 144.29
18/14, 106.48, 363.01, 141.17, 231.50, 166.98, 129.87, 106.66, 362.40, 140.93
19/13, 96.10, 424.59, 145.25, 244.40, 166.95, 114.23, 96.65, 422.16, 144.42
20/12, 89.51, 479.84, 143.95, 257.19, 167.00, 100.20, 89.76, 478.51, 143.55
21/11, 81.93, 550.41, 144.15, 269.05, 167.61, 87.80, 83.18, 542.19, 142.00
22/10, 76.16, 620.34, 140.99, 281.86, 167.61, 76.19, 76.29, 619.24, 140.74
23/ 9, 65.62, 752.64, 147.26, 295.74, 167.01, 65.35, 77.19, 639.87, 125.19
24/ 8, 60.76, 848.31, 141.38, 308.62, 167.00, 55.67, 80.10, 643.41, 107.24
25/ 7, 52.04, 1031.56, 144.42, 321.45, 167.02, 46.76, 83.28, 644.68, 90.25
26/ 6, 48.32, 1155.46, 133.32, 334.31, 167.02, 38.54, 86.50, 645.52, 74.48
27/ 5, 49.51, 1171.09, 108.43, 347.16, 167.02, 30.93, 89.72, 646.25, 59.84
28/ 4, 50.70, 1185.88, 84.71, 360.01, 167.02, 23.86, 92.90, 647.25, 46.23
29/ 3, 52.03, 1197.01, 61.91, 372.87, 167.02, 17.28, 96.09, 648.10, 33.52
30/ 2, 53.37, 1207.05, 40.23, 385.72, 167.02, 11.13, 99.33, 648.58, 21.62
31/ 1, 53.41, 1246.43, 20.10, 397.20, 167.60, 5.41, 100.91, 659.74, 10.64
32/ 0, 53.50, 1284.51, 0.00, 410.01, 167.60, 0.00, 102.51, 670.40, 0.00
----------------------------------------------------------------------------------------------
--------------------------------------------------- CSV data --------------------------------------------------
Single Precision ops,,,, Double precision ops,,,, Integer operations,,,
Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0.000, 331.11, 0.00, 103.77, 0.000, 666.67, 0.00, 103.08, 0.000, 330.91, 0.00, 103.83
0.065, 321.70, 6.68, 103.47, 0.032, 645.48, 3.33, 103.14, 0.065, 321.53, 6.68, 103.53
0.133, 310.38, 13.84, 103.78, 0.067, 623.54, 6.89, 103.32, 0.133, 310.32, 13.84, 103.80
0.207, 299.76, 21.49, 103.88, 0.103, 603.28, 10.68, 103.23, 0.207, 299.81, 21.49, 103.86
0.286, 288.37, 29.79, 104.26, 0.143, 581.11, 14.78, 103.47, 0.286, 288.19, 29.81, 104.32
0.370, 278.62, 38.54, 104.05, 0.185, 561.75, 19.11, 103.22, 0.370, 279.04, 38.48, 103.90
0.462, 264.36, 48.74, 105.60, 0.231, 540.07, 23.86, 103.38, 0.462, 264.69, 48.68, 105.47
0.560, 254.86, 58.98, 105.33, 0.280, 520.83, 28.86, 103.08, 0.560, 255.16, 58.91, 105.20
0.667, 242.78, 70.76, 106.14, 0.333, 497.84, 34.51, 103.53, 0.667, 243.07, 70.68, 106.02
0.783, 231.60, 83.45, 106.63, 0.391, 478.86, 40.36, 103.14, 0.783, 232.46, 83.14, 106.24
0.909, 217.54, 98.72, 108.59, 0.455, 456.43, 47.05, 103.51, 0.909, 217.62, 98.68, 108.55
1.048, 208.25, 113.43, 108.28, 0.524, 435.83, 54.20, 103.47, 1.048, 208.34, 113.38, 108.23
1.200, 194.99, 132.16, 110.14, 0.600, 413.28, 62.35, 103.92, 1.200, 195.24, 131.99, 109.99
1.368, 186.92, 149.36, 109.14, 0.684, 392.24, 71.17, 104.02, 1.368, 185.82, 150.24, 109.79
1.556, 173.37, 173.42, 111.48, 0.778, 356.28, 84.39, 108.50, 1.556, 172.02, 174.77, 112.35
1.765, 164.31, 196.04, 111.09, 0.882, 378.13, 85.19, 96.55, 1.765, 161.31, 199.69, 113.16
2.000, 166.16, 206.78, 103.39, 1.000, 401.90, 85.49, 85.49, 2.000, 165.62, 207.46, 103.73
2.267, 150.74, 242.19, 106.85, 1.133, 425.10, 85.88, 75.78, 2.267, 151.01, 241.75, 106.65
2.571, 144.55, 267.42, 104.00, 1.286, 449.01, 86.09, 66.96, 2.571, 143.85, 268.71, 104.50
2.923, 130.06, 313.73, 107.33, 1.462, 470.90, 86.65, 59.29, 2.923, 132.64, 307.61, 105.23
3.333, 122.40, 350.90, 105.27, 1.667, 492.46, 87.21, 52.33, 3.333, 135.07, 317.97, 95.39
3.818, 108.68, 414.96, 108.68, 1.909, 515.52, 87.48, 45.82, 3.818, 135.69, 332.36, 87.05
4.400, 103.54, 456.29, 103.70, 2.200, 540.59, 87.39, 39.72, 4.400, 143.56, 329.10, 74.79
5.111, 86.99, 567.81, 111.09, 2.556, 562.18, 87.86, 34.38, 5.111, 148.07, 333.56, 65.26
6.000, 82.94, 621.40, 103.57, 3.000, 586.40, 87.89, 29.30, 6.000, 157.30, 327.65, 54.61
7.143, 66.18, 811.25, 113.58, 3.571, 612.78, 87.61, 24.53, 7.143, 162.70, 329.97, 46.20
8.667, 61.83, 902.97, 104.19, 4.333, 636.91, 87.67, 20.23, 8.667, 170.39, 327.70, 37.81
10.800, 44.39, 1306.09, 120.93, 5.400, 658.38, 88.07, 16.31, 10.800, 173.27, 334.64, 30.98
14.000, 41.02, 1465.77, 104.70, 7.000, 681.96, 88.17, 12.60, 14.000, 177.82, 338.15, 24.15
19.333, 39.30, 1584.64, 81.96, 9.667, 703.63, 88.51, 9.16, 19.333, 182.63, 340.99, 17.64
30.000, 38.84, 1658.88, 55.30, 15.000, 724.53, 88.92, 5.93, 30.000, 189.26, 340.40, 11.35
62.000, 35.55, 1872.54, 30.20, 31.000, 743.62, 89.52, 2.89, 62.000, 188.37, 353.41, 5.70
inf, 35.92, 1912.98, 0.00, inf, 765.54, 89.77, 0.00, inf, 191.35, 359.14, 0.00
---------------------------------------------------------------------------------------------------------------
```
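
One way to read these numbers is to compare them against a simple roofline bound. The sketch below is not part of mixbench: it takes the theoretical peaks from the GTX 660 device specifications above, computes the flops/byte value at which the two limits meet, and prints the bound next to three single precision rows copied from the CSV data. The function name is hypothetical.

```python
def roofline_gflops(flops_per_byte, peak_gflops, peak_gbps):
    """Upper bound on throughput: limited by memory traffic or by peak compute."""
    return min(peak_gflops, flops_per_byte * peak_gbps)


# Theoretical peaks reported in the device specifications above (GTX 660).
PEAK_GFLOPS = 2107.20
PEAK_GBPS = 144.19

# Operational intensity at which the memory and compute limits meet.
ridge = PEAK_GFLOPS / PEAK_GBPS
print(f"ridge point: {ridge:.3f} flops/byte")

# (Flops/byte, measured GFLOPS) pairs from the single precision columns above.
samples = [(2.000, 206.78), (6.000, 621.40), (30.000, 1658.88)]
for intensity, measured in samples:
    bound = roofline_gflops(intensity, PEAK_GFLOPS, PEAK_GBPS)
    print(f"{intensity:6.3f} flops/byte: measured {measured:8.2f} GFLOPS, bound {bound:8.2f} GFLOPS")
```

The measured values stay below the bound because the theoretical peaks are not reachable in practice; the measured curve is what indicates the practical balance point for a given GPU.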

Keep in mind that compute and memory operations do not correspond one-to-one. One compute operation corresponds to 8 Flops/Iops and one memory operation corresponds to 1 element access (either a 32-bit float, a 64-bit float or a 32-bit int).
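
The conversion can be sketched as follows. This is a minimal example, assuming the rows of the CSV data correspond to compute/memory operation ratios from 0/32 to 32/0 as in the older output format; the function and constant names are hypothetical.

```python
FLOPS_PER_COMPUTE_OP = 8                             # one compute operation = 8 Flops/Iops
ELEMENT_BYTES = {"float": 4, "double": 8, "int": 4}  # one memory operation = one element


def ops_per_byte(compute_ops, memory_ops, element_type):
    """Operational intensity (Flops or Iops per byte) for a compute/memory operation ratio."""
    if memory_ops == 0:
        return float("inf")  # no memory traffic at all
    flops = compute_ops * FLOPS_PER_COMPUTE_OP
    bytes_moved = memory_ops * ELEMENT_BYTES[element_type]
    return flops / bytes_moved


for compute_ops, memory_ops in [(1, 31), (16, 16), (31, 1), (32, 0)]:
    sp = ops_per_byte(compute_ops, memory_ops, "float")
    dp = ops_per_byte(compute_ops, memory_ops, "double")
    print(f"{compute_ops:2d}/{memory_ops:2d}: {sp:.3f} (single), {dp:.3f} (double)")
```

For example, a ratio of 1/31 gives 8 Flops over 124 bytes in single precision, which rounds to the 0.065 shown in the second CSV row, and half of that in double precision.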

Publications
--------------
