GPU Benchmarks
The timing-modified script, neural_style_time.py, is used to accurately measure how long a style transfer run takes to complete.
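As a rough illustration of what that timing modification does, the sketch below wraps a style transfer run with Python's `time` module and reports the elapsed wall-clock time. It is only an assumption of how the measurement could be structured, not the actual contents of neural_style_time.py, and `run_style_transfer` is a hypothetical stand-in for the script's main loop.

```python
import time

def timed_run(run_style_transfer, *args, **kwargs):
    """Run style transfer once and report the total wall-clock time."""
    start = time.time()
    result = run_style_transfer(*args, **kwargs)  # hypothetical entry point
    elapsed = time.time() - start
    print("Total run time: {:.2f} seconds".format(elapsed))
    return result, elapsed
```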
The following parameters are used to generate the timing data:
python3 neural_style_time.py -backend nn -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend nn -optimizer adam -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -optimizer adam -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -print_iter 0
python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer adam -num_iterations 500 -print_iter 0
Each test is run 3 times, and the average of those 3 runs is rounded to the nearest second.
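For anyone reproducing these numbers, a small driver script along the following lines can run each configuration three times and round the mean to the nearest second. This is only a sketch and is not part of the repository; it also times the whole process, including startup, so its results may read slightly higher than the in-script measurement.

```python
import statistics
import subprocess
import time

# Flag combinations from the commands listed above.
CONFIGS = [
    ["-backend", "nn", "-optimizer", "lbfgs"],
    ["-backend", "nn", "-optimizer", "adam"],
    ["-backend", "cudnn", "-optimizer", "lbfgs"],
    ["-backend", "cudnn", "-optimizer", "adam"],
    ["-backend", "cudnn", "-cudnn_autotune", "-optimizer", "lbfgs"],
    ["-backend", "cudnn", "-cudnn_autotune", "-optimizer", "adam"],
]

for config in CONFIGS:
    times = []
    for _ in range(3):
        cmd = ["python3", "neural_style_time.py",
               "-num_iterations", "500", "-print_iter", "0"] + config
        start = time.time()
        subprocess.run(cmd, check=True)
        times.append(time.time() - start)
    # Average the three runs and round to the nearest second.
    print(" ".join(config), "->", round(statistics.mean(times)), "seconds")
```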
Speed can vary a lot depending on the backend and the optimizer.
Here are some times for running 500 iterations with `-image_size=512` on a Tesla K80 with different settings:
- `-backend nn -optimizer lbfgs`: 117 seconds
- `-backend nn -optimizer adam`: 100 seconds
- `-backend cudnn -optimizer lbfgs`: 124 seconds
- `-backend cudnn -optimizer adam`: 107 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 109 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 91 seconds
Here are the same benchmarks on a GTX 1080:
- `-backend nn -optimizer lbfgs`: 56 seconds
- `-backend nn -optimizer adam`: 38 seconds
- `-backend cudnn -optimizer lbfgs`: 40 seconds
- `-backend cudnn -optimizer adam`: 40 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 23 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 24 seconds
Here are the same benchmarks on an NVIDIA GRID K520:
- `-backend nn -optimizer lbfgs`: 236 seconds
- `-backend nn -optimizer adam`: 209 seconds
- `-backend cudnn -optimizer lbfgs`: 226 seconds
- `-backend cudnn -optimizer adam`: 200 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 226 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 200 seconds
Here are the same benchmarks on a Tesla T4 with different settings:
- `-backend nn -optimizer lbfgs`: 72 seconds
- `-backend nn -optimizer adam`: 66 seconds
- `-backend cudnn -optimizer lbfgs`: 48 seconds
- `-backend cudnn -optimizer adam`: 40 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 51 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 43 seconds
Here are the same benchmarks on a Tesla P100-PCIE-16GB with different settings:
- `-backend nn -optimizer lbfgs`: 61 seconds
- `-backend nn -optimizer adam`: 47 seconds
- `-backend cudnn -optimizer lbfgs`: 37 seconds
- `-backend cudnn -optimizer adam`: 23 seconds
- `-backend cudnn -cudnn_autotune -optimizer lbfgs`: 39 seconds
- `-backend cudnn -cudnn_autotune -optimizer adam`: 25 seconds