Find takes >1 sec for some conv configs #2814

Open
atamazov opened this issue Mar 15, 2024 · 3 comments

Comments

@atamazov
Contributor

@kyeonghwanryu wrote:
I found some cases that took >1s for miopenFindConvolution[Direction]Algorithm. I'm not sure it's related to this issue.

| Config | Algorithm | Time (ms) |
| --- | --- | --- |
| 64-56-56-1x1-64-56-56-16-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,583 |
| 256-56-56-1x1-128-56-56-16-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,416 |
| 128-28-28-1x1-512-28-28-16-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,538 |
| 256-56-56-1x1-512-28-28-16-0x0-2x2-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,447 |
| 512-28-28-1x1-256-28-28-16-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,152 |
| 512-28-28-1x1-256-28-28-16-0x0-1x1-1x1-0-NCHW-FP32-W | miopenConvolutionBwdWeightsAlgoImplicitGEMM | 2,858 |
| 256-56-56-1x1-128-56-56-16-0x0-1x1-1x1-0-NCHW-FP32-W | miopenConvolutionBwdWeightsAlgoImplicitGEMM | 2,816 |
| 256-56-56-1x1-512-28-28-1-0x0-2x2-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,321 |
| 1024-14-14-1x1-256-14-14-1-0x0-1x1-1x1-0-NCHW-FP32-B | miopenConvolutionBwdDataAlgoImplicitGEMM | 1,176 |
| 64-56-56-1x1-256-56-56-1-0x0-1x1-1x1-0-NCHW-FP32-W | miopenConvolutionBwdWeightsAlgoDirect | 1,002 |
| 64-56-56-1x1-64-56-56-1-0x0-1x1-1x1-0-NCHW-FP32-W | miopenConvolutionBwdWeightsAlgoDirect | 1,012 |
| 64-56-56-1x1-256-56-56-8-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,787 |
| 256-56-56-1x1-128-56-56-8-0x0-1x1-1x1-0-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,672 |
| 3-224-224-11x11-64-55-55-16-2x2-4x4-1x1-1-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 4,076 |
| 256-13-13-3x3-256-13-13-1-1x1-1x1-1x1-1-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 4,375 |
| 3-224-224-11x11-64-55-55-8-2x2-4x4-1x1-1-NCHW-FP32-F | miopenConvolutionFwdAlgoImplicitGEMM | 2,818 |

Originally posted by @kyeonghwanryu in #2771 (comment)
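
For anyone who wants to sort or filter the configs in the table, here is a minimal Python sketch that splits one config string into named fields. The field names (`in_channels`, `batch`, `pad`, and so on) are my assumption about the key layout and are not confirmed anywhere in this thread; `ConvConfig`, `parse_config` and `parse_time_ms` are hypothetical helpers, not part of MIOpen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvConfig:
    # Field names are an assumed reading of the dash-separated key.
    in_channels: int
    in_h: int
    in_w: int
    kernel: tuple
    out_channels: int
    out_h: int
    out_w: int
    batch: int
    pad: tuple
    stride: tuple
    dilation: tuple
    bias: int
    layout: str
    dtype: str
    direction: str  # F, B or W as it appears in the table


def _pair(text):
    """Turn a token such as '11x11' into a tuple of two ints."""
    first, second = text.split("x")
    return int(first), int(second)


def parse_config(key):
    """Split a config string with the default single-layout form."""
    parts = key.split("-")
    if len(parts) != 15:
        raise ValueError(f"expected 15 dash-separated fields, got {len(parts)}")
    return ConvConfig(
        int(parts[0]), int(parts[1]), int(parts[2]), _pair(parts[3]),
        int(parts[4]), int(parts[5]), int(parts[6]), int(parts[7]),
        _pair(parts[8]), _pair(parts[9]), _pair(parts[10]), int(parts[11]),
        parts[12], parts[13], parts[14],
    )


def parse_time_ms(text):
    """Convert a time such as '4,076' (comma as thousands separator) to int."""
    return int(text.replace(",", ""))
```

For example, `parse_config("3-224-224-11x11-64-55-55-16-2x2-4x4-1x1-1-NCHW-FP32-F").stride` gives `(4, 4)` under the assumed layout.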

@atamazov
Contributor Author

> I found some cases that took >1s for miopenFindConvolution[Direction]Algorithm.

We need to identify the reason for this.

I suspect that some of the kernels (likely HIP kernels, because their build times are long) are missing from the precompiled binary kernel cache. But there could be dozens of other reasons ;)

@kyeonghwanryu We need more info to reproduce the issue.

  • Please specify the GPU type: MI300 or MI250?
  • Have you used the normal installable build (i.e., the one with the binary cache enabled)?
  • Do you have the precompiled kernels package installed?
  • Do you see the excessive delays during the first run only, or do subsequent runs also look suspicious?

The delays can also be caused by copying buffers from the GPU to the CPU and back (this can take a long time), so please also check your application.

I would appreciate it if you could provide the log taken from one of the suspicious configs with the following environment settings:

export MIOPEN_ENABLE_LOGGING=1
export MIOPEN_ENABLE_LOGGING_CMD=1
export MIOPEN_LOG_LEVEL=6
export MIOPEN_ENABLE_LOGGING_ELAPSED_TIME=1
export MIOPEN_DEBUG_LOGGING_QUIETING_DISABLE=1

If your application is multi-threaded or uses MIOpen in a multi-process context, please also add

export MIOPEN_ENABLE_LOGGING_MPMT=1
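
If it is easier to collect the log from a script, the same settings can be applied with a small Python wrapper. This is only a sketch: `build_env` and `run_with_logging` are hypothetical helpers, the command and log file name are placeholders, and it simply captures both stdout and stderr of the child process into one file.

```python
import os
import subprocess

# The logging settings requested above, copied verbatim.
MIOPEN_LOGGING_ENV = {
    "MIOPEN_ENABLE_LOGGING": "1",
    "MIOPEN_ENABLE_LOGGING_CMD": "1",
    "MIOPEN_LOG_LEVEL": "6",
    "MIOPEN_ENABLE_LOGGING_ELAPSED_TIME": "1",
    "MIOPEN_DEBUG_LOGGING_QUIETING_DISABLE": "1",
}


def build_env(multi_threaded=False, base=None):
    """Return a copy of the environment with the logging variables set."""
    env = dict(os.environ if base is None else base)
    env.update(MIOPEN_LOGGING_ENV)
    if multi_threaded:
        # Only needed for multi-threaded or multi-process applications.
        env["MIOPEN_ENABLE_LOGGING_MPMT"] = "1"
    return env


def run_with_logging(cmd, log_path, multi_threaded=False):
    """Run cmd with logging enabled; write stdout and stderr to log_path."""
    env = build_env(multi_threaded)
    with open(log_path, "wb") as log:
        proc = subprocess.run(
            cmd, env=env, stdout=log, stderr=subprocess.STDOUT, check=False
        )
    return proc.returncode
```

Usage would look like `run_with_logging(["./my_app"], "miopen.log")`, where `./my_app` stands for whatever application reproduces the slow Find call.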

Thanks!

@atamazov
Contributor Author

atamazov commented Mar 15, 2024

@JehandadKhan @cderb Can you please check whether the configs shown above are in the list of "favorite" configs that we use to populate the system-find-db, system-perf-db and the precompiled kernel package?

I've looked for 64-56-56-1x1-64-56-56-16-0x0-1x1-1x1-0-NCHW-FP32-F in find-db and for 2x64x56x56x1x1x1x1x64x16x0x0x0x1x1x0x1x1x0x0x1xNCHWxFP32xF in perf-db. Results:

  • gfx90a68
    • find-db: ConvHipImplicitGemmForwardV4R4Xdlops is the best, it is tunable
    • perf-db: ConvHipImplicitGemmForwardV4R4Xdlops:64,256,2,64,128,4,0,1,4
  • gfx90a6e
    • find-db: ConvMlirIgemmFwdXdlops is the best, it is tunable
    • perf-db: ConvMlirIgemmFwdXdlops:64,256,2,64,64,4,1,1
  • gfx942130
    • find-db: ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC is the best, it is tunable
    • perf-db: ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC: fwd,nhwc,fp32,0,0,64,64,32,16,16,4,1,1,2,2,0,0,0,0,0,1,4,2,1,1,8,1,32,1,4,2,1,1,8,1,32,54

So at first glance find-db and perf-db look correct, but I am not sure about the kernel db.
