Dynamic Kernels Assignments #337
Because assembling is much faster than OCL/HIP compilation?
Basically speaking, the decision to choose [1] ASM-dynamic or [2] HIP/OCL-dynamic should, I think, be based on the following factors:
So, from my humble experience, we can have the following preference:
@carlushuang Thanks for the explanations. Just in case: AFAICS the assembly builds are ~100 times faster than HIP builds and ~15 times faster than OCL builds (you can try auto-tuning and see how many kernels fit into the 3-second logging intervals). Therefore even a linear transformation from HIP/OCL to ASM (without adding any "dynamism") would yield substantial acceleration. Of course, extending the coverage of a kernel (making it more "dynamic" than before) is the preferred way, because it also saves space in the binary cache.
May I know the average compilation time for a HIP/OCL/ASM kernel? I saw HIP take 4 s or more and ASM take 100–200 ms. Given that much of the kernel run time is below 100 ms, even if each config runs only once, a dynamic kernel still beats the static ones.
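Plugging the figures quoted above into a simple break-even comparison (illustrative numbers from this thread, not new measurements) shows why one dynamic kernel wins even when each config runs only once. For $N$ distinct configs with per-run time $t$:

```latex
\begin{aligned}
T_{\text{static}}  &\approx N \cdot (t_{\text{compile}} + t) \approx N \cdot (4\,\mathrm{s} + t) \\
T_{\text{dynamic}} &\approx t_{\text{asm}} + N \cdot t \approx 0.15\,\mathrm{s} + N \cdot t
\end{aligned}
```

With, say, $N = 50$ and $t = 0.1\,\mathrm{s}$, the static path costs about $50 \times 4.1 = 205$ s of first-iteration overhead, while the dynamic path costs about $0.15 + 5 \approx 5.2$ s.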
@sabreshao The initial push for this is to support Mask R-CNN and RetinaNet-type networks.
Purpose
This project exists to minimize our reliance on compile-time parameterization in MIOpen's source kernels. The goal isn't to sacrifice performance, but rather to find ways of reducing the compile-time overhead of the first iteration of neural networks using MIOpen.
Strategy
For some of these kernels the task is fairly straightforward: take the compile-time parameters and move them into runtime parameters. In some cases this can be done without affecting performance. It may often be the case, however, that not all compile-time parameters can be moved to runtime without seriously affecting performance. In those cases we should identify the parameters that networks change least frequently, so that compiles are minimized. If the remaining compile-time parameters do not significantly reduce the number of compiles, then it may be that the kernel should be converted to assembly code.
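As a minimal sketch of the idea (plain C++ rather than a real MIOpen/HIP kernel; the function names and the tile-size parameter are hypothetical), the same loop can be parameterized at compile time, where each tile size is a separate build, or at runtime, where one build covers every tile size:

```cpp
#include <cstddef>

// Compile-time parameterization: every distinct TILE value is a separate
// instantiation, so each new configuration pays a full compile. The
// compiler can fully unroll the inner loop, which is where the
// performance advantage comes from.
template <int TILE>
void scale_static(float* data, std::size_t n, float alpha)
{
    // Assumes n is a multiple of TILE (sketch only).
    for (std::size_t i = 0; i + TILE <= n; i += TILE)
        for (int j = 0; j < TILE; ++j)
            data[i + j] *= alpha;
}

// Runtime parameterization: the tile size is now an ordinary argument,
// so a single compiled kernel covers all configurations. This may cost
// some performance (no compile-time unrolling) but eliminates the
// per-config compile overhead.
void scale_dynamic(float* data, std::size_t n, int tile, float alpha)
{
    // Assumes n is a multiple of tile (sketch only).
    for (std::size_t i = 0; i + tile <= n; i += tile)
        for (int j = 0; j < tile; ++j)
            data[i + j] *= alpha;
}
```

The trade-off the Strategy above describes is exactly which parameters stay on the `TILE` (template) side and which move to the `tile` (argument) side.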
Priority Tasks
Data collection
Structural
Convolution Changes
Priority: HIGH
iGEMM HIP source refactor (@asroy) 7/24/2020
ASM-iGEMM (@carlushuang , @shaojiewang, @jane-zxy, @fronteer)
Make ASM iGEMM kernels only in hybrid MIOPEN_FIND_MODE (ISSUE: Hybrid mode with iGEMM hit detection #299) (@zjing14 ) ROCm 3.8
Data collection for solver usage via Tuna (@ce1adon)
Non-Convolution Changes
Priority: HIGH
Priority: MEDIUM
copyTensor / castTensor / setTensor / scaleTensor ROCm 3.8
subSample / upSample (@alexandraBara) ROCm 3.8
TensorOps (@ce1adon) ROCm 3.8
Activations (@cderb) ROCm 3.8
transpose_NCHW2CNHW / transpose_CNHW2NCHW ROCm 3.8
RNN / RNN Update (@ce1adon) ROCm 3.8
Pooling