cpu: use templates for de-duplicating some operators #1141
Closed
Note: This PR only deletes lines from `ggml-cpu.c`; it does not modify any functions in it. I'm not sure why git diff tries to combine them as changes. Standard `diff` shows the correct output. Actual diff of `ggml-cpu.c`: https://gist.github.com/cmdr2/a76df5af311417619788e8330b1908b3

This PR de-duplicates some of the easily-templatized functions in `ggml-cpu.c`. It takes inspiration from `binbcast.cu`.

The affected operators are:

- Binary ops: `add`, `sub`, `mul`, `div`
- Unary ops: `abs`, `sgn`, `neg`, `step`, `tanh`, `elu`, `relu`, `sigmoid`, `hardsigmoid`, `exp`, `hardswish`, `sqr`, `sqrt`, `sin`, `cos`, `log`
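To illustrate the general idea of template-based de-duplication, here is a minimal sketch, not the code from this PR. It assumes contiguous `float` buffers, and every name in it (`op_add`, `apply_binary`, `apply_unary`, and so on) is hypothetical. The point is that the loop is written once per arity and the per-element operation is passed as a compile-time parameter, so the compiler can still inline it.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical per-element operations; the names are illustrative only
// and are not taken from ggml.
inline float op_add(float a, float b) { return a + b; }
inline float op_sub(float a, float b) { return a - b; }
inline float op_mul(float a, float b) { return a * b; }
inline float op_div(float a, float b) { return a / b; }

inline float op_abs(float x)  { return std::fabs(x); }
inline float op_neg(float x)  { return -x; }
inline float op_relu(float x) { return x > 0.0f ? x : 0.0f; }
inline float op_sqr(float x)  { return x * x; }

// One loop shared by every binary op. The op is a non-type template
// parameter, so each instantiation is specialized at compile time.
template <float (*op)(float, float)>
void apply_binary(const float* a, const float* b, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = op(a[i], b[i]);
    }
}

// One loop shared by every unary op.
template <float (*op)(float)>
void apply_unary(const float* x, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = op(x[i]);
    }
}

// Usage:
//   apply_binary<op_mul>(a, b, dst, n);
//   apply_unary<op_relu>(x, dst, n);
```

Adding a new operator in this style means writing one small per-element function rather than another copy of the loop.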
This removes the op implementation functions from `ggml-cpu.c` (around 2000 lines). As a side effect, all of these functions now support bf16 as well as non-contiguous `src1`.

The next PRs will attempt to:

- Use a `traits` table to remove the ugly-looking mass of template parameters and if-else conditions. That will enable cleaner support of quantized operations, for example by taking inspiration from the quantized add function.

The performance is the same as the current implementation. It also passes all the runners on ggml-ci, which tested non-contiguous inputs (in `SAM`) and `vDSP` on Mac.
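As a rough illustration of why bf16 and non-contiguous `src1` support can fall out of a templated implementation, here is a sketch under stated assumptions; it is not the PR's code. The `bf16` struct, `to_f32`, `from_f32`, and `apply_binary_strided` are hypothetical names. The bf16 conversion here truncates the low mantissa bits instead of rounding, and the second operand is addressed through a byte stride rather than assumed to be densely packed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bf16 value: the upper 16 bits of an IEEE-754 float.
struct bf16 { std::uint16_t bits; };

// Conversions to float, overloaded per element type.
inline float to_f32(float v) { return v; }
inline float to_f32(bf16 v) {
    std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Conversions from float, selected by the destination type.
template <typename T> T from_f32(float f);
template <> inline float from_f32<float>(float f) { return f; }
template <> inline bf16 from_f32<bf16>(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Simplification: truncate instead of rounding to nearest even.
    return bf16{static_cast<std::uint16_t>(u >> 16)};
}

inline float op_add(float a, float b) { return a + b; }

// The same loop serves every combination of element types. src1 is read
// through a byte stride, so it does not have to be contiguous.
template <float (*op)(float, float), typename src0_t, typename src1_t, typename dst_t>
void apply_binary_strided(const src0_t* src0,
                          const char* src1, std::size_t src1_stride_bytes,
                          dst_t* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        src1_t b;
        std::memcpy(&b, src1 + i * src1_stride_bytes, sizeof b);
        dst[i] = from_f32<dst_t>(op(to_f32(src0[i]), to_f32(b)));
    }
}
```

Because the arithmetic always happens in `float`, supporting another element type in this sketch only requires a pair of conversion functions, not a new copy of each operator.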