cpu: use templates for de-duplicating some operators #1141

cmdr2 · 2025-03-12T08:55:27Z

Note: This PR only deletes lines from ggml-cpu.c; it does not modify any functions in it. I'm not sure why git diff presents the deletions as combined changes; a standard diff shows the correct output. Actual diff of ggml-cpu.c: https://gist.github.com/cmdr2/a76df5af311417619788e8330b1908b3

This PR de-duplicates some of the easily templatized functions in ggml-cpu.c. It takes inspiration from binbcast.cu.

Binary: add, sub, mul, div
Unary: abs, sgn, neg, step, tanh, elu, relu, sigmoid, hardsigmoid, exp, hardswish, sqr, sqrt, sin, cos, log
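The de-duplication idea can be sketched as one generic row loop per arity, instantiated once per operator functor, loosely in the spirit of the per-op functions that binbcast.cu passes as template arguments. This is a minimal illustration under stated assumptions, not the PR's code: the functor names (`op_add`, `op_relu` and so on) and the helpers `binary_row` / `unary_row` are hypothetical, and the loops assume contiguous f32 rows.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functors: one tiny struct per operator replaces one
// hand-written loop per operator.
struct op_add { static float apply(float a, float b) { return a + b; } };
struct op_sub { static float apply(float a, float b) { return a - b; } };
struct op_mul { static float apply(float a, float b) { return a * b; } };
struct op_div { static float apply(float a, float b) { return a / b; } };

struct op_neg  { static float apply(float x) { return -x; } };
struct op_relu { static float apply(float x) { return x > 0.0f ? x : 0.0f; } };
struct op_sqr  { static float apply(float x) { return x * x; } };

// One generic loop per arity; the compiler emits a specialized copy for
// each functor, so there is no per-element dispatch cost.
template <typename Op>
void binary_row(const float * a, const float * b, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = Op::apply(a[i], b[i]);
    }
}

template <typename Op>
void unary_row(const float * x, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = Op::apply(x[i]);
    }
}
```

Adding another element-wise operator then only requires a new functor, for example `binary_row<op_mul>(a, b, dst, n)` reuses the same loop as `op_add`.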

This removes the op implementation functions from ggml-cpu.c (around 2000 lines). As a side effect, all of these functions now support bf16, as well as non-contiguous src1.
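The bf16 and non-contiguous src1 support can be illustrated with a small per-type conversion trait plus byte-strided addressing. In this sketch, which is an assumption rather than the PR's actual code, every operator computes in f32, `type_conv<T>` converts each element to and from f32, and each operand is addressed through its own byte stride (similar in spirit to the `nb[]` byte strides on ggml tensors), so src1 can be a strided view. The names `bf16_t`, `type_conv`, `op_add` and `binary_strided` are hypothetical; the bf16 rounding uses the common round-to-nearest-even bit trick.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical bf16 storage type: the upper 16 bits of an IEEE-754 float.
struct bf16_t { uint16_t bits; };

// Hypothetical per-type conversion traits; every op computes in f32.
template <typename T> struct type_conv;

template <> struct type_conv<float> {
    static float to_f32(float x)   { return x; }
    static float from_f32(float x) { return x; }
};

template <> struct type_conv<bf16_t> {
    static float to_f32(bf16_t x) {
        const uint32_t u = uint32_t(x.bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
    static bf16_t from_f32(float f) {
        uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        if ((u & 0x7fffffffu) > 0x7f800000u) {
            return bf16_t{uint16_t((u >> 16) | 64)}; // keep NaN a quiet NaN
        }
        u += 0x7fffu + ((u >> 16) & 1u);             // round to nearest even
        return bf16_t{uint16_t(u >> 16)};
    }
};

struct op_add { static float apply(float a, float b) { return a + b; } };

// Each operand has its own byte stride, so src1 may be a non-contiguous view.
template <typename Op, typename Tdst, typename Tsrc0, typename Tsrc1>
void binary_strided(char * dst, size_t dst_stride,
                    const char * src0, size_t src0_stride,
                    const char * src1, size_t src1_stride, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        Tsrc0 a;
        Tsrc1 b;
        std::memcpy(&a, src0 + i * src0_stride, sizeof a);
        std::memcpy(&b, src1 + i * src1_stride, sizeof b);
        const Tdst r = type_conv<Tdst>::from_f32(
            Op::apply(type_conv<Tsrc0>::to_f32(a), type_conv<Tsrc1>::to_f32(b)));
        std::memcpy(dst + i * dst_stride, &r, sizeof r);
    }
}
```

Because conversion happens per element inside the shared loop, supporting a new storage type only needs one `type_conv` specialization rather than another copy of every operator.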

The next PRs will attempt to:

Use the traits table to remove the ugly-looking mass of template parameters and if-else conditions. This will also enable cleaner support for quantized operations, for example by taking inspiration from the quantized add function.
Use row-based type conversions to work with the existing traits table functions.
Move even more operator functions to C++ files.
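The traits-table direction can be sketched as follows, assuming each type exposes whole-row `to_float` / `from_float` converters: the operator kernel only ever sees f32 scratch rows, so only the operator remains a template parameter and the type combination is resolved by a table lookup at run time. The enum `dtype`, the table `traits` and the function `binary_row_via_f32` are hypothetical and are not ggml's actual type-traits API; quantized types would plug in by supplying their own row converters, which this sketch does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical type enum and row-converter table (not ggml's real API).
enum class dtype { f32, bf16, count };

using to_float_row_fn   = void (*)(const void * src, float * dst, int64_t n);
using from_float_row_fn = void (*)(const float * src, void * dst, int64_t n);

struct type_traits_entry {
    to_float_row_fn   to_float;
    from_float_row_fn from_float;
};

static void f32_to_float(const void * src, float * dst, int64_t n) {
    std::memcpy(dst, src, static_cast<size_t>(n) * sizeof(float));
}

static void f32_from_float(const float * src, void * dst, int64_t n) {
    std::memcpy(dst, src, static_cast<size_t>(n) * sizeof(float));
}

static void bf16_to_float(const void * src, float * dst, int64_t n) {
    const uint16_t * s = static_cast<const uint16_t *>(src);
    for (int64_t i = 0; i < n; ++i) {
        const uint32_t u = uint32_t(s[i]) << 16;  // bf16 is the top half of an f32
        std::memcpy(&dst[i], &u, sizeof(float));
    }
}

static void bf16_from_float(const float * src, void * dst, int64_t n) {
    uint16_t * d = static_cast<uint16_t *>(dst);
    for (int64_t i = 0; i < n; ++i) {
        uint32_t u;
        std::memcpy(&u, &src[i], sizeof u);
        if ((u & 0x7fffffffu) > 0x7f800000u) {    // NaN: keep it a quiet NaN
            d[i] = uint16_t((u >> 16) | 64);
            continue;
        }
        u += 0x7fffu + ((u >> 16) & 1u);           // round to nearest even
        d[i] = uint16_t(u >> 16);
    }
}

// Indexed by dtype; supporting a new type means adding one entry here.
static const type_traits_entry traits[] = {
    /* f32  */ { f32_to_float,  f32_from_float  },
    /* bf16 */ { bf16_to_float, bf16_from_float },
};
static_assert(sizeof(traits) / sizeof(traits[0]) == static_cast<size_t>(dtype::count),
              "one traits entry per dtype");

// Only the operator is a template parameter; the type combination is
// resolved at run time through the table. Scratch rows are allocated per
// call purely for simplicity.
template <typename Op>
void binary_row_via_f32(dtype td, void * dst,
                        dtype t0, const void * src0,
                        dtype t1, const void * src1, int64_t n) {
    const size_t len = static_cast<size_t>(n);
    std::vector<float> a(len), b(len), r(len);
    traits[static_cast<size_t>(t0)].to_float(src0, a.data(), n);
    traits[static_cast<size_t>(t1)].to_float(src1, b.data(), n);
    for (size_t i = 0; i < len; ++i) {
        r[i] = Op::apply(a[i], b[i]);
    }
    traits[static_cast<size_t>(td)].from_float(r.data(), dst, n);
}

struct op_sub { static float apply(float a, float b) { return a - b; } };
```

The trade-off in this sketch is an extra pass over scratch rows per operation, in exchange for far fewer template instantiations than one per (operator, dst type, src0 type, src1 type) combination.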

Performance is the same as with the current implementation. The PR also passes all the runners on ggml-ci, which test non-contiguous inputs (in SAM) and vDSP on Mac.


cmdr2 · 2025-03-12T09:01:32Z

I'm happy to modify the approach based on feedback. This was one way to get this de-duplicated, without changing too many things at once.

cmdr2 · 2025-03-12T15:53:39Z

Also, please let me know if the template-based approach is too ugly. I can fast-track the traits-table-based approach (my next PR) and use row-wise conversion functions instead.

I was trying to minimize the number of changes in this PR, which is why I left that out. The existing implementation doesn't use row-wise conversions either, so this PR just refactors it.

cmdr2 · 2025-03-13T04:14:47Z

I have a cleaner solution coming up shortly, please hold off.


cmdr2 added 3 commits March 11, 2025 12:17

cpu: use templates for de-duplicating some of the binary and unary operations · 9485af2

Fix non-contiguous typo · 0b49092

Fixes for elu, vDSP and restrict non-contiguous to non-broadcast · 90f2e58

cmdr2 requested a review from slaren March 12, 2025 08:59

cmdr2 mentioned this pull request Mar 12, 2025

ggml : refactor ggml-cpu.c into multiple C++ source files ggml-org/llama.cpp#10180 (Open)

Cleaner type conversions in binary/unary ops using compile-time lookup tables · b6a2cc4

cmdr2 closed this Mar 13, 2025

cmdr2 deleted the op-templates branch March 13, 2025 05:01
