Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu: use templates for de-duplicating some operators #1141

Closed
wants to merge 4 commits into from

Conversation

cmdr2
Copy link
Collaborator

@cmdr2 cmdr2 commented Mar 12, 2025

Note: This PR only deletes lines from ggml-cpu.c, it does not modify any functions in it. I'm not sure why git diff tries to combine them as changes. Standard diff shows the correct output. Actual diff of ggml-cpu.c: https://gist.github.com/cmdr2/a76df5af311417619788e8330b1908b3

This PR de-duplicates some of the easily-templatized functions in ggml-cpu.c. It takes inspiration from binbcast.cu.

  • Binary: add, sub, mul, div
  • Unary: abs, sgn, neg, step, tanh, elu, relu, sigmoid, hardsigmoid, exp, hardswish, sqr, sqrt, sin, cos, log

This removes the op implementation functions from ggml-cpu.c (around 2000 lines). As a side-effect, all the functions now support bf16 as well as non-contiguous src1.

The next PRs will attempt to:

  1. Use the traits table to remove the ugly-looking mass of template parameters and if-else conditions. That will enable cleaner support of quantized operations. For e.g. taking inspiration from the quantized add function.
  2. Use row-based type conversions, to work with the existing traits table functions.
  3. Move even more operator functions to C++ files.

The performance is the same as the current implementation. It also passes all the runners on ggml-ci, which tested non-contiguous inputs (in SAM) and vDSP on Mac.

@cmdr2 cmdr2 requested a review from slaren March 12, 2025 08:59
@cmdr2
Copy link
Collaborator Author

cmdr2 commented Mar 12, 2025

I'm happy to modify the approach based on feedback. This was one way to get this de-duplicated, without changing too many things at once.

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Mar 12, 2025

Also, please let me know if the templates-based approach is too ugly. I can fast-track the traits table-based approach (my next PR), and use row-wise conversion functions instead.

I was trying to minimize the amount of changes in this PR, which is why I left that out. The existing implementation doesn't use row-wise conversions either, so this PR just refactors that.

@cmdr2
Copy link
Collaborator Author

cmdr2 commented Mar 13, 2025

I have a cleaner solution coming up shortly, please hold off.

@cmdr2 cmdr2 closed this Mar 13, 2025
@cmdr2 cmdr2 deleted the op-templates branch March 13, 2025 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant