Reduce enum sizes: some are used in structs, which allows them to be optimized. #13071
base: master
Conversation
Sure, I can do some tests.
Now all CI tests will pass successfully, and you can check; I have only affected …
Force-pushed from 12d22af to 8622380.
Strangely, CMake and Make builds locally ignore pedantic warnings as errors, but the flags don't help me in the CI build.
How do you build this? CMake cries with a bunch of errors. Edit: the normal Makefile way... my GCC seems to be too old.
I use a JetBrains IDE with Clang; I did not build it manually through the terminal. The IDE is convenient for analyzing assembly code when optimizing and counting CPU cycles per function.
So my last test errored.
I can't test the patch: it will not run when built with CMake, and trying to use the old GCC 12 fails the build (too old).
I also had this assert triggered as an error, so I fixed it; apparently I need to remove the `> 0` check altogether. I have found more places for similar optimization, and now I will also try building through the terminal.
@USBhost, CI tests can give errors on Windows and macOS; I did not build on them.
Force-pushed from 51f0426 to 93ab560.
- llama_model_params: 72 bytes -> 64 bytes
- ggml_cgraph: 80 bytes -> 72 bytes
- hash_node: 32 bytes -> 24 bytes
- ggml_threadpool: 160 bytes -> 152 bytes
- best_tokenization: 24 bytes -> 16 bytes
@USBhost, nice — 44/43 successful CI checks; this PR is ready for code review and testing. The last unit test (llama2 conversation) error:
clang version 19.1.7 (3) for x86_64-pc-linux-gnu (host: Debian Sid Testing): Master and PR benchmark results
72 threads is overhead on Linux: Master and PR benchmark results
GCC (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu: Master and PR benchmark results
I still need to test the MSVC compiler on Windows; I won't be able to fully test llama.cpp in a virtual machine.
Well, it does not seem to make inference speed any faster.
Master:
./build/bin/llama-cli -m /mnt/36TB/AI/Fallen-Llama-3.3-R1-70B-v1/DeepSeek-R1-Distill-Llama-70B-v1-F16.gguf -p "I believe the meaning of life is" -n 1024 -no-cnv
./build/bin/llama-cli -m /mnt/36TB/AI/Fallen-Llama-3.3-R1-70B-v1/DeepSeek-R1-Distill-Llama-70B-v1-Q8_0.gguf -p "I believe the meaning of life is" -n 1024 -no-cnv
PR:
./build/bin/llama-cli -m /mnt/36TB/AI/Fallen-Llama-3.3-R1-70B-v1/DeepSeek-R1-Distill-Llama-70B-v1-F16.gguf -p "I believe the meaning of life is" -n 1024 -no-cnv
./build/bin/llama-cli -m /mnt/36TB/AI/Fallen-Llama-3.3-R1-70B-v1/DeepSeek-R1-Distill-Llama-70B-v1-Q8_0.gguf -p "I believe the meaning of life is" -n 1024 -no-cnv
@USBhost, is this under Windows with MSVC? In other words, did the PR changes only affect the Clang build?
Proxmox, GCC 12. Same computer as for your older PR.
I didn't mean the master branch.
gcc-13 (Debian 13.3.0-13) 13.3.0 for x86_64-linux-gnu: Master and PR benchmark results
gcc-12 (Debian 12.4.0-5) 12.4.0 for x86_64-linux-gnu: Master and PR benchmark results
clang version 18.1.8 (17) for x86_64-pc-linux-gnu: Master and PR benchmark results
You should use llama-bench instead. Also, to be fair, I expected the compiler to do the optimization job itself with -O3.
I tested on a Release build; -fshort-enums is not a safe flag with -O3. It is better to specify the enum's type strictly by hand.
Resolved merge conflict with master branch. |
@ggerganov, @USBhost Hi again.
I'm continuing to optimize, again reducing and aligning structures for modern x64. I think you will really like this change. Maybe someone can produce benchmarks faster than me; I don't have a powerful enough computer to test it quickly.
The PR changes do not require the C23 standard, and they remain compatible with all older compilers and systems.
A previous PR of mine similar to this one: #7267