Releases: ngxson/llama.cpp
b4457
llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
* RWKV: Some graph simplification
* Add support for RWKV6Qwen2 with cpu and cuda GLA
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
* Fix some typos
* code format changes
* Fix wkv test & add gla test
* Fix cuda warning
* Update README.md
* Update ggml/src/ggml-cuda/gla.cu (Co-authored-by: Georgi Gerganov)
* Fix fused lerp weights loading with RWKV6
* better sanity check skipping for QRWKV6 in llama-quant, thanks @compilade (Co-authored-by: compilade)
---------
Signed-off-by: Molly Sophia
Co-authored-by: Georgi Gerganov
Co-authored-by: compilade
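The "cpu and cuda GLA" item refers to a gated linear attention (GLA) kernel. As a rough illustration only, here is a minimal single-head C++ sketch of the usual GLA recurrence, assuming the form S_t = diag(g_t) S_{t-1} + k_t^T v_t and o_t = scale * q_t S_t. The function name `gla_head`, the row-major layouts, and where `scale` is applied are assumptions made for this sketch; it is not the ggml CPU or CUDA kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-head gated linear attention, processed token by token.
// Assumed layouts (not ggml's):
//   q, k, g : T x K, row-major (one K-vector per token)
//   v       : T x V, row-major
//   state   : K x V, row-major, carried across calls and updated in place
// Returns the T x V outputs, row-major.
std::vector<float> gla_head(const std::vector<float> & q,
                            const std::vector<float> & k,
                            const std::vector<float> & v,
                            const std::vector<float> & g,
                            std::vector<float> & state,
                            std::size_t T, std::size_t K, std::size_t V,
                            float scale) {
    std::vector<float> out(T * V, 0.0f);
    for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t i = 0; i < K; ++i) {
            const float ki = k[t * K + i];
            const float qi = q[t * K + i] * scale;  // assumption: scale folded into q
            const float gi = g[t * K + i];          // per-key-dimension decay gate
            for (std::size_t j = 0; j < V; ++j) {
                float & s = state[i * V + j];
                s = s * gi + ki * v[t * V + j];     // decay old state, add outer product k^T v
                out[t * V + j] += s * qi;           // read the updated state out with the query
            }
        }
    }
    return out;
}
```

With T = 2, K = V = 1, unit q and k, v = {2, 3}, g = {0.5, 0.5}, scale = 1 and a zero initial state, the outputs are 2 and 4, and the carried state ends at 4. The gate makes the state a decaying running sum, which is what lets this kind of attention run in constant memory per token.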
b4453
model: Add support for PhiMoE arch (#11003)
* model: support phimoe
* python linter
* doc: minor (Co-authored-by: ThiloteE)
* doc: minor (Co-authored-by: ThiloteE)
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE
b4451
llama-chat : add phi 4 template (#11148)
b4450
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai
b4448
Enhance user input handling for llama-run (#11138)
The main motivation for this change is that llama-run was not handling Ctrl-C/Ctrl-D correctly. Modify `read_user_input` to handle EOF, the "/bye" command, and empty input. Introduce a `get_user_input` function to manage the user input loop and handle the different return cases.
Signed-off-by: Eric Curtin
b4447
ci : use actions from ggml-org (#11140)
b4446
lora : improve compat with `mergekit-extract-lora` (#11131)
* (wip) support mergekit-extracted lora
* support mergekit-extract-lora
* use lora->get_scale
* correct comment
* correct norm name & condition
* add some hints
b4445
llama : avoid hardcoded QK_K (#11061) ggml-ci
b4439
ci : fix cmake option (#11125)
b4438
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …