Releases: ngxson/llama.cpp

b4457

10 Jan 02:48
ee7136c
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <[email protected]>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <[email protected]>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <[email protected]>

* Fix some typos

Signed-off-by: Molly Sophia <[email protected]>

* code format changes

Signed-off-by: Molly Sophia <[email protected]>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <[email protected]>

* Fix cuda warning

Signed-off-by: Molly Sophia <[email protected]>

* Update README.md

Signed-off-by: Molly Sophia <[email protected]>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <[email protected]>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <[email protected]>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: compilade <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>

b4453

09 Jan 11:03
f8feb4b
model: Add support for PhiMoE arch (#11003)

* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <[email protected]>

* doc: minor

Co-authored-by: ThiloteE <[email protected]>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <[email protected]>

b4451

09 Jan 09:50
d9feae1
llama-chat : add phi 4 template (#11148)

b4450

08 Jan 20:53
8d59d91
fix: add missing msg in static_assert (#11143)

Signed-off-by: hydai <[email protected]>
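
For context, the one-argument form of `static_assert` is only accepted from C++17 onward; earlier standards require a message string. The snippet below is a minimal sketch of the two-argument form. The struct `block_example` and the message text are hypothetical and are not taken from the commit.

```cpp
#include <cstdint>

// Hypothetical struct, used only to have something to check.
struct block_example {
    std::uint32_t scale;
    std::uint32_t offset;
};

// Two-argument form: valid since C++11. The message is shown in the
// compiler diagnostic if the condition is false.
static_assert(sizeof(block_example) == 8, "block_example must be exactly 8 bytes");
```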

b4448

08 Jan 19:37
1bf839b
Enhance user input handling for llama-run (#11138)

The main motivation for this change is that ctrl-c/ctrl-d was not
handled correctly. Modify `read_user_input` to handle EOF, the
"/bye" command, and empty input. Introduce a `get_user_input`
function to manage the user input loop and handle the different
return cases.

Signed-off-by: Eric Curtin <[email protected]>

b4447

08 Jan 16:00
f7cd133
ci : use actions from ggml-org (#11140)

b4446

08 Jan 15:46
4d2b3d8
lora : improve compat with `mergekit-extract-lora` (#11131)

* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints

b4445

08 Jan 15:10
c07d437
llama : avoid hardcoded QK_K (#11061)

ggml-ci

b4439

08 Jan 10:10
0d52a69
ci : fix cmake option (#11125)

b4438

08 Jan 09:06
02f0430
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …