b3631 #309

Nexesenex · 2024-08-27T02:30:54Z

No description provided.

This change fixes a bug where replacing text in a very long string could cause llama.cpp to hang indefinitely. The previous algorithm was quadratic: calling s.replace() in a loop triggers a memmove() of the rest of the string on every match. It seems most search results and LLM responses actually provide the O(n**2) algorithm, which is a great tragedy. Building the result in a separate builder string fixes the problem.
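The builder-string approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `replace_all_linear` and its signature are hypothetical. It copies each unmatched stretch of the input into a fresh string exactly once, so the total work is linear in the input plus output size instead of shifting the tail on every match.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not the actual llama.cpp function): replaces every
// occurrence of `search` in `s` with `replace` in a single left-to-right pass.
static void replace_all_linear(std::string & s,
                               const std::string & search,
                               const std::string & replace) {
    if (search.empty()) {
        return; // an empty pattern would match at every position; leave s untouched
    }
    std::string builder;
    builder.reserve(s.size()); // a guess at the final size; grows if needed
    size_t last_pos = 0;
    size_t pos;
    while ((pos = s.find(search, last_pos)) != std::string::npos) {
        builder.append(s, last_pos, pos - last_pos); // text before the match
        builder.append(replace);                     // the replacement itself
        last_pos = pos + search.size();              // resume after the match
    }
    builder.append(s, last_pos, std::string::npos);  // remaining tail
    s = std::move(builder);
}
```

By contrast, the quadratic variant calls `s.replace(pos, search.size(), replace)` inside the loop, which moves everything after `pos` each time a match is found.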


JohannesGaessler and others added 11 commits August 25, 2024 22:11

CUDA: fix Gemma 2 numerical issues for FA (#9166)

f91fc56

common: fixed not working find argument --n-gpu-layers-draft (#9175)

93bc383

ggml-ci : try to improve build time (#9160)

f12ceac

metal : gemma2 flash attention support (#9159)

0c41e03

server : update deps (#9183)

e5edb21

ci : add VULKAN support to ggml-ci (#9055)

7a3df79

tests : fix compile warnings for unreachable code (#9185)

879275a

ggml-ci

ggml : add SSM Metal kernels (#8546)

fc18425

* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl

ggml-ci

metal : separate scale and mask from QKT in FA kernel (#9189)

06658ad

* metal : separate scale and mask from QKT in FA kernel
* metal : ne01 check no longer necessary
* metal : keep data in local memory

ggml : do not crash when quantizing q4_x_x with an imatrix (#9192)

7d787ed

Nexesenex merged commit 007428b into Nexesenex:skystream Aug 27, 2024
15 checks passed

github-actions bot added the testing, examples, server, ggml, and devops labels on Aug 27, 2024
