Releases · OuadiElfarouki/llama.cpp
b2586
[SYCL] Disable iqx on Windows as a workaround (#6435)

* Disable iqx on Windows as a workaround
* Use an array instead of global_memory
b2585
flake.lock: Update (#6402)

Flake lock file updates:

• Updated input 'nixpkgs':
  'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
b2568
convert : refactor vocab selection logic (#6355)
b2549
embedding : show full embedding for single prompt (#6342)

* embedding : show full embedding for single prompt

  To support the use case of creating an embedding for a given prompt, the entire embedding, not just the first part, is now printed. The cosine similarity matrix is also shown only when there is more than one prompt, since for a single prompt it is always `1.00`.

* Update examples/embedding/embedding.cpp

Co-authored-by: Georgi Gerganov <[email protected]>
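Why a single prompt always yields `1.00`: the cosine similarity of a vector with itself is v·v / (|v|·|v|) = 1. A minimal, self-contained sketch of that check (illustrative code with an invented helper name, not the actual examples/embedding/embedding.cpp implementation):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Cosine similarity of two embedding vectors: dot(a, b) / (|a| * |b|).
static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

int main() {
    std::vector<float> v = {0.1f, -0.4f, 0.8f}; // a stand-in embedding
    // A vector compared against itself always scores 1.00, which is why the
    // similarity matrix is only informative for two or more prompts.
    std::printf("%.2f\n", cosine_similarity(v, v)); // prints 1.00
    return 0;
}
```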
b2456
ci : disable stale issue messages (#6126)
b2454
backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU
* fix HIP
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs; reduce max inputs per split; more cleanup
* update backends

Co-authored-by: Johannes Gäßler <[email protected]>
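The sched bullet above caps how many distinct inputs a single graph split may carry. A rough sketch of that heuristic, with hypothetical types, names, and cap value (the real logic lives in ggml-backend's scheduler and differs in detail):

```cpp
#include <vector>

constexpr int MAX_SPLIT_INPUTS = 4; // illustrative cap, not ggml's actual value

struct Node  { std::vector<int> inputs; };        // tensor ids this node reads
struct Split { std::vector<int> nodes, inputs; }; // nodes grouped for one backend run

// Add inputs to a split, skipping ids it already carries.
static void add_inputs(Split & s, const std::vector<int> & ins) {
    for (int in : ins) {
        bool seen = false;
        for (int id : s.inputs) { if (id == in) { seen = true; break; } }
        if (!seen) s.inputs.push_back(in);
    }
}

// Walk the graph in order; when appending a node would push the current
// split past the input cap, close it and start a new split with that node.
std::vector<Split> make_splits(const std::vector<Node> & graph) {
    std::vector<Split> splits(1);
    for (int i = 0; i < (int) graph.size(); ++i) {
        Split trial = splits.back(); // what the current split would become
        add_inputs(trial, graph[i].inputs);
        trial.nodes.push_back(i);
        if (!splits.back().nodes.empty() &&
            trial.inputs.size() > (size_t) MAX_SPLIT_INPUTS) {
            Split fresh; // too many inputs: start a new split with node i
            add_inputs(fresh, graph[i].inputs);
            fresh.nodes.push_back(i);
            splits.push_back(fresh);
        } else {
            splits.back() = trial;
        }
    }
    return splits;
}
```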