Releases: OuadiElfarouki/llama.cpp

b2586

03 Apr 11:43
5260486
[SYCL] Disable iqx on windows as WA (#6435)

* disable iqx on windows as WA

* array instead of global_memory

b2585

02 Apr 12:34
f87f7b8
flake.lock: Update (#6402)

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

b2568

28 Mar 16:21
be55134
convert : refactor vocab selection logic (#6355)

b2549

27 Mar 13:31
1e13987
embedding : show full embedding for single prompt (#6342)

* embedding : show full embedding for single prompt

To support the use case of creating an embedding for a given prompt, the entire embedding and not just the first part needed to be printed.

Also, show cosine similarity matrix only if there is more than one prompt, as the cosine similarity matrix for a single prompt is always `1.00`.

* Update examples/embedding/embedding.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
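
The behaviour described in this entry can be illustrated with a minimal sketch. It is not the code in `examples/embedding/embedding.cpp`; the function names (`cosine_similarity`, `similarity_matrix`, `report`) and the output formatting are assumptions made for illustration. It shows why a one-prompt similarity matrix carries no information (its only entry compares a vector with itself) and how output might switch between the full embedding and the matrix.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings):
    """Pairwise cosine similarities between all embeddings."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]

def report(embeddings):
    """Hypothetical output policy: full vector for one prompt, matrix otherwise."""
    if len(embeddings) == 1:
        # Single prompt: print every component of the embedding.
        return [" ".join(f"{v:.6f}" for v in embeddings[0])]
    # Several prompts: print the similarity matrix, one row per prompt.
    return [" ".join(f"{v:.2f}" for v in row) for row in similarity_matrix(embeddings)]

if __name__ == "__main__":
    for line in report([[1.0, 0.0], [0.0, 1.0]]):
        print(line)
```

With a single non-zero embedding, `similarity_matrix` returns a 1x1 matrix whose entry is 1 up to floating-point rounding, which is why the sketch skips the matrix in that case.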

b2456

18 Mar 12:45
ac9ee6a
ci : disable stale issue messages (#6126)

b2454

18 Mar 11:57
2bf8d0f
backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <[email protected]>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
  reduce max inputs per split
  more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <[email protected]>