Releases · ngxson/llama.cpp

17 Jan 13:43

3edfa7d

b4502 Latest

llama.android: add field formatChat to control whether to parse speci…

Assets 23

cudart-llama-bin-win-cu11.7-x64.zip, 303 MB, 2025-01-17T13:43:44Z
cudart-llama-bin-win-cu12.4-x64.zip, 373 MB, 2025-01-17T13:43:58Z
llama-b4502-bin-macos-arm64.zip, 13 MB, 2025-01-17T13:44:11Z
llama-b4502-bin-macos-x64.zip, 13.9 MB, 2025-01-17T13:44:12Z
llama-b4502-bin-ubuntu-x64.zip, 15.8 MB, 2025-01-17T13:44:13Z
llama-b4502-bin-win-avx-x64.zip, 9.84 MB, 2025-01-17T13:44:14Z
llama-b4502-bin-win-avx2-x64.zip, 9.85 MB, 2025-01-17T13:44:15Z
llama-b4502-bin-win-avx512-x64.zip, 9.86 MB, 2025-01-17T13:44:16Z
llama-b4502-bin-win-cuda-cu11.7-x64.zip, 147 MB, 2025-01-17T13:44:17Z
llama-b4502-bin-win-cuda-cu12.4-x64.zip, 147 MB, 2025-01-17T13:44:24Z
Source code (zip), 2025-01-17T12:57:56Z
Source code (tar.gz), 2025-01-17T12:57:56Z
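
The asset names above appear to follow a build/OS/variant/architecture pattern. As a minimal sketch for picking the right download in a script, the following splits a name into those parts. The pattern is inferred only from the names listed here and is an assumption, not a documented naming contract; `parse_asset` is a hypothetical helper.

```python
import re

# Pattern inferred from the asset names in this release listing.
# It is an assumption and may not hold for other releases.
ASSET_RE = re.compile(
    r"^llama-(?P<build>b\d+)-bin-(?P<os>[a-z]+)-"
    r"(?:(?P<variant>.+)-)?(?P<arch>x64|arm64)\.zip$"
)

def parse_asset(name):
    """Return build, os, variant, and arch for an asset name, or None."""
    m = ASSET_RE.match(name)
    return m.groupdict() if m else None

if __name__ == "__main__":
    for name in (
        "llama-b4502-bin-macos-arm64.zip",
        "llama-b4502-bin-win-cuda-cu12.4-x64.zip",
        "cudart-llama-bin-win-cu12.4-x64.zip",  # runtime bundle, not matched
    ):
        print(name, "->", parse_asset(name))
```

The `cudart-*` archives do not match on purpose: their names carry no build tag, so they are treated as a separate kind of asset.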

17 Jan 09:41

github-actions

b4501

667d728

rpc : early register backend devices (#11262)

Register RPC devices early and do not propagate RPC specifics into the
llama model structures.

ref: #10609

Assets 23

17 Jan 08:10

github-actions

b4500

a133566

vocab : fix double-eos check (#11273)

ggml-ci

Assets 23

17 Jan 07:52

github-actions

b4499

960ec65

llama : fix deprecation message: vocabable -> vocab (#11269)

Assets 23

16 Jan 22:26

github-actions

b4497

bd38dde

vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11…
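
To make concrete what a copy from f32 to a quantized type involves, here is a minimal CPU-side sketch of q8_0 block quantization: 32 values per block, one scale `d = max|x| / 127`, and one signed 8-bit integer per value. It is an illustration, not the Vulkan shader. It keeps the scale as a Python float (ggml stores it as fp16) and uses Python's `round`, which rounds ties to even, so tie cases can differ from the C implementation.

```python
QK8_0 = 32  # values per q8_0 block

def quantize_block_q8_0(x):
    """Quantize 32 floats to (scale, list of ints in [-127, 127])."""
    assert len(x) == QK8_0
    amax = max(abs(v) for v in x)
    d = amax / 127.0
    inv = 1.0 / d if d != 0.0 else 0.0  # all-zero block maps to zeros
    q = [int(round(v * inv)) for v in x]
    return d, q

def dequantize_block_q8_0(d, q):
    """Reconstruct approximate floats from a quantized block."""
    return [d * v for v in q]

if __name__ == "__main__":
    block = [float(i - 16) for i in range(QK8_0)]
    d, q = quantize_block_q8_0(block)
    restored = dequantize_block_q8_0(d, q)
    print(max(abs(a - b) for a, b in zip(block, restored)))
```

The reconstruction error per value is bounded by about half the scale, which is why the largest magnitude in each block determines its precision.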

Assets 23

16 Jan 16:38

github-actions

b4493

9c8dcef

CUDA: backwards pass for misc. ops, add tests (#11257)

* CUDA: backwards pass for misc. ops, add tests

* remove restrict from pointers

Assets 23

16 Jan 10:22

github-actions

b4491

c67cc98

ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)

* Add SVE support for q4_K_q8_K

* Update ggml/src/ggml-cpu/ggml-cpu-quants.c

change to use K_SCALE_SIZE

Co-authored-by: Georgi Gerganov

Assets 23

15 Jan 13:55

github-actions

b4488

1d85043

fix: ggml: fix vulkan-shaders-gen build (#10448)

* fix: ggml: fix vulkan-shaders-gen build

The vulkan-shaders-gen target was not being built correctly
in case of cross-compilation.
Other outputs need to be built for the cross compile target,
but vulkan-shaders-gen needs to be built for the host.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

Use configure_file to generate host_toolchain.cmake from template

* fix: ggml: Fix compile error

Fix compile error not finding vulkan-shaders-gen

* fix: vulkan-shaders-gen build and path handling

Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation

* fix: improve host compiler detection in vulkan shader build

Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation

* refactor: Simplify CMake function for detecting host compiler

Simplified the CMake function to improve the process of detecting the host compiler.

* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt

Since `vulkan-shaders-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, the Vulkan library linkage
is removed from CMakeLists.txt.
(See: ecc93d0558fc3ecb8a5af69d2ece02fae4710ade)

* refactor: Rename host_toolchain.cmake.in

- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in

* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN

Rename the CMake option GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
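
The notes above hinge on one idea: when cross-compiling, vulkan-shaders-gen runs on the build machine, so its file name and toolchain must follow the host, not the target. The sketch below restates that rule outside CMake. `shaders_gen_executable`, `configure_args`, and the `build_dir` default are hypothetical names for illustration; only `CMAKE_HOST_SYSTEM_NAME`, `CMAKE_TOOLCHAIN_FILE`, and `GGML_VULKAN_SHADERS_GEN_TOOLCHAIN` come from CMake or from the notes.

```python
def shaders_gen_executable(host_system_name, build_dir="build"):
    """Path of the generator as it would exist on the HOST.

    host_system_name mirrors CMAKE_HOST_SYSTEM_NAME ("Windows", "Linux",
    "Darwin"); the target system is deliberately not consulted.
    """
    suffix = ".exe" if host_system_name == "Windows" else ""
    return f"{build_dir}/vulkan-shaders-gen{suffix}"

def configure_args(target_toolchain, host_toolchain=None):
    """Assemble a cmake configure command line (a sketch, not the project's script)."""
    args = ["cmake", "-B", "build", f"-DCMAKE_TOOLCHAIN_FILE={target_toolchain}"]
    if host_toolchain is not None:
        # Per the notes, the host toolchain is auto-detected when unset.
        args.append(f"-DGGML_VULKAN_SHADERS_GEN_TOOLCHAIN={host_toolchain}")
    return args

if __name__ == "__main__":
    print(shaders_gen_executable("Windows"))
    print(" ".join(configure_args("android.cmake", "host.cmake")))
```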

Assets 23

15 Jan 12:38

github-actions

b4487

432df2d

RoPE: fix back, CUDA support for back + noncont. (#11240)

* RoPE: fix back, CUDA support for back + noncont.

* fix comments reg. non-cont. RoPE support [no-ci]
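
For readers unfamiliar with the RoPE backward pass: RoPE rotates each pair of values by a position-dependent angle, and because a rotation matrix is orthogonal, the gradient with respect to the input is the same rotation applied with the angle negated. The sketch below shows that relationship in plain Python. It assumes adjacent-pair layout and the common `base ** (-i / n)` frequency schedule; it is not the ggml or CUDA kernel and ignores scaling options and non-contiguous tensors.

```python
import math

def rope(x, pos, base=10000.0, backward=False):
    """Rotate adjacent pairs of x by position-dependent angles.

    With backward=True the angles are negated, which applies the
    transpose (inverse) rotation used to propagate gradients.
    """
    n = len(x)
    assert n % 2 == 0
    out = [0.0] * n
    for i in range(0, n, 2):
        theta = pos * base ** (-i / n)
        if backward:
            theta = -theta
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = rope(x, pos=5)
    print(rope(y, pos=5, backward=True))  # approximately x
```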

Assets 23

15 Jan 04:05

github-actions

b4485

f446c2c

SYCL: Add gated linear attention kernel (#11175)

* SYCL: Add Gated Linear attention kernel

* glahpp: add a space at the end of file

* gla: Put the barrier inside the main logic loop
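
As background, gated linear attention is commonly written as a recurrence over a per-head state matrix: the state is decayed by a gate along the key dimension, updated with the outer product of key and value, then read out with the query. The sketch below is that generic single-step formulation in plain Python. Whether the SYCL kernel uses exactly this indexing or scaling is not stated in the notes, so treat the layout and the `scale` parameter as assumptions.

```python
def gla_step(state, q, k, v, g, scale=1.0):
    """One token of a generic gated linear attention recurrence.

    state: len(k) rows by len(v) columns
    new_state[i][j] = g[i] * state[i][j] + k[i] * v[j]
    out[j]          = scale * sum_i q[i] * new_state[i][j]
    """
    rows, cols = len(k), len(v)
    new_state = [
        [g[i] * state[i][j] + k[i] * v[j] for j in range(cols)]
        for i in range(rows)
    ]
    out = [
        scale * sum(q[i] * new_state[i][j] for i in range(rows))
        for j in range(cols)
    ]
    return new_state, out

if __name__ == "__main__":
    state = [[0.0, 0.0], [0.0, 0.0]]
    state, out = gla_step(state, q=[1.0, 0.0], k=[1.0, 2.0],
                          v=[3.0, 4.0], g=[0.5, 0.5])
    print(state, out)
```

Each step depends on the state left by the previous token, which is one reason a parallel kernel needs synchronization between updating and reading that state.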

Assets 23
