sync : llama.cpp #1134

ggerganov · 2025-03-07T12:51:04Z

No description provided.


…ma/12154)

* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC


Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger than `cl_ptr_base` (hard-coded to `0x1000`). Also fix `ggml_backend_opencl_context::alignment`, which was set from `CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes even though the value is reported in bits.
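The bits-versus-bytes mix-up described above can be illustrated with a small sketch. The helper names `alignment_bits_to_bytes` and `align_up` are hypothetical and are not taken from the ggml OpenCL backend; the sketch only assumes that the device reports its base address alignment in bits and that alignments are powers of two. Note that the two sizes in the error message (27525120 and 27521024) differ by exactly 4096 bytes, which is the same value as `0x1000`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert an alignment reported in bits
// (as CL_DEVICE_MEM_BASE_ADDR_ALIGN is) into bytes.
inline std::size_t alignment_bits_to_bytes(std::uint32_t align_bits) {
    return align_bits / 8;
}

// Hypothetical helper: round `offset` up to the next multiple of
// `alignment`, which must be a non-zero power of two.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, a device that reports 1024 (bits) needs 128-byte alignment; treating the 1024 as bytes would over-align every allocation by a factor of eight.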


This commit updates the custom command that builds the default.metallib file so that it uses the correct path to ../ggml-common.h, via the variable METALLIB_COMMON.

The motivation for this change is that building with GGML_METAL_EMBED_LIBRARY=OFF currently generates the following error:

```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a follow-on error about not being able to find the ggml-common.h file in ggml-metal.metal, where it was included via a relative path:

```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```

Removing the relative path then allowed the build to complete successfully.

* metal : refactor im2col parameters into a struct
* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets
* metal : refactor sum_rows parameters into a struct
* metal : refactor soft_max parameters into a struct
* metal : refactor diag_mask_inf parameters into a struct
* metal : refactor ssm_conv parameters into a struct
* metal : refactor ssm_scan parameters into a struct
* metal : refactor get_rows parameters into a struct
* metal : refactor group_norm parameters into a struct
* metal : refactor conv_transpose_1d parameters into a struct
* metal : refactor upscale parameters into a struct
* metal : refactor pad parameters into a struct
* metal : refactor pad_reflect_1d parameters into a struct
* metal : refactor arange parameters into a struct
* metal : refactor timestep_embedding parameters into a struct
* metal : refactor argsort parameters into a struct
* metal : refactor leaky_relu parameters into a struct
* metal : refactor pool_2d parameters into a struct
* metal : fix trailing whitespace

---------

Co-authored-by: alexju <[email protected]>
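As a rough illustration of the two ideas in this list (grouping kernel parameters into one struct, and widening offsets from `int32_t` to `uint64_t`), here is a minimal host-side sketch. The struct name `kargs_im2col_sketch`, its fields, and `src_offset` are hypothetical and do not reproduce the actual Metal argument layout; they only show why 64-bit offsets matter once a product of indices and strides exceeds the 32-bit range.

```cpp
#include <cstdint>

// Hypothetical argument block: all parameters travel together in one
// struct instead of being passed as separate kernel arguments.
struct kargs_im2col_sketch {
    std::uint64_t ofs0; // offset step per batch, in elements (64-bit)
    std::uint64_t ofs1; // offset step per channel, in elements (64-bit)
    std::int32_t  IW;   // input width
    std::int32_t  IH;   // input height
};

// Computes a source offset entirely in 64-bit arithmetic, so large
// tensors do not wrap around the way a 32-bit offset would.
inline std::uint64_t src_offset(const kargs_im2col_sketch & args,
                                std::uint64_t batch,
                                std::uint64_t channel) {
    return batch * args.ofs0 + channel * args.ofs1;
}
```

With `ofs0 = 3000000000`, batch index 2 already yields an offset of about 6e9, which cannot be represented in an `int32_t`.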


pminev and others added 14 commits March 7, 2025 14:50

ggml : fix GGMLMetalClass ODR (llama/12200)

a60363b

-- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.

SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12…

8c382a5

…201)

opencl : fix profile-related errors (llama/12095)

a6b96bc

Co-authored-by: ubuntu <[email protected]>

opencl : fix ulong kernel args were set from int variables (llama…

4fbd566

…/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.
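The failure mode can be sketched without an OpenCL runtime. `read_arg_as_u64` below is a hypothetical stand-in for an API that, like `clSetKernelArg`, is handed a byte count and a pointer and copies that many bytes. `buggy_arg` models passing an 8-byte size for a value that only holds 4 meaningful bytes, and `fixed_arg` shows one plausible fix pattern (widening into a 64-bit variable first). This is an assumption about the general shape of the fix, not a copy of the actual commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an argument-setting call: copies `size`
// bytes (at most 8) from `value` into a 64-bit kernel argument.
inline std::uint64_t read_arg_as_u64(std::size_t size, const void * value) {
    std::uint64_t out = 0;
    std::memcpy(&out, value, size <= sizeof(out) ? size : sizeof(out));
    return out;
}

// Models the bug: only 4 bytes hold the int, the other 4 bytes are
// whatever happened to be in memory (simulated here with 0xAB filler).
inline std::uint64_t buggy_arg(std::int32_t n) {
    unsigned char storage[8];
    std::memset(storage, 0xAB, sizeof(storage));
    std::memcpy(storage, &n, sizeof(n));
    return read_arg_as_u64(sizeof(std::uint64_t), storage);
}

// One plausible fix: widen to a 64-bit variable, then pass that.
inline std::uint64_t fixed_arg(std::int32_t n) {
    const std::uint64_t wide = static_cast<std::uint64_t>(n);
    return read_arg_as_u64(sizeof(wide), &wide);
}
```

In the buggy variant the kernel sees the filler bytes as part of the value, so a small count such as 5 turns into a huge number; the widened variant delivers 5 unchanged.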

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of …

a5020df

…replacing it. (llama/12209) This avoids conflicts with the internal memory management behavior of the CUDA/HIP runtimes.

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)

0f13ede

cmake : fix undefined reference errors for std::filesystem in ggml (#…

1993982

…12092) (llama/12094) Signed-off-by: Ray Lee <[email protected]> Co-authored-by: Ray Lee <[email protected]>

opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops…

381c382

… (llama/12217)

* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)

61022c3

sync : llama.cpp

90876ef

ggml-ci

ggerganov merged commit 7b08f4c into master Mar 7, 2025
10 checks passed

ggerganov deleted the sync-llama-25-03-07 branch March 7, 2025 13:20
