b3569 #288

Nexesenex · 2024-08-11T14:28:16Z

No description provided.

* gguf-py : add T5ENCODER model architecture
* common : call llama_decode() during warmup only if the model has decoder
* convert-hf : add T5EncoderModel
* llama : add llama_model_has_decoder() API function
* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
* llama : add support for LLM_ARCH_T5ENCODER
* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
* llama-embedding : add support for encoder-only models

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>

…ronization overhead. (#8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

  - Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
  - ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader reads/writes and transfers seems to be sufficient, judging by the code, which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <[email protected]>

ggerganov and others added 12 commits August 9, 2024 18:23

llama : better replace_all (cont) (#8926)

45a55b9

* llama : better replace_all (cont)

  ggml-ci

* code : deduplicate replace_all

  ggml-ci
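The commit message does not show what "better" means here, so the following is only a minimal sketch of one common way to write such a helper: a single pass that appends into a fresh buffer instead of editing the string in place. The name `replace_all_sketch` is hypothetical and is not claimed to match the code in #8926.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (name and signature are assumptions, not taken from the PR).
// Replaces every non-overlapping occurrence of `search` in `s` with `replacement`.
static void replace_all_sketch(std::string & s,
                               const std::string & search,
                               const std::string & replacement) {
    if (search.empty()) {
        return; // an empty pattern would match everywhere; leave the input unchanged
    }

    std::string out;
    out.reserve(s.size());

    std::string::size_type last = 0; // start of the not-yet-copied tail
    std::string::size_type pos  = 0; // position of the current match

    while ((pos = s.find(search, last)) != std::string::npos) {
        out.append(s, last, pos - last); // copy the text before the match
        out.append(replacement);         // emit the replacement
        last = pos + search.size();      // resume scanning after the match
    }

    out.append(s, last, std::string::npos); // copy whatever follows the last match
    s = std::move(out);
}
```

Because scanning always resumes after the matched text in the original string, a replacement that itself contains the search pattern cannot cause an endless loop, and each character of the input is copied at most once.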

make : fix llava obj file race (#8946)

272e3bd

ggml-ci

llama : add support for lora adapters in T5 model (#8938)

6afd1a9

Co-authored-by: Stanisław Szymczyk <[email protected]>

Merge commit from fork

b72942f

gguf-py : fix double call to add_architecture() (#8952)

911b437

Signed-off-by: tarilabs <[email protected]>

llama : default n_swa for phi-3 (#8931)

7eb2384

* default n_swa for phi-3
* fix
* double check swa

metal : fix uninitialized abort_callback (#8968)

6e02327

llama : check all graph nodes when searching for result_embd_pooled (#8956)

33309f6

Co-authored-by: Stanisław Szymczyk <[email protected]>

update guide (#8909)

a21c6fd

Co-authored-by: Neo Zhang <>

flake.lock: Update (#8979)

8cd1bcf

github-actions bot added the documentation, examples, python, ggml, SYCL, and Vulkan labels Aug 11, 2024

Nexesenex merged commit 6131e6a into Nexesenex:spacestream Aug 11, 2024
46 of 59 checks passed

Nexesenex changed the title ~~b3568~~ b3569 Aug 11, 2024
