Releases: OuadiElfarouki/llama.cpp

b3295

04 Jul 15:37
807b0c4
Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b3276

01 Jul 20:17
cb5fad4
CUDA: refactor and optimize IQ MMVQ (#8215)

* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix

b3272

01 Jul 13:14
3840b6f
nix : enable curl (#8043)

Co-authored-by: Georgi Gerganov <[email protected]>

b3268

01 Jul 11:56
d0a7145
flake.lock: Update (#8218)

b3259

28 Jun 07:28
e57dc62
llama: Add support for Gemma2ForCausalLM (#8156)

* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <[email protected]>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>

b3197

21 Jun 17:40
557b653
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…

b3192

21 Jun 03:38
b1ef562
requirements : Bump torch and numpy for python3.12 (#8041)

b3162

16 Jun 20:01
19b7a83
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)

* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b3149

14 Jun 16:15
66ef1ce
metal : utilize max shared memory for mul_mat_id (#7935)

b3141

12 Jun 18:11
9635529
CUDA: fix broken oob check for FA vec f32 kernel (#7904)