Releases: OuadiElfarouki/llama.cpp
b3295
Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

Co-authored-by: Stanisław Szymczyk <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
b3276
CUDA: refactor and optimize IQ MMVQ (#8215)

* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
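The `__dp4a -> ggml_cuda_dp4a` item replaces direct use of CUDA's `__dp4a` intrinsic with a ggml wrapper. For readers unfamiliar with the intrinsic, here is a minimal plain-C sketch of its semantics (a reference model, not the llama.cpp implementation): each 32-bit operand is treated as four packed signed 8-bit lanes, their dot product is computed, and an accumulator is added.

```c
#include <stdint.h>

/* Reference semantics of CUDA's __dp4a(a, b, c) for signed operands:
 * unpack four int8 lanes from each 32-bit word, multiply lane-wise,
 * sum the products, and add the accumulator c. */
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    int32_t sum = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = (int8_t)((uint32_t)a >> (8 * i));
        const int8_t bi = (int8_t)((uint32_t)b >> (8 * i));
        sum += (int32_t)ai * (int32_t)bi;
    }
    return sum;
}
```

Wrapping the intrinsic behind a ggml helper lets non-CUDA or pre-DP4A targets fall back to a scalar loop like this one, which is presumably why the `MIN_CC_DP4A` compute-capability checks could be removed at the call sites.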
b3272
nix : enable curl (#8043)

Co-authored-by: Georgi Gerganov <[email protected]>
b3268
flake.lock: Update (#8218)
b3259
llama: Add support for Gemma2ForCausalLM (#8156)

* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
* Add model type names
* Add control vector
* Fix model type identification

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>
b3197
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…
b3192
requirements : Bump torch and numpy for python3.12 (#8041)
b3162
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)

* cuda : fix bounds check for src0 rows in MMVQ kernel
* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <[email protected]>
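The fix above concerns an out-of-bounds row guard in a matrix-vector kernel: when the launch grid is rounded up, the last block can map to row indices past the end of `src0`, and those threads must return before touching memory. The sketch below models that pattern in plain C with hypothetical shapes (`matvec_guarded` is illustrative, not the actual mmvq.cu code).

```c
#include <stddef.h>

/* Each "block" covers rows_per_block rows of src0 (nrows x ncols).
 * Because nblocks * rows_per_block may exceed nrows, every computed row
 * index is checked before it is used to address src0 or dst. */
static void matvec_guarded(const float *src0, const float *vec, float *dst,
                           int nrows, int ncols,
                           int rows_per_block, int nblocks) {
    for (int block = 0; block < nblocks; ++block) {
        for (int r = 0; r < rows_per_block; ++r) {
            const int row = block * rows_per_block + r;
            if (row >= nrows) {
                continue; /* the bounds check: skip padding rows */
            }
            float sum = 0.0f;
            for (int c = 0; c < ncols; ++c) {
                sum += src0[(size_t)row * ncols + c] * vec[c];
            }
            dst[row] = sum;
        }
    }
}
```

Without the `row >= nrows` guard, a 3-row matrix launched with two 2-row blocks would read and write one row past the end of its buffers; the same class of bug is what the FA vec f32 fix in b3141 below addresses.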
b3149
metal : utilize max shared memory for mul_mat_id (#7935)
b3141
CUDA: fix broken oob check for FA vec f32 kernel (#7904)