Releases: OuadiElfarouki/llama.cpp
b3295
Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

Co-authored-by: Stanisław Szymczyk <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
b3276
CUDA: refactor and optimize IQ MMVQ (#8215)

* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
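The `__dp4a -> ggml_cuda_dp4a` item replaces direct use of CUDA's `__dp4a` intrinsic with a ggml wrapper. For readers unfamiliar with the intrinsic, here is a minimal plain-C sketch of its semantics (a reference model, not the llama.cpp implementation): each 32-bit operand is treated as four packed signed 8-bit lanes, their dot product is computed, and an accumulator is added.

```c
#include <stdint.h>

/* Reference semantics of CUDA's __dp4a(a, b, c) for signed operands:
 * unpack four int8 lanes from each 32-bit word, multiply lane-wise,
 * sum the products, and add the accumulator c. */
static int32_t dp4a_ref(int32_t a, int32_t b, int32_t c) {
    int32_t sum = c;
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = (int8_t)((uint32_t)a >> (8 * i));
        const int8_t bi = (int8_t)((uint32_t)b >> (8 * i));
        sum += (int32_t)ai * (int32_t)bi;
    }
    return sum;
}
```

Wrapping the intrinsic behind a ggml helper lets non-CUDA or pre-DP4A targets fall back to a scalar loop like this one, which is presumably why the `MIN_CC_DP4A` compute-capability checks could be removed at the call sites.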
b3272
nix : enable curl (#8043)

Co-authored-by: Georgi Gerganov <[email protected]>
b3268
flake.lock: Update (#8218)
b3259
llama: Add support for Gemma2ForCausalLM (#8156)

* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
* Add model type names
* Add control vector
* Fix model type identification

Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: slaren <[email protected]>
b3197
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8…
b3192
requirements : Bump torch and numpy for python3.12 (#8041)
b3162
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)

* cuda : fix bounds check for src0 rows in MMVQ kernel
* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <[email protected]>
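The fix above concerns an out-of-bounds row guard in a matrix-vector kernel: when the launch grid is rounded up, the last block can map to row indices past the end of `src0`, and those threads must return before touching memory. The sketch below models that pattern in plain C with hypothetical shapes (`matvec_guarded` is illustrative, not the actual mmvq.cu code).

```c
#include <stddef.h>

/* Each "block" covers rows_per_block rows of src0 (nrows x ncols).
 * Because nblocks * rows_per_block may exceed nrows, every computed row
 * index is checked before it is used to address src0 or dst. */
static void matvec_guarded(const float *src0, const float *vec, float *dst,
                           int nrows, int ncols,
                           int rows_per_block, int nblocks) {
    for (int block = 0; block < nblocks; ++block) {
        for (int r = 0; r < rows_per_block; ++r) {
            const int row = block * rows_per_block + r;
            if (row >= nrows) {
                continue; /* the bounds check: skip padding rows */
            }
            float sum = 0.0f;
            for (int c = 0; c < ncols; ++c) {
                sum += src0[(size_t)row * ncols + c] * vec[c];
            }
            dst[row] = sum;
        }
    }
}
```

Without the `row >= nrows` guard, a 3-row matrix launched with two 2-row blocks would read and write one row past the end of its buffers; the same class of bug is what the FA vec f32 fix in b3141 below addresses.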
b3149
metal : utilize max shared memory for mul_mat_id (#7935)
b3141
CUDA: fix broken oob check for FA vec f32 kernel (#7904)