b2845 #115

Nexesenex · 2024-05-11T07:59:31Z

No description provided.


The llama.cpp grammar parser had a bug where a string missing its closing quotation mark would crash the parser. Anyone running a server on a public endpoint is advised to upgrade. To reproduce the bug, run `./llamafile -m foo.gguf -p bar --grammar 'root::="'`. Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <[email protected]>.
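To illustrate the class of bug described above, here is a minimal Python sketch of a quoted-literal parser that checks for end of input before reading each character. It is not the llama.cpp code (which is C++); the names `GrammarParseError` and `parse_string_literal` are hypothetical and exist only for this example.

```python
class GrammarParseError(ValueError):
    """Raised when the grammar text is malformed (hypothetical name)."""


def parse_string_literal(src: str, pos: int) -> tuple[str, int]:
    """Parse a double-quoted literal that starts at src[pos].

    Returns the literal's contents and the index just past the closing quote.
    """
    if pos >= len(src) or src[pos] != '"':
        raise GrammarParseError(f"expected opening quote at offset {pos}")
    pos += 1
    chars = []
    while True:
        # Bounds check first: reaching the end of the input means the
        # closing quote is missing, which is reported as a parse error.
        if pos >= len(src):
            raise GrammarParseError("unterminated string literal")
        ch = src[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch == "\\":
            # Skip the backslash and keep the escaped character verbatim,
            # again checking that the input has not run out.
            pos += 1
            if pos >= len(src):
                raise GrammarParseError("unterminated escape sequence")
        chars.append(src[pos])
        pos += 1
```

With this shape, the grammar from the reproduction command, `root::="`, produces a `GrammarParseError` at the opening quote instead of reading past the end of the buffer.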


* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)
  ggml-ci
* ggml : fix assert message
* vulkan : add dev notes
* ggml : require mask when using ALiBi
  ggml-ci
* convert : fix convert for refact models
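For readers unfamiliar with ALiBi, the sketch below shows the general idea in plain Python: each attention head gets a fixed slope, and a penalty proportional to the query-key distance is added to the attention scores before softmax. It follows the commonly cited formulation for power-of-two head counts and is not a transcription of the ggml kernels; `alibi_slopes` and `alibi_bias` are hypothetical helper names.

```python
import math


def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes 2^(-8 * (h + 1) / n_heads), power-of-two head counts only."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(slope: float, n_tokens: int) -> list[list[float]]:
    """Causal bias matrix added to the attention scores of one head.

    bias[i][j] = -slope * (i - j) for keys at or before the query (j <= i),
    and -inf for future keys, which plays the role of the causal mask.
    """
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


slopes = alibi_slopes(8)
bias_head0 = alibi_bias(slopes[0], 4)
```

Because the bias depends only on relative distance, no positional embedding is added to the token embeddings; the position signal lives entirely in the attention scores.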

* feat: first things to do
* feat: create tensors for Jina architecture
* fix: use other tensors
* feat: embedding gets results
* fix: fix usage of ALIBI
* fix: clean prints
* fix: do some cleanup unused vars
* fix: revert changes to Makefile and CMakeLists
* fix: revert some changes
* fix: fix small detail
* fix: fix convert formatting
* fix: fix linting and editor
* feat: set proper vocab settings
* fix: JinaBertForMaskedLM registration
* feat: support q_normalization and k_normalization in Jina arch
* feat: handle gpt2 tokenizer with Jina architecture
* feat: example comments in embedding
* feat: rename Jina Bert to Jina Bert V2
* fix: add some changes as per review
* feat: proper KQ_pos for Jina embeddings
* feat: add capacity to load models ES and DE for Spanish
* llama : fix pre-tokenizers
* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)
  ggml-ci
* minor : clean-up
* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <[email protected]>


hanishkvc and others added 8 commits May 10, 2024 20:21

Main+: optionally allow special tokens from user in interactive mode (#…

f89fe27

…7097) @hanishkvc added a new `--interactive-specials` flag that allows special tokens to be inserted from the user side into the embedding stream.

llama : use n_vocab to differentiate between mistral 7B and llama3 8B (…

25c6e82

…#7200)
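The commit title above suggests the following idea, sketched here in Python under stated assumptions: Mistral 7B and Llama 3 8B both have 32 transformer layers, so layer count alone cannot tell them apart, while their vocabulary sizes differ widely (32000 versus 128256). The function name `guess_model_size` and the threshold value are hypothetical choices for this illustration, not taken from the commit.

```python
def guess_model_size(n_layer: int, n_vocab: int, vocab_threshold: int = 40_000) -> str:
    """Label a 32-layer model as "7B" or "8B" based on vocabulary size.

    vocab_threshold is an assumed cut-off for this sketch; any value between
    the two vocabulary sizes would separate the two models equally well.
    """
    if n_layer != 32:
        return "unknown"  # other layer counts are outside this sketch
    return "7B" if n_vocab < vocab_threshold else "8B"


mistral_label = guess_model_size(n_layer=32, n_vocab=32_000)
llama3_label = guess_model_size(n_layer=32, n_vocab=128_256)
```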

convert : print "ignore_merges" field

8c66024

metal : fix flash attention kernel requirements (#7169)

18e4376

* metal : fix flash attention kernel requirements
  ggml-ci
* metal : fix ggml_metal_supports_op
  ggml-ci

llama-bench : add pp+tg test type (#7199)

e849648

Nexesenex merged commit 25442eb into Nexesenex:downstream May 11, 2024
11 of 12 checks passed

Nexesenex pushed a commit that referenced this pull request Dec 22, 2024

MMQ for Q6_0 (#115)

4d2fbde

* MMQ for Q6_0
* Add Q6_0 MMQ to template generator

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
