Fix llama minitron #307

Nexesenex · 2024-08-23T09:27:21Z

No description provided.

* llama : advanced batch splits
  This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators.
* llama : always make recurrent state slots contiguous
* ggml : simplify mamba operators
* llama : fix integer signedness mixing
* llama : logits_all has priority over batch->logits
  Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion.
* llama : apply suggestions
  Co-authored-by: Georgi Gerganov <[email protected]>
* llama : fix t5 segfault
* llama : fix Mamba session save and restore
* llama : minor cosmetic changes
* llama : rename llama_reorder_outputs to llama_output_reorder
  Also move it closer to llama_output_reserve.
* llama : fix pooled embeddings when using batches with equal_seqs
* minor : add struct members for clarity
  ggml-ci
* llama : fix T5 segfault again
* llama : fix Mamba pooled embeddings with multiple sequences
  Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings.
* llama : add llama_model_is_recurrent to simplify figuring that out
  This will make it easier to more cleanly support RWKV-v6 and Mamba-2.
* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <[email protected]>

ngxson and others added 6 commits August 21, 2024 11:04

server : support reading arguments from environment variables (ggerga…

fc54ef0

…nov#9105)
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var

[SYCL] Add oneDNN primitive support (ggerganov#9091)

1731d42

* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc

[SYCL] Add a space to supress a cmake warning (ggerganov#9133)

11b84eb

fix: llama3.1 rope_freqs not respecting custom head_dim

b77d7f6

fix: use potential head_dim for Exaone

1a88919

Nexesenex merged commit 344cac3 into Nexesenex:lcpp_pr_l31_custom_headim Aug 23, 2024
13 of 18 checks passed

github-actions bot added the documentation, examples, python, server, ggml, SYCL, and build labels Aug 23, 2024
