forked from LostRuins/koboldcpp
b2266 #91
Merged
Conversation
* Fix issues during StableLM models conversion
* Fix hard coded layer_norm_eps
* Support layer_norm_eps for LlavaStableLM
  Co-authored-by: Jared Van Bortel <[email protected]>
* Add missing parenthesis
  Co-authored-by: Jared Van Bortel <[email protected]>
* Support rotary_factor for LlavaStableLM
  Co-authored-by: Jared Van Bortel <[email protected]>
* fix typo
* Add StableLMEpochForCausalLM for safety
  Co-authored-by: compilade <[email protected]>
* Add StableLMEpochForCausalLM for safety 2
  Co-authored-by: compilade <[email protected]>
---------
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: compilade <[email protected]>
* code : normalize enum names
  ggml-ci
* code : cont
* code : cont
…ible endpoint (#5708)
* server: monitoring - add /metrics prometheus compatible endpoint
* server: concurrency issue - when 2 tasks are waiting for results, only one calling thread is notified (see the sketch below)
* server: metrics - move to a dedicated struct
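The concurrency bullet above describes a classic lost-wakeup pattern: several handlers wait on one condition variable, each for its own task result, and waking only a single waiter can leave the other blocked even though its result is ready. A common fix is to notify all waiters; the sketch below illustrates that idea only. The struct and member names (result_queue, push, wait_for) are assumptions for illustration, not the server's actual code.

```cpp
// Minimal sketch, assuming a map of task_id -> result guarded by one mutex.
// Names (result_queue, push, wait_for) are hypothetical, not llama.cpp's.
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>

struct result_queue {
    std::mutex                 mtx;
    std::condition_variable    cv;
    std::map<int, std::string> results;   // task_id -> finished result

    // producer side: store a result and wake *all* waiters, since each waiter
    // waits for a specific task_id and a single notify could wake the wrong one
    void push(int task_id, std::string res) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            results[task_id] = std::move(res);
        }
        cv.notify_all();
    }

    // consumer side: block until *this* task's result is available
    std::string wait_for(int task_id) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return results.count(task_id) > 0; });
        std::string res = std::move(results[task_id]);
        results.erase(task_id);
        return res;
    }
};
```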
* server: logs - always use JSON logger, add thread_id in message, log task_id and slot_id
* server : skip GH copilot requests from logging
* server : change message format of server_log()
* server : no need to repeat log in comment
* server : log style consistency
* server : fix compile warning
* server : fix tests regex patterns on M2 Ultra
* server: logs: PR feedback on log level
* server: logs: allow choosing the log format, json or plain text
* server: tests: output server logs in text
* server: logs: switch init logs to server logs macro
* server: logs: ensure json value does not raise an error
* server: logs: reduce level VERBOSE to VERB, max 4 chars
* server: logs: lower case as other log messages
* server: logs: avoid static in general
  Co-authored-by: Georgi Gerganov <[email protected]>
* server: logs: PR feedback: change text log format to: LEVEL [function_name] message | additional=data (see the sketch below)
---------
Co-authored-by: Georgi Gerganov <[email protected]>
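For the plain-text format named in the last bullet (LEVEL [function_name] message | additional=data), a rough formatter sketch is shown below. The function name and signature are hypothetical, for illustration only, not the server's real server_log().

```cpp
// Hypothetical helper (not the server's real server_log()): renders one line as
//   LEVEL [function_name] message | key=value key=value ...
#include <cstdio>
#include <map>
#include <string>

void log_text(const char * level, const char * function, const std::string & message,
              const std::map<std::string, std::string> & extra) {
    std::string line = std::string(level) + " [" + function + "] " + message;
    if (!extra.empty()) {
        line += " |";
        for (const auto & kv : extra) {
            line += " " + kv.first + "=" + kv.second;
        }
    }
    std::fprintf(stdout, "%s\n", line.c_str());
    std::fflush(stdout);
}
```

For example, log_text("INFO", "process_task", "slot released", {{"slot_id", "0"}, {"task_id", "42"}}) would print: INFO [process_task] slot released | slot_id=0 task_id=42 (the function name here is made up).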
fix nvcc version is empty
* [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility (see the sketch below)
  vqtbl1q_u8 is not part of arm v7 neon library
* [android-example] Remove abi filter after arm v7a fix
* [github-workflows] Do not skip Android armeabi-v7a build
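The intrinsic vqtbl1q_u8 (a full 16-byte table lookup) exists only on AArch64; 32-bit ARMv7 NEON offers only the 64-bit-wide vtbl/vtbl2 forms. A common way to write such a fallback wrapper is sketched below; the wrapper name comes from the commit, but the body is illustrative and not necessarily ggml's exact implementation.

```cpp
// Illustrative ARMv7 fallback (not necessarily ggml's exact code): emulate the
// AArch64-only vqtbl1q_u8 with two vtbl2_u8 lookups. Both variants return 0
// for out-of-range indices, so the semantics match.
#include <arm_neon.h>

static inline uint8x16_t ggml_vqtbl1q_u8(uint8x16_t table, uint8x16_t idx) {
    const uint8x8x2_t t = { { vget_low_u8(table), vget_high_u8(table) } };
    return vcombine_u8(vtbl2_u8(t, vget_low_u8(idx)),
                       vtbl2_u8(t, vget_high_u8(idx)));
}
```

On AArch64 such a wrapper can simply forward to the native vqtbl1q_u8 intrinsic.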
The system prompt is now decoded in batches.
* server : fix off-by-one n_past when start of prompt matches whole cache
  The tokens right after the matching part would otherwise skip a pos value.
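A minimal sketch of the off-by-one guard described above: when the cached tokens cover the entire new prompt, at least one token must still be re-decoded, otherwise the next generated token would be assigned a position one too far and a pos value would be skipped. The helper name and types below are assumptions for illustration, not the server's actual code.

```cpp
// Hypothetical helper (not the server's actual code): length of the cached
// prefix that can be reused as n_past, never the entire prompt.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token = int32_t;

size_t reusable_n_past(const std::vector<llama_token> & cached,
                       const std::vector<llama_token> & prompt) {
    size_t n = 0;
    const size_t lim = std::min(cached.size(), prompt.size());
    while (n < lim && cached[n] == prompt[n]) {
        n++;
    }
    // off-by-one guard: if the whole prompt matches the cache, keep the last
    // prompt token out of n_past so it is decoded and no pos value is skipped
    if (!prompt.empty() && n == prompt.size()) {
        n--;
    }
    return n;
}
```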
* llama : refactor k-shift implementation
  ggml-ci
* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add
* llama : cont k-shift refactoring + normalize type names
  ggml-ci
* minor : fix MPI builds
* llama : reuse n_rot from the build context
  ggml-ci
* llama : revert enum name changes from this PR
  ggml-ci
* llama : update llama_rope_type
* llama : add comment about rope values
* llama : fix build
* passkey : apply kv cache updates explicitly
  ggml-ci
* llama : change name to llama_kv_cache_update()
* llama : add llama_kv_cache_seq_pos_max()
* passkey : fix llama_kv_cache_seq_pos_max() usage
* llama : some llama_kv_cell simplifications
* llama : add llama_kv_cache_compress (EXPERIMENTAL)
* llama : add alternative KV cache merging (EXPERIMENTAL)
* llama : add llama_kv_cache_defrag
* llama : comments
* llama : remove llama_kv_cache_compress
  will add in a separate PR
  ggml-ci
* llama : defragment via non-overlapping moves
* llama : ggml_graph based defrag implementation
  ggml-ci
* llama : switch the loop order in build_defrag
* llama : add comments
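The last few bullets describe defragmenting the KV cache by packing used cells into earlier holes with copies whose source and destination ranges never overlap, presumably so each move can be expressed as a plain copy in a ggml graph. The sketch below shows the planning idea only; the cell/move structs and the function name are illustrative assumptions, not llama.cpp's actual implementation.

```cpp
// Hedged sketch of planning non-overlapping defrag moves; the structs and the
// function name are illustrative, not llama.cpp's actual implementation.
#include <cstdint>
#include <vector>

struct kv_cell { bool used = false; };
struct kv_move { uint32_t src, dst, len; };  // copy cells [src, src+len) to [dst, dst+len)

// Pack used cells toward the front, emitting chunks whose destination range
// always ends before their source range begins, so no single copy overlaps.
std::vector<kv_move> plan_defrag(const std::vector<kv_cell> & cells) {
    std::vector<kv_move> moves;
    uint32_t dst = 0;
    const uint32_t n = (uint32_t) cells.size();
    for (uint32_t src = 0; src < n; ++src) {
        if (!cells[src].used) {
            continue;  // a hole: dst stays here until a used cell is moved into it
        }
        if (src != dst) {
            kv_move * last = moves.empty() ? nullptr : &moves.back();
            const bool contiguous = last != nullptr &&
                last->src + last->len == src &&
                last->dst + last->len == dst;
            // extend the current chunk only while it stays non-overlapping
            if (contiguous && last->dst + last->len < last->src) {
                last->len++;
            } else {
                moves.push_back({src, dst, 1u});
            }
        }
        dst++;
    }
    return moves;
}
```

Applying the planned chunks front to back is safe: each chunk's destination range lies entirely before every later chunk's source range, so earlier copies never clobber data that still has to be read.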
…5718)
* server: docs - refresh and tease a little bit more the http server
* Rephrase README.md server doc
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update examples/server/README.md
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update examples/server/README.md
  Co-authored-by: Georgi Gerganov <[email protected]>
* Update README.md
---------
Co-authored-by: Georgi Gerganov <[email protected]>
* server: tests - longer inference timeout for CI
Nexesenex added a commit that referenced this pull request on Dec 22, 2024
To complement token_embd.weight and output.weight: attn_v.weight, attn_k.weight, attn_q.weight, attn_output.weight, attn_qkv.weight, ffn_gate, ffn_down, ffn_up