
b2266 #91

Merged
12 commits merged into Nexesenex:_master_up on Feb 25, 2024

Conversation

Nexesenex (Owner)

No description provided.

aahouzi and others added 12 commits February 25, 2024 11:54
* Fix issues during StableLM model conversion

* Fix hard-coded layer_norm_eps

* Support layer_norm_eps for LlavaStableLM

Co-authored-by: Jared Van Bortel <[email protected]>

* Add missing parenthesis

Co-authored-by: Jared Van Bortel <[email protected]>

* Support rotary_factor for LlavaStableLM

Co-authored-by: Jared Van Bortel <[email protected]>

* fix typo

* Add StableLMEpochForCausalLM for safety

Co-authored-by: compilade <[email protected]>

* Add StableLMEpochForCausalLM for safety 2

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: compilade <[email protected]>
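
For context on the layer_norm_eps fixes above: once the conversion script writes the epsilon into the GGUF metadata instead of hard-coding it, the C++ side can read it back per model. A minimal sketch using the public gguf API; the `stablelm.attention.layer_norm_epsilon` key name follows the usual `{arch}.attention.layer_norm_epsilon` convention and, like the `1e-5f` fallback, is an assumption here:

```cpp
#include "ggml.h" // the gguf API lived in ggml.h at this revision

// Hedged sketch: read layer_norm_eps from GGUF metadata instead of hard-coding it.
static float read_layer_norm_eps(const char * fname) {
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ NULL };
    struct gguf_context * ctx = gguf_init_from_file(fname, params);
    float eps = 1e-5f; // assumed fallback if the key is missing
    if (ctx) {
        const int key_id = gguf_find_key(ctx, "stablelm.attention.layer_norm_epsilon");
        if (key_id >= 0) {
            eps = gguf_get_val_f32(ctx, key_id);
        }
        gguf_free(ctx);
    }
    return eps;
}
```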
* code : normalize enum names

ggml-ci

* code : cont

* code : cont

server: monitoring - add /metrics prometheus compatible endpoint (#5708)

* server: monitoring - add /metrics prometheus compatible endpoint (exposition format sketched after these notes)

* server: concurrency issue - when 2 tasks are waiting for results, only one caller thread is notified (see the condition-variable sketch after these notes)

* server: metrics - move to a dedicated struct

* server: logs - always use JSON logger, add thread_id in message, log task_id and slot_id

* server : skip GH copilot requests from logging

* server : change message format of server_log()

* server : no need to repeat log in comment

* server : log style consistency

* server : fix compile warning

* server : fix tests regex patterns on M2 Ultra

* server: logs: PR feedback on log level

* server: logs: allow choosing the log format (json or plain text)

* server: tests: output server logs in text

* server: logs switch init logs to server logs macro

* server: logs: ensure json values do not raise errors

* server: logs: shorten level VERBOSE to VERB to cap level names at 4 chars

* server: logs: use lower case, as in other log messages

* server: logs avoid static in general

Co-authored-by: Georgi Gerganov <[email protected]>

* server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data

---------

Co-authored-by: Georgi Gerganov <[email protected]>
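
The /metrics endpoint above speaks the Prometheus text exposition format. A minimal sketch of what such a response body looks like; the metric names and the `llamacpp:` prefix are illustrative assumptions, not necessarily what the server exports:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hedged sketch of a Prometheus text-exposition response for GET /metrics.
std::string metrics_response(uint64_t prompt_tokens_total, uint64_t requests_processing) {
    std::ostringstream ss;
    ss << "# HELP llamacpp:prompt_tokens_total Number of prompt tokens processed.\n"
       << "# TYPE llamacpp:prompt_tokens_total counter\n"
       << "llamacpp:prompt_tokens_total " << prompt_tokens_total << "\n"
       << "# HELP llamacpp:requests_processing Number of requests currently processing.\n"
       << "# TYPE llamacpp:requests_processing gauge\n"
       << "llamacpp:requests_processing " << requests_processing << "\n";
    return ss.str(); // served with Content-Type: text/plain
}
```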
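The concurrency note above describes the classic lost wake-up when several waiters share one condition variable. An illustrative sketch of the general fix, not the server's actual code:

```cpp
#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;

// When two tasks block on the same condition variable waiting for their own
// results, notify_one() may wake the wrong waiter and leave the other blocked
// even though its result is ready. Waking all waiters lets each re-check its
// own predicate (e.g. "is a result queued for my task id?").
void on_result_ready() {
    std::lock_guard<std::mutex> lock(mtx);
    // ... store the result keyed by task id ...
    cv.notify_all(); // instead of cv.notify_one()
}
```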

fix nvcc version is empty

* [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility

vqtbl1q_u8 is not part of the ARMv7 NEON instruction set (see the sketch below)

* [android-example] Remove ABI filter after ARMv7a fix

* [github-workflows] Do not skip Android armeabi-v7a build
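
A plausible shape for the compatibility shim named above (the merged code may differ): on AArch64, `vqtbl1q_u8` is a native full 128-bit table lookup, while on ARMv7 it can be emulated with the paired 8-byte lookup `vtbl2_u8`, which has the same return-zero behavior for out-of-range indices:

```cpp
#include <arm_neon.h>

#if defined(__aarch64__)
#define ggml_vqtbl1q_u8 vqtbl1q_u8
#else
// ARMv7 NEON fallback: split the 16-byte table into two 8-byte halves and
// use vtbl2_u8, which also yields 0 for indices outside the table.
static inline uint8x16_t ggml_vqtbl1q_u8(uint8x16_t a, uint8x16_t b) {
    const uint8x8x2_t tbl = { { vget_low_u8(a), vget_high_u8(a) } };
    return vcombine_u8(vtbl2_u8(tbl, vget_low_u8(b)),
                       vtbl2_u8(tbl, vget_high_u8(b)));
}
#endif
```
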
The system prompt is now decoded in batches.

* server : fix off-by-one n_past when start of prompt matches whole cache

The tokens right after the matching part would otherwise skip a pos value.
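
A hedged sketch of what batched prompt decoding looks like with the llama.cpp API of this revision, where `llama_batch_get_one()` still took `(tokens, n_tokens, pos_0, seq_id)`; the helper name and error handling are illustrative:

```cpp
#include <algorithm>
#include <vector>
#include "llama.h"

// Decode a long prompt in chunks of n_batch tokens instead of one call.
// Keeping pos_0 = i makes positions contiguous across chunks; the off-by-one
// fix above concerns n_past when a cached prefix already matches the prompt.
static bool decode_prompt(llama_context * ctx, std::vector<llama_token> & tokens, int32_t n_batch) {
    for (int32_t i = 0; i < (int32_t) tokens.size(); i += n_batch) {
        const int32_t n_eval = std::min((int32_t) tokens.size() - i, n_batch);
        llama_batch batch = llama_batch_get_one(tokens.data() + i, n_eval, i, 0);
        if (llama_decode(ctx, batch) != 0) {
            return false; // decode failed
        }
    }
    return true;
}
```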
* llama : refactor k-shift implementation

ggml-ci

* llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add

* llama : cont k-shift refactoring + normalize type names

ggml-ci

* minor : fix MPI builds

* llama : reuse n_rot from the build context

ggml-ci

* llama : revert enum name changes from this PR

ggml-ci

* llama : update llama_rope_type

* llama : add comment about rope values

* llama : fix build

* passkey : apply kv cache updates explicitly

ggml-ci

* llama : change name to llama_kv_cache_update()

* llama : add llama_kv_cache_seq_pos_max()

* passkey : fix llama_kv_cache_seq_pos_max() usage

* llama : some llama_kv_cell simplifications

* llama : add llama_kv_cache_compress (EXPERIMENTAL)

* llama : add alternative KV cache merging (EXPERIMENTAL)

* llama : add llama_kv_cache_defrag

* llama : comments

* llama : remove llama_kv_cache_compress

will add in a separate PR

ggml-ci

* llama : defragment via non-overlapping moves

* llama : ggml_graph based defrag implementation

ggml-ci

* llama : switch the loop order in build_defrag

* llama : add comments
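
Putting the renamed and newly added calls above together, roughly as the passkey example uses them at this revision (signatures per the b2266-era llama.h; a sketch, not the exact example code):

```cpp
#include "llama.h"

// Context-shift sketch: drop n_discard positions after the first n_keep,
// shift the tail back, then defragment and apply the pending K-shift.
static void context_shift(llama_context * ctx, llama_seq_id seq, int n_keep, int n_discard) {
    llama_kv_cache_seq_rm (ctx, seq, n_keep, n_keep + n_discard);
    llama_kv_cache_seq_add(ctx, seq, n_keep + n_discard, -1, -n_discard); // p1 < 0 = to end

    llama_kv_cache_defrag(ctx); // queue defragmentation (non-overlapping moves)
    llama_kv_cache_update(ctx); // apply K-shift + defrag explicitly

    // highest populated position in the sequence after the shift:
    const llama_pos pos_max = llama_kv_cache_seq_pos_max(ctx, seq);
    (void) pos_max;
}
```
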
server: docs - refresh and tease a little bit more the http server (#5718)

* server: docs - refresh and tease a little bit more the http server

* Rephrase README.md server doc

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/server/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* server: tests - longer inference timeout for CI

Nexesenex merged commit 65192c0 into Nexesenex:_master_up on Feb 25, 2024
27 of 43 checks passed
Nexesenex added a commit that referenced this pull request Dec 22, 2024
To complement token_embd.weight and output.weight:

attn_v.weight
attn_k.weight
attn_q.weight
attn_output.weight
attn_qkv.weight
ffn_gate
ffn_down
ffn_up