Releases · ngxson/llama.cpp
b3334
gguf-hash: model-wide and per-tensor hashing using xxhash and sha1 (#…
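The entry above names the two levels of hashing the tool provides. As a rough, hedged illustration of the idea only (the tensor struct, names, and data below are made up, and this is not the gguf-hash source), one can hash each tensor's raw bytes with xxhash and feed the same bytes into a running model-wide state:

```cpp
// Illustrative sketch: per-tensor XXH64 hashes plus one model-wide hash
// accumulated over all tensors. Not the actual gguf-hash implementation.
#include <xxhash.h>
#include <cstdio>
#include <string>
#include <vector>

struct tensor_data {              // hypothetical stand-in for a GGUF tensor
    std::string       name;
    std::vector<char> bytes;
};

int main() {
    std::vector<tensor_data> tensors = {
        {"blk.0.attn_q.weight", std::vector<char>(1024, 0x11)},
        {"blk.0.attn_k.weight", std::vector<char>(1024, 0x22)},
    };

    XXH64_state_t * model_state = XXH64_createState();
    XXH64_reset(model_state, 0);

    for (const auto & t : tensors) {
        // per-tensor hash
        const XXH64_hash_t h = XXH64(t.bytes.data(), t.bytes.size(), 0);
        std::printf("%016llx  %s\n", (unsigned long long) h, t.name.c_str());
        // model-wide hash accumulates the same bytes
        XXH64_update(model_state, t.bytes.data(), t.bytes.size());
    }

    std::printf("%016llx  model-wide\n",
                (unsigned long long) XXH64_digest(model_state));
    XXH64_freeState(model_state);
    return 0;
}
```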
b3332
llama : fix n_rot default (#8348) ggml-ci
b3328
server: Retrieve prompt template in /props (#8337)

This PR adds the following:
- Expose the model's Jinja2 prompt template in the /props endpoint.
- Change the log level of the template-mismatch warning from Error to Warning.

The front-end stands a better chance of executing the Jinja template format correctly than the server, which currently just guesses it. Ideally this would sit inside a JSON block that exposes the same key/value pairs listed during startup by the "llm_load_print_meta" function.

Follow-up commits:
* Make the string buffer dynamic
* Add documentation and better string handling
* Use the chat_template naming convention
* Use an intermediate vector for string assignment
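A minimal client sketch of how a front-end might use this, under assumptions not stated in the release note: the server listens on localhost:8080 and /props returns a JSON object containing a "chat_template" key (the key name follows the PR's "chat_template naming convention"). This is not code from the llama.cpp repository.

```cpp
// Hedged sketch: fetch /props with libcurl and read the chat_template field.
#include <curl/curl.h>
#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

static size_t collect(char * ptr, size_t size, size_t nmemb, void * userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    std::string body;

    CURL * curl = curl_easy_init();
    if (!curl) return 1;

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/props"); // assumed address
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    const CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    if (res != CURLE_OK) return 1;

    const auto props = nlohmann::json::parse(body);
    if (props.contains("chat_template")) {
        // the front-end can feed this template to its own Jinja renderer
        std::cout << props["chat_template"].get<std::string>() << "\n";
    }
    return 0;
}
```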
b3327
added support for Authorization Bearer tokens when downloading model …
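As a hedged illustration of the feature named above (placeholder URL and token, and not the actual llama.cpp download code), a Bearer token can be attached to a model download request as an HTTP header:

```cpp
// Sketch: download a model file with an "Authorization: Bearer <token>" header.
#include <curl/curl.h>
#include <cstdio>
#include <string>

int main() {
    const std::string token = "hf_xxx";                                  // placeholder token
    const char * url = "https://example.com/models/model.gguf";          // placeholder URL

    FILE * out = std::fopen("model.gguf", "wb");
    if (!out) return 1;

    CURL * curl = curl_easy_init();
    if (!curl) return 1;

    struct curl_slist * headers = nullptr;
    headers = curl_slist_append(headers, ("Authorization: Bearer " + token).c_str());

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out); // default write callback fwrites the body

    const CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    std::fclose(out);
    return res == CURLE_OK ? 0 : 1;
}
```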
b3325
llama : add early return for empty range (#8327)

* llama : add early return for empty range

This commit adds an early return to the llama_kv_cache_seq_add and llama_kv_cache_seq_div functions. The motivation is to avoid looping over the cache when the range is empty. I ran into this when using the self-extend feature in main.cpp.

* llama : add static_cast to fix CI warning/error

This commit attempts to fix the following warning/error:

```console
src/llama.cpp:7271:31: error: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Werror=sign-compare]
 7271 |                 if (i < hparams.n_layer_dense_lead) {
      |                     ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This can be reproduced locally by setting -Wsign-compare in the Makefile.

* squash! llama : add early return for empty range

Remove the setting of cache.head to 0 when the range is empty.

* Update src/llama.cpp

Signed-off-by: Daniel Bevenius <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
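The early-return pattern the commit describes can be sketched as follows. This is a simplified, assumed stand-in (the struct layout and function body are not the real llama_kv_cache_seq_add), showing only why skipping an empty range avoids a full walk over the cache:

```cpp
// Sketch of the early-return pattern: when [p0, p1) is empty there is nothing
// to shift, so skip the loop over the cache cells entirely.
#include <cstdint>
#include <vector>

struct kv_cell {
    int32_t pos    = -1;
    int32_t seq_id = -1;
};

struct kv_cache {
    std::vector<kv_cell> cells;
};

void kv_cache_seq_add(kv_cache & cache, int32_t seq_id, int32_t p0, int32_t p1, int32_t delta) {
    if (p0 == p1) {
        return; // empty range: avoid looping over every cell (and leave cache state untouched)
    }
    for (auto & cell : cache.cells) {
        if (cell.seq_id == seq_id && cell.pos >= p0 && cell.pos < p1) {
            cell.pos += delta;
        }
    }
}
```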
b3324
Detokenizer fixes (#8039)

* Add llama_detokenize():
  - Update header file locations
  - UNKNOWN and CONTROL are 'special pieces'
  - Remove the space after UNKNOWN and CONTROL
  - Refactor llama_token_to_piece()
  - Add flag: clean_up_tokenization_spaces
  - Symmetric params for llama_tokenize() and llama_detokenize()

* Update and fix tokenizer tests:
  - Use llama_detokenize()
  - Treat an unexpected vocab type as a test failure instead of an error; this is useful when automating tests where the vocab type is not known in advance, and it differentiates the case from other loading errors
  - Skip unicode surrogates and undefined codepoints
  - Gracefully exit threads (using exit() was throwing random exceptions)
  - Clean up old known-problematic codepoints
  - Minor: fix a confusing hexadecimal codepoint

* Update bruteforce random tests:
  - Add detokenizer checks
  - New generator: ascii_lr_strip
  - New generator: apostrophe
  - Add more vocab files
  - Detokenize special tokens
  - Replace errors with '\uFFFD' when detokenizing to 'utf-8'
  - More edge cases
  - Better checks of detokenization results

* Fix add_space_prefix, set false by default
* Better leading space removal
* Do not remove the space when decoding special tokens
* Bugfix: custom regex splits undefined unicode codepoints
* 'viking' detokenizer: clean spaces
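The detokenizer checks mentioned above boil down to a round-trip property: detokenizing the tokens of a string should reproduce that string (up to the documented space cleanup), with undecodable pieces replaced by U+FFFD. The byte-level toy tokenizer below is hypothetical and unrelated to llama.cpp's real vocabularies; it only makes the detokenize(tokenize(x)) == x check concrete and runnable:

```cpp
// Conceptual sketch of the round-trip check; not llama.cpp test code.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using token = int32_t;

// toy tokenizer: one token per byte
static std::vector<token> toy_tokenize(const std::string & text) {
    std::vector<token> toks;
    for (unsigned char c : text) {
        toks.push_back(static_cast<token>(c));
    }
    return toks;
}

// toy detokenizer: exact inverse of the above
static std::string toy_detokenize(const std::vector<token> & toks) {
    std::string out;
    for (token t : toks) {
        out.push_back(static_cast<char>(t));
    }
    return out;
}

int main() {
    // the bruteforce random tests assert this kind of symmetry (plus
    // special-token and space-cleanup handling) for every generated string
    const std::string text = "Hello world, don't strip my spaces.";
    assert(toy_detokenize(toy_tokenize(text)) == text);
    return 0;
}
```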
b3322
llama : fix compile warning (#8304)
b3306
cli: add EOT when the user hits Ctrl+C (#8296)
* main: add need_insert_eot
* do not format the system prompt if it is empty
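The flag name need_insert_eot comes from the commit above; everything else in this sketch is an assumed, simplified stand-in for main.cpp's interactive loop, shown only to illustrate the interrupt-then-insert-EOT idea:

```cpp
// Sketch: a SIGINT handler records the interrupt, and the loop later appends
// an EOT token so the model sees a properly terminated turn.
#include <csignal>
#include <cstdio>
#include <vector>

static volatile std::sig_atomic_t need_insert_eot = 0;

static void sigint_handler(int /*signo*/) {
    // the user hit Ctrl+C mid-generation: remember to close the turn with EOT
    need_insert_eot = 1;
}

int main() {
    std::signal(SIGINT, sigint_handler);

    const int token_eot = 2;      // placeholder id; the real id comes from the model vocab
    std::vector<int> embd_inp;    // pending input tokens

    // ... interactive generation loop would run here ...

    if (need_insert_eot) {
        embd_inp.push_back(token_eot);
        need_insert_eot = 0;
        std::printf("inserted EOT after interrupt\n");
    }
    return 0;
}
```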
b3292
convert : fix gemma v1 tokenizer convert (#8248) ggml-ci
b3276
CUDA: refactor and optimize IQ MMVQ (#8215)
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
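For context on the __dp4a -> ggml_cuda_dp4a change: __dp4a is the CUDA intrinsic that computes a dot product of four packed signed 8-bit values accumulated into a 32-bit integer, and wrapping it in a helper lets other code paths fall back to plain arithmetic. The portable stand-in below is not the ggml_cuda_dp4a device code, only a sketch of what such a helper computes:

```cpp
// Sketch of a dp4a-style fallback in plain C++ (illustrative only).
#include <cstdint>
#include <cstdio>
#include <cstring>

static int32_t dp4a_fallback(int32_t a, int32_t b, int32_t c) {
    int8_t va[4];
    int8_t vb[4];
    std::memcpy(va, &a, sizeof(va));
    std::memcpy(vb, &b, sizeof(vb));
    int32_t acc = c;
    for (int i = 0; i < 4; ++i) {
        acc += int32_t(va[i]) * int32_t(vb[i]);
    }
    return acc;
}

int main() {
    // packed bytes {1, 2, 3, 4} and {5, 6, 7, 8}: 1*5 + 2*6 + 3*7 + 4*8 = 70
    const int32_t a = 0x04030201;
    const int32_t b = 0x08070605;
    std::printf("%d\n", dp4a_fallback(a, b, 0));
    return 0;
}
```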