
b2651 #107

Merged: 14 commits merged into Nexesenex:sidestream on Apr 11, 2024
Conversation

Nexesenex (Owner)

No description provided.

ggerganov and others added 14 commits April 9, 2024 20:29

Key changes:
* BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS
* Nomic Embed conversion: pad vocab instead of slicing embedding tensor
* llama_tokenize: handle added special tokens like HF does (see the sketch below)
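
A minimal sketch of what the llama_tokenize change means in practice, assuming the C API of this era (add_special / parse_special flags); the helper below is illustrative and not code from this PR:

```cpp
// Illustrative helper only (not code from this PR): tokenize `text` so that added /
// user-defined special tokens in the raw text are parsed the way HF tokenizers parse
// them. Assumes the llama.h C API of this period:
//   int32_t llama_tokenize(const llama_model *, const char * text, int32_t text_len,
//                          llama_token * tokens, int32_t n_tokens_max,
//                          bool add_special, bool parse_special);
#include <string>
#include <vector>
#include "llama.h"

static std::vector<llama_token> tokenize_like_hf(const llama_model * model, const std::string & text) {
    // rough upper bound: one token per byte, plus room for BOS/EOS
    std::vector<llama_token> tokens(text.size() + 2);
    int32_t n = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                               tokens.data(), (int32_t) tokens.size(),
                               /*add_special  =*/ true,   // let the model's metadata decide on BOS/EOS
                               /*parse_special=*/ true);  // recognize added special tokens inside `text`
    if (n < 0) {
        tokens.resize((size_t) -n);            // negative return = required buffer size
        n = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                           tokens.data(), (int32_t) tokens.size(), true, true);
    }
    tokens.resize(n > 0 ? (size_t) n : 0);
    return tokens;
}
```

Passing parse_special = true is what makes added special tokens in the input behave as they do under Hugging Face tokenizers, rather than being tokenized as plain text.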
* docs: how to add a model

* docs: model: typo and docs

* docs: model: add prevision on RoPE

* docs: model: rephrasing README.md

* docs: model: rephrasing README.md

* docs: model: README.md fix trailing spaces

* docs : some fixes

* Update README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* minor layout improvements

* added missing file, run deps.sh locally

This commit adds an option to the gguf example to not check the tensor
data.

The motivation for this is that it can be useful to use the gguf tool to
read .gguf files that were not created by the gguf tool itself.

Signed-off-by: Daniel Bevenius <[email protected]>
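
For context, a minimal sketch of the idea behind skipping the tensor-data check, built on the ggml gguf C API rather than the examples/gguf program itself (header location and exact integer widths are assumptions):

```cpp
// Illustration only: parse a .gguf file's metadata with no_alloc set, so the tensor
// data is never read or validated. (The gguf_* declarations lived in ggml.h at the time.)
#include <cstdio>
#include "ggml.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file.gguf>\n", argv[0]);
        return 1;
    }
    struct gguf_init_params params = {
        /*.no_alloc =*/ true,      // metadata only - do not allocate / load tensor data
        /*.ctx      =*/ nullptr,   // no ggml context needed when the tensors are skipped
    };
    struct gguf_context * gctx = gguf_init_from_file(argv[1], params);
    if (gctx == nullptr) {
        fprintf(stderr, "failed to read %s\n", argv[1]);
        return 1;
    }
    const int n_kv      = gguf_get_n_kv(gctx);
    const int n_tensors = gguf_get_n_tensors(gctx);
    printf("%s: %d key/value pairs, %d tensors\n", argv[1], n_kv, n_tensors);
    for (int i = 0; i < n_kv; ++i) {
        printf("  kv[%d]: %s\n", i, gguf_get_key(gctx, i));
    }
    for (int i = 0; i < n_tensors; ++i) {
        printf("  tensor[%d]: %s\n", i, gguf_get_tensor_name(gctx, i));
    }
    gguf_free(gctx);
    return 0;
}
```
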
* gguf-debug: example of how to use the ggml callback for debugging

* gguf-debug: no mutex, verify type, fix stride.

* llama: cb eval: move the cb_eval field into common gpt_params

* ggml_debug: use common gpt_params to pass cb_eval.
Fix random SIGSEGV when getting the tensor.

* ggml_debug: ci: add tests

* ggml_debug: EOL in CMakeLists.txt

* ggml_debug: Remove unused param n_batch, no batching here

* ggml_debug: fix trailing spaces

* ggml_debug: fix trailing spaces

* common: fix cb_eval and user data not initialized

* ci: build revert label

* ggml_debug: add main test label

* doc: add a model: add a link to ggml-debug

* ggml-debug: add to make toolchain

* ggml-debug: tests add the main label

* ggml-debug: ci add test curl label

* common: allow the warmup to be disabled in llama_init_from_gpt_params

* ci: add curl test

* ggml-debug: better tensor type support

* gitignore : ggml-debug

* ggml-debug: printing also the sum of each tensor

* ggml-debug: remove block size

* eval-callback: renamed from ggml-debug

* eval-callback: fix make toolchain

---------

Co-authored-by: slaren <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
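
The mechanism these commits describe, sketched under the assumption that gpt_params gained cb_eval / cb_eval_user_data fields of type ggml_backend_sched_eval_callback (the ggml-backend.h callback that is asked about, and then shown, every node of the graph):

```cpp
// Rough sketch of the mechanism behind eval-callback / ggml-debug: a scheduler eval
// callback is first asked whether it wants to observe a node, then called again once
// that node has been computed. The callback body here is illustrative only.
#include <cstdio>
#include "ggml.h"
#include "ggml-backend.h"

static bool debug_cb(struct ggml_tensor * t, bool ask, void * user_data) {
    (void) user_data;
    if (ask) {
        return true;   // yes, observe every node
    }
    // second call: the tensor has been computed - print its name, type and shape
    printf("%-32s %-8s [%lld, %lld, %lld, %lld]\n",
           t->name, ggml_type_name(t->type),
           (long long) t->ne[0], (long long) t->ne[1],
           (long long) t->ne[2], (long long) t->ne[3]);
    return true;       // true = keep computing the rest of the graph
}

// Assumed wiring through common (field names taken from the commit messages above,
// except `warmup`, which is assumed):
//   gpt_params params;
//   params.cb_eval           = debug_cb;
//   params.cb_eval_user_data = nullptr;
//   params.warmup            = false;  // the "disable warmup" option added here, so the
//                                      // callback does not fire on the warmup run
//   ... then llama_init_from_gpt_params(params) as usual.
```
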
* scripts : add --outdir option to hf.sh

This commit adds an option to the hf.sh script that allows the user to
specify an output directory for the downloaded file.

The motivation for this change is that examples that use the hf.sh
script to download models from Hugging Face can now specify the output
directory, for example the `models` directory, to keep downloads in one
place and avoid cluttering the repository root.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! scripts : add --outdir option to hf.sh

Fix format of the --outdir option in the usage message.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
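
In practice an example can then download straight into the models directory with something like `./scripts/hf.sh --url https://huggingface.co/<repo>/resolve/main/<file>.gguf --outdir models`; only `--outdir` comes from this commit, the rest of the invocation is an assumed form.
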
When the download-artifact action was updated to v4, the default download path changed.
This fixes binaries not being uploaded to releases.

…/ reuses) (#6609)

* grammars: reserve rejects & next candidates

* grammars: reuse new_stacks

* grammars: fix missing sig change in llama.h

* grammars: fix test (api changed)

* grammars: update gbnf-validator.cpp

* grammars: simpler syntax (no swap)
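
The gist of the reserve/reuse changes above, shown on a generic sketch rather than the actual grammar sampling code:

```cpp
// Generic illustration (not the actual llama.cpp grammar code) of the two optimizations
// named above: reserve vector capacity up front, and clear-and-reuse a scratch container
// across iterations instead of reallocating it every time.
#include <vector>

struct candidate { int token; float score; };

void sample_many(const std::vector<candidate> & candidates, int n_iters) {
    std::vector<candidate>        rejects;      // "reserve rejects & next candidates"
    std::vector<std::vector<int>> new_stacks;   // "reuse new_stacks"

    rejects.reserve(candidates.size());         // pay for the allocation once

    for (int it = 0; it < n_iters; ++it) {
        rejects.clear();                        // keeps capacity, drops contents
        new_stacks.clear();                     // reused instead of a fresh vector per call

        for (const candidate & c : candidates) {
            if (c.score < 0.0f) {
                rejects.push_back(c);           // no reallocation thanks to reserve()
            }
        }
        // ... advance the grammar stacks into new_stacks and consume them here ...
    }
}
```
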
Nexesenex merged commit a8dd6b3 into Nexesenex:sidestream on Apr 11, 2024.
15 of 26 checks passed.
Nexesenex pushed a commit that referenced this pull request on Dec 22, 2024:

* iq1_bn: faster Metal dot product

82 t/s -> 87.9 t/s

* iq1_bn(Metal): 87.9 -> 89.0 t/s for TG-128

* iq1_bn(Metal): 89.0 -> 94.7 t/s for TG-128

So, total improvement is ~15%. Not bad.

* iq1_bn(Metal): 686 -> 702 t/s for PP-512

* iq2_bn(Metal): 710 -> 714 t/s for PP-512

---------

Co-authored-by: Iwan Kawrakow <[email protected]>