Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llama: update llama.cpp to latest version #244

Merged
merged 11 commits into from
Dec 15, 2023
Merged

Conversation

danbev
Copy link
Contributor

@danbev danbev commented Nov 27, 2023

This commit updates llama.cpp to the latest/later version.

The motivation for this is that the current version of llama.cpp is a little outdated and there have been changes to the llama.cpp API and also the model format. Currently it is not possible to use the new GGUF format and many of the available models are in this new format which can make it challenging to use this crate at the moment.

The following changes have been made:

  • update llama.cpp to latest version using git submodule update --remote --merge llama.cpp

  • Manually copied the generated bindings.rs file from the target directory to the src directory. Hope this was the correct thing to do.

  • Updated the llm-chain-llama crate to use llama_decode instead of llm_eval which has now been deprecated.

  • A number of TODOs have been added to the code to highlight areas that I know I need to look at.


This is a work in progress but I wanted to open a draft pull request sooner rather than later to get some visibility and feedback.

Currently I've been able to successfully run the simple example, few_shot, and stream examples. The map_reduce_llama example is not working as of this writing which I'll look into further.

@williamhogman
Copy link
Contributor

<3

@Juzov
Copy link
Collaborator

Juzov commented Nov 28, 2023

There's a clause ignoring MaxTokens if MaxTokens == 0 (or rather the reverse). So adding MaxTokens to be equal to MaxContextSize in the examples is redundant. If you want, and its possible, you can try to alter the option to be as big as the context window by default and remove the clause.

@danbev
Copy link
Contributor Author

danbev commented Nov 29, 2023

So adding MaxTokens to be equal to MaxContextSize in the examples is redundant.

I added a MaxBatchSize option in 452ac2c, and it has default value of 512 which matches the value in llama.cpp, and have now removed the options from the examples (apart from simple_llama).

If you want, and its possible, you can try to alter the option to be as big as the context window by default and remove the clause.

I'm planning on taking a closer look at the model options today, and I'll also take another look at the context options and your suggestion. Thanks

This commit updates llama.cpp to the latest/later version.

The motivation for this is that the current version of llama.cpp is
a little outdated and there have been changes to the llama.cpp API and
also the model format. Currently it is not possible to use the new GGUF
format and many of the available models are in this new format which
can make it challenging to use this crate at the moment.

The following changes have been made:
* update llama.cpp to latest version using
  git submodule update --remote --merge llama.cpp

* Manually copied the generated bindings.rs file from the target
  directory to the src directory. Hope this was the correct thing to do.

* Updated the llm-chain-llama crate to use llama_decode instead of
  llm_eval which has now been deprecated.

Signed-off-by: Daniel Bevenius <[email protected]>
This is an attempt to fix builds from returning:

```
Error: The operation was canceled.
```

Signed-off-by: Daniel Bevenius <[email protected]>
This is an attempt to prevent the currently failing windows build from
causing the other builds to be cancelled (at least that is what I think
is happening).

Signed-off-by: Daniel Bevenius <[email protected]>
This commit is an attempt to update the hnsw dependency to version 0.2
and to fix the currently failing windows build of hswn_rs
version 0.1.19 which is current failing to compile on windows:

```console
error[E0308]: mismatched types
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\libext.rs:439:39
    |
439 |     let c_dist = DistCFFI::<f32>::new(c_func);
    |                  -------------------- ^^^^^^ expected `u32`, found `u64`
    |                  |
    |                  arguments to this function are incorrect
    |
    = note: expected fn pointer `extern "C" fn(_, _, u32) -> _`
               found fn pointer `extern "C" fn(_, _, u64) -> _`
note: associated function defined here
   --> C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\hnsw_rs-0.1.19\src\dist.rs:990:12
    |
990 |     pub fn new(f:DistCFnPtr<T>) -> Self {
    |            ^^^ ---------------
```

I was able to reproduce this issue locally by cross compiling (which
produces the above error). But cross compiling with version 0.2 work and
so I've attempted to upgrade to that version.

This is very much a suggestion as I'm not familiar with the hnsw code
but perhaps it will be useful to someone else and save some time
investigating the issue.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit changes the build.rs script to write the generated
bindings to the src directory to avoid manual copying of the
bindings.rs file.

Signed-off-by: Daniel Bevenius <[email protected]>
Signed-off-by: Daniel Bevenius <[email protected]>
@danbev danbev changed the title llama: update llama.cpp to latest version (wip) llama: update llama.cpp to latest version Dec 5, 2023
@danbev danbev marked this pull request as ready for review December 5, 2023 08:17
Copy link
Collaborator

@Juzov Juzov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added some comments. After they have been explained, we can approve this. Thanks for your effort

crates/llm-chain-llama/src/options.rs Outdated Show resolved Hide resolved
.github/workflows/cicd.yaml Show resolved Hide resolved
.github/workflows/cicd.yaml Show resolved Hide resolved
crates/llm-chain-llama/src/executor.rs Outdated Show resolved Hide resolved
crates/llm-chain-llama/src/batch.rs Outdated Show resolved Hide resolved
crates/llm-chain-llama/src/executor.rs Outdated Show resolved Hide resolved
This commit revert the change to the StopSequence option in
llm-chain-llama/src/options.rs.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit removes the `From<llama_batch> for LlamaBatch` impl as it
is no longer needed.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit creates a new LlamaBatch for new token sampled instead of
reusing the same one.

Signed-off-by: Daniel Bevenius <[email protected]>
This commit extracts the logic for checking if the prompt is a question
into a separate conditional check. I've tried to clarify the comment of
this check as well so it is hopefully easier to understand now.

Signed-off-by: Daniel Bevenius <[email protected]>
@Juzov Juzov merged commit cad5646 into sobelio:main Dec 15, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants