This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Did we support gpt4all-lora-quantized.bin? #86

Closed
katopz opened this issue Mar 29, 2023 · 19 comments
Labels
issue:enhancement New feature or request topic:model-support Support for new models

Comments

@katopz
Contributor

katopz commented Mar 29, 2023

I tried using gpt4all-lora-quantized.bin from https://github.com/nomic-ai/gpt4all#try-it-yourself:

cargo run --release -- -m ./data/gpt4all-lora-quantized.bin -f examples/alpaca_prompt.txt --repl

And got

[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 131
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 132
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 133
...
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 256
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 257
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 258
[2023-03-29T07:21:13Z INFO  llama_cli] ggml ctx size = 4017.35 MB
    
[2023-03-29T07:21:13Z INFO  llama_cli] Loading model part 1/1 from './data/gpt4all-lora-quantized.bin'

thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 2', /Users/katopz/git/katopz/llama-rs/llama-rs/src/lib.rs:773:21

Maybe I have to convert it first?

@philpax
Collaborator

philpax commented Mar 29, 2023

They've made some changes to the inferencing code, with the main one I can see being the addition of a token. Not sure if there's anything else; I'd appreciate it if you could look into it!

@KerfuffleV2
Contributor

KerfuffleV2 commented Mar 29, 2023

I don't think a difference in vocabulary length would matter as the number of vocabulary items is handled dynamically.

let n_dims = read_i32(&mut part_reader)?;
/* ... */
let mut nelements = 1;
let mut ne = [1i32, 1i32];
for i in 0..n_dims {
    ne[i as usize] = read_i32(&mut part_reader)?; // line 773
    nelements *= ne[i as usize];
}

Looks like it's hard-coded to support a tensor with 2 (or maybe up to 2) dimensions, but it got one with more than that. Fixing this one part probably wouldn't be hard, but I'm pretty sure it'll just break a little later because the tensors aren't the expected shape.
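For illustration, here's a minimal, self-contained sketch of how that read could reject an unexpected shape with an error instead of panicking. This is not the actual llama-rs code; read_i32 here is a stand-in for its helper:

use std::io::{self, Read};

// Stand-in for a little-endian i32 reader like llama-rs's read_i32 helper.
fn read_i32(reader: &mut impl Read) -> io::Result<i32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

// Defensive variant of the loop above: reject tensors with more than 2
// dimensions instead of indexing past the end of `ne` and panicking.
fn read_tensor_shape(reader: &mut impl Read) -> io::Result<[i32; 2]> {
    let n_dims = read_i32(reader)?;
    if !(1..=2).contains(&n_dims) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported tensor dimension count: {n_dims}"),
        ));
    }
    let mut ne = [1i32, 1i32];
    for i in 0..n_dims as usize {
        ne[i] = read_i32(reader)?;
    }
    Ok(ne)
}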

(Pretty sure gpt4all would just be a completely different model architecture compared to Llama.) edit: This is wrong; the issue is probably just a vocab size mismatch throwing things out of alignment.

@philpax
Collaborator

philpax commented Mar 29, 2023

Are you sure about the architecture being different? I think it's just a finetuned version of LLaMA - as far as I can tell, the only architectural difference is they supply the <pad> token manually: antimatter15/alpaca.cpp@master...zanussbaum:gpt4all.cpp:master

@KerfuffleV2
Contributor

Are you sure about the architecture being different?

Nope. :) I assumed it was a GPT architecture because of the name, but you know what they say about assuming. When you assume, you make an ass of U and ME. Sorry about that.

It seems like they didn't just change the vocab size, but also how it's calculated, right?

Could possibly try changing this line:

https://github.com/rustformers/llama-rs/blob/a067431773ba0194d6bc352faf40a770c1eac81c/llama-rs/src/lib.rs#L532

to

 n_vocab: read_i32(&mut reader)? - 1,

I guess what they're actually doing is setting n_vocab to 32,001 in the file, but only including 32,000 entries and then manually generating an entry after the model is loaded. That's really weird. I think my suggestion would work, unless the model is actually going to try to generate that last token (obviously it will break loading normal Llama models, so it would only be useful for checking whether that allows loading the gpt4all model).
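If that is indeed the file layout, a load-time special case along these lines might keep normal Llama models working while still accepting the gpt4all file. This is just a sketch based on the assumption above, not something tested against llama-rs:

// Hypothetical special case, assuming the gpt4all file reports n_vocab = 32,001
// in its header while only serializing 32,000 vocab entries (the extra <pad>
// entry being added by their loader at runtime).
const LLAMA_N_VOCAB: i32 = 32_000;

// Returns the number of vocab entries actually present in the file, plus a
// flag saying whether a <pad> entry should be appended after loading.
fn effective_n_vocab(n_vocab_from_header: i32) -> (i32, bool) {
    if n_vocab_from_header == LLAMA_N_VOCAB + 1 {
        (LLAMA_N_VOCAB, true) // gpt4all-style file
    } else {
        (n_vocab_from_header, false) // normal Llama file
    }
}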

@philpax
Collaborator

philpax commented Mar 29, 2023

Yeah, it's pretty weird (I'd almost say sloppy...)

We could special-case this (assuming that's the fix), or we could patch the model itself to work correctly. I hope they end up doing the latter for us.

@cksac

cksac commented Mar 30, 2023

I converted the model using https://github.com/ggerganov/llama.cpp#using-gpt4all; it can then be run with both llama.cpp and llama-rs.

@katopz
Contributor Author

katopz commented Mar 31, 2023

I converted the model using https://github.com/ggerganov/llama.cpp#using-gpt4all; it can then be run with both llama.cpp and llama-rs.

I can see

python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

But I can't find ./models/tokenizer.model there?

@KerfuffleV2
Contributor

@katopz You'll likely need to get your hands on the Llama tokenizer; it's not that hard to find. However, the Llama license doesn't actually allow free distribution, which is why you won't find links to that kind of thing in most project repos, etc.

@katopz
Contributor Author

katopz commented Mar 31, 2023

Hm, I see. But maybe I got the wrong ./models/tokenizer.model from some random internet dude.

...
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 256
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 257
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 258
[2023-03-31T13:24:20Z INFO  llama_cli] ggml ctx size = 4017.35 MB
    
[2023-03-31T13:24:20Z INFO  llama_cli] Loading model part 1/2 from './data/gpt4all-lora-quantized.bin'
    
thread 'main' panicked at 'Could not load model: TensorWrongSize { tensor_name: "tok_embeddings.weight", path: "./data/gpt4all-lora-quantized.bin" }', llama-cli/src/main.rs:209:10

Can I get the shasum -a 256 of your converted gpt4all-lora-quantized.bin, please?

@setzer22
Collaborator

setzer22 commented Apr 2, 2023

There are two files. The original gpt4all model has the following hash:

05c9dc0a4904f3b232cffe717091b0b0a8246f49c3f253208fbf342ed79a6122  gpt4all-lora-quantized.bin.orig

And after converting with convert-gpt4all-to-ggml.py, it should have this hash:

d9af98b0350fc8af7211097e816ffbb8bae9a18f8aea8c50ff94a99bd6cb2c7c  gpt4all-lora-quantized.bin

I can confirm I also get the TensorWrongSize error with the converted one (the original model doesn't load either).

@Metalflame12
Contributor

This is weird, but it only works if the model file is not named "gpt4all-lora-quantized.bin".
Try renaming it to "gpt4all-lora-quantized-ggml.bin" or something like that; it will work.

@katopz
Contributor Author

katopz commented Apr 2, 2023

[2023-04-02T16:50:24Z INFO  llama_cli] Loaded tensor 280/291
[2023-04-02T16:50:24Z INFO  llama_cli] Loaded tensor 288/291
[2023-04-02T16:50:24Z INFO  llama_cli] Loading of './data/gpt4all-lora-quantized-ggml.bin' complete
[2023-04-02T16:50:24Z INFO  llama_cli] Model size = 4017.27 MB / num tensors = 291
[2023-04-02T16:50:24Z INFO  llama_cli] Model fully loaded!

WHAT 🤯

@KerfuffleV2
Contributor

llama-rs has weird behavior where it will try to discover extra parts if there are other files that start with the same prefix. Maybe that's what happened here?

This caused me an issue when I converted some stuff with the llama.cpp converter and it created a model.bin.tmp file.

Personally, I'd like to just turn off automatic model part discovery and only have it enabled if it's explicitly requested with a command-line flag or something.
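For anyone wondering how the two files collided, here's a rough sketch of what prefix-based discovery amounts to. This is my reading of the behaviour described above, not the actual llama-rs code:

use std::path::{Path, PathBuf};

// Rough sketch of prefix-based part discovery: any sibling file whose name
// starts with the main model's file name is treated as an additional part.
// With both gpt4all-lora-quantized.bin and gpt4all-lora-quantized.bin.orig in
// the same directory, the .orig file gets picked up as a second part and
// loading then fails with a shape mismatch.
fn discover_parts(main_path: &Path) -> std::io::Result<Vec<PathBuf>> {
    let dir = main_path.parent().unwrap_or_else(|| Path::new("."));
    let prefix = main_path
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
        .unwrap_or_default();
    let mut parts: Vec<PathBuf> = std::fs::read_dir(dir)?
        .filter_map(Result::ok)
        .map(|entry| entry.path())
        .filter(|path| {
            path.file_name()
                .map(|name| name.to_string_lossy().starts_with(prefix.as_str()))
                .unwrap_or(false)
        })
        .collect();
    parts.sort();
    Ok(parts)
}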

@Metalflame12
Contributor

Yup, removing the file "gpt4all-lora-quantized.bin.orig" from the directory also works, without having to rename the other one.

@philpax
Collaborator

philpax commented Apr 2, 2023

Hm, didn't realise that could be a problem. I'll change the extra part behaviour to only check for numerical suffixes.
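Something like this, perhaps; a sketch of the tightened check, assuming extra parts are named with a purely numeric suffix (e.g. model.bin.1, model.bin.2):

// Hypothetical tightened filter: only treat a sibling file as an extra part if
// its name is the main file name plus a dot and a purely numeric suffix,
// so files like "model.bin.orig" or "model.bin.tmp" are ignored.
fn is_numeric_part(candidate_name: &str, main_name: &str) -> bool {
    candidate_name
        .strip_prefix(main_name)
        .and_then(|rest| rest.strip_prefix('.'))
        .map(|suffix| !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit()))
        .unwrap_or(false)
}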

@philpax philpax added issue:enhancement New feature or request topic:model-support Support for new models labels Apr 20, 2023
@philpax
Collaborator

philpax commented Apr 20, 2023

It looks like they've moved on from using LLaMA and are now targeting GPT-J (#144). Not sure what to do about this in the meantime - I guess we can tell people to run the conversion script?

@katopz
Contributor Author

katopz commented Apr 20, 2023

That's fine enough; maybe we should add a "how to" to the README's conversion section somehow.

@philpax
Collaborator

philpax commented May 9, 2023

@danforbes has mentioned that GPT4All-J works out of the box with the GPT-J architecture. Should we close this?

@danforbes
Contributor

Tested with https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

@katopz katopz closed this as completed May 10, 2023